nimafathi committed
Commit facac85 · verified · Parent(s): 91be14b

Upload HDLM model with complete HF integration

Files changed (2):
  1. README.md +5 -138
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,142 +1,9 @@
 ---
-language:
-- en
 tags:
-- text-generation
-- diffusion
-- language-model
-license: mit
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
 
-# hdlm-group/hdlm-base-epsilon-0.0
-
-This is an epsilon_hybrid diffusion language model trained on text data.
-
-## Model Details
-
-- **Model Type**: epsilon_hybrid
-- **Architecture**: Diffusion-based language model
-- **Training Method**: Epsilon-hybrid diffusion training
-
-## Configuration
-
-```yaml
-ngpus: 4
-type: aligned
-gradient_accumulation_steps: 2
-tokenizer:
-  tokens: 50257
-  model: gpt2
-training:
-  batch_size: 128
-  accum: ${gradient_accumulation_steps}
-  n_iters: 1250000
-  snapshot_freq: 10000
-  log_freq: 500
-  eval_freq: 10000
-  snapshot_freq_for_preemption: 3000
-  snapshot_sampling: true
-  ema: 0.9999
-  warmup_iter: -1
-  loss_type: hybrid
-  epsilon: 0.0
-  lambda: 0.0
-data:
-  train: openwebtext-train
-  valid: wikitext103
-  cache_dir: /home/toolkit/research-diffcodegen/data
-  debug: false
-graph:
-  type: absorb
-  gamma: 1.0
-  file: /home/toolkit/research-diffcodegen/data
-  report_all: false
-  expanded_sigma: true
-noise:
-  type: loglinear
-  sigma_min: 0.0001
-  sigma_max: 2.0
-  ar_diffusion: false
-  expanded_sigma: ${graph.expanded_sigma}
-sampling:
-  predictor: analytic
-  steps_per_level: 1
-  noise_removal: true
-  strategy: direct
-  strategy_param: 0.9
-annealing:
-  type: none
-  efficient: false
-  width: 1024
-  tau: 1024
-  eval_tau: 1024
-  steps_per_level: ${sampling.steps_per_level}
-  sampling_method: sdlm
-  diffusion_loss_weight: 1.0
-  ce_loss_weight: 1.0
-  sampling_eps: 0.0001
-attention:
-  context_type: block_causal
-  block_type: full
-  match_inference: false
-eval:
-  batch_size: 16
-  perplexity: true
-  perplexity_batch_size: 8
-optim:
-  weight_decay: 0.1
-  optimizer: AdamW
-  lr: 0.0002
-  beta1: 0.9
-  beta2: 0.95
-  eps: 1.0e-08
-  warmup: 10000
-  grad_clip: 1.0
-  scheduler: cosine
-experiment:
-  name: MDLM
-  wandb_project: Hybrid-SDLM-ALIGNED
-model:
-  name: HDLM
-  type: ddit
-  hidden_size: 768
-  cond_dim: 128
-  length: 1024
-  n_blocks: 12
-  n_heads: 12
-  dropout: 0.1
-  scale_by_sigma: false
-  transformer_sigma_conditioning: false
-  hybrid_sigma_embedding: false
-  post_process_logits: false
-  use_timestep_embedding: false
-model_type: epsilon_hybrid
-
-```
-
-## Usage
-
-```python
-from our.hf_utils import smart_model_loader
-
-# Load the model
-model, config, device, accelerator, metaschedule = smart_model_loader(
-    "hdlm-group/hdlm-base-epsilon-0.0",
-    model_type="epsilon_hybrid"
-)
-
-# Use the model for text generation
-# (Add specific usage examples based on your model's capabilities)
-```
-
-## Training Details
-
-This model was trained using the research-diffcodegen framework.
-
-## Citation
-
-If you use this model in your research, please cite the original paper and this implementation.
-
-## License
-
-This model is released under the MIT License.
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
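The new README says only that the checkpoint was pushed via `PyTorchModelHubMixin`. A minimal loading sketch, assuming a hypothetical `HDLMWrapper` class — the real HDLM module (a 12-block DDiT transformer per the removed config) lives in the research-diffcodegen codebase and is not part of this commit:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class HDLMWrapper(nn.Module, PyTorchModelHubMixin):
    # Hypothetical stand-in for the actual HDLM architecture; the init
    # kwargs mirror the hidden_size/length values from the removed config.
    def __init__(self, hidden_size: int = 768, length: int = 1024):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)


if __name__ == "__main__":
    # from_pretrained fetches config.json and model.safetensors from the Hub
    # and instantiates the class with the stored init kwargs.
    model = HDLMWrapper.from_pretrained("hdlm-group/hdlm-base-epsilon-0.0")
```

The mixin serializes init kwargs to `config.json` on `push_to_hub`, so loading only round-trips cleanly with the class the authors actually pushed.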
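The removed config's `optim` block (AdamW, `lr: 0.0002`, `warmup: 10000`, `scheduler: cosine`) together with `n_iters: 1250000` implies a warmup-then-cosine learning-rate curve; a sketch under the assumption of linear warmup and decay to zero:

```python
import math

def lr_at(step: int, base_lr: float = 2e-4,
          warmup: int = 10_000, total: int = 1_250_000) -> float:
    """Linear warmup then cosine decay, using the optim/training values
    from the removed config (the decay-to-zero floor is an assumption)."""
    if step < warmup:
        return base_lr * step / warmup          # ramp 0 -> base_lr
    progress = (step - warmup) / (total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, it peaks at `base_lr` when warmup ends, and follows a half-cosine down to 0 at the final iteration.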
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:66dc8dda33246c7abf785c17a5a966c09c40de0ed82cbe686a541b76ba843396
+oid sha256:e26f295d748bf81c34c3352bd5ba46d8b2ba18d743a016206384f4de87f8648a
 size 648995496
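Both versions of `model.safetensors` are git-lfs pointer files: the `oid sha256:` field is the SHA-256 digest of the actual 648995496-byte weights blob. A quick way to check a downloaded copy against the new pointer (helper name is illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256, matching the git-lfs `oid` field."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints don't need to fit in RAM.
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# After this commit, sha256_of("model.safetensors") should equal
# "e26f295d748bf81c34c3352bd5ba46d8b2ba18d743a016206384f4de87f8648a"
```

A mismatch means the download is truncated or you fetched the pointer file itself rather than the LFS object.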