nimafathi committed
Commit facac85 · verified · Parent(s): 91be14b

Upload HDLM model with complete HF integration

Files changed (2):
  1. README.md +5 -138
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,142 +1,9 @@
 ---
-language:
-- en
 tags:
-- text-generation
-- diffusion
-- language-model
-license: mit
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
 
-# hdlm-group/hdlm-base-epsilon-0.0
-
-This is an epsilon_hybrid diffusion language model trained on text data.
-
-## Model Details
-
-- **Model Type**: epsilon_hybrid
-- **Architecture**: Diffusion-based language model
-- **Training Method**: Epsilon-hybrid diffusion training
-
-## Configuration
-
-```yaml
-ngpus: 4
-type: aligned
-gradient_accumulation_steps: 2
-tokenizer:
-  tokens: 50257
-  model: gpt2
-training:
-  batch_size: 128
-  accum: ${gradient_accumulation_steps}
-  n_iters: 1250000
-  snapshot_freq: 10000
-  log_freq: 500
-  eval_freq: 10000
-  snapshot_freq_for_preemption: 3000
-  snapshot_sampling: true
-  ema: 0.9999
-  warmup_iter: -1
-  loss_type: hybrid
-  epsilon: 0.0
-  lambda: 0.0
-data:
-  train: openwebtext-train
-  valid: wikitext103
-  cache_dir: /home/toolkit/research-diffcodegen/data
-  debug: false
-graph:
-  type: absorb
-  gamma: 1.0
-  file: /home/toolkit/research-diffcodegen/data
-  report_all: false
-  expanded_sigma: true
-noise:
-  type: loglinear
-  sigma_min: 0.0001
-  sigma_max: 2.0
-  ar_diffusion: false
-  expanded_sigma: ${graph.expanded_sigma}
-sampling:
-  predictor: analytic
-  steps_per_level: 1
-  noise_removal: true
-  strategy: direct
-  strategy_param: 0.9
-annealing:
-  type: none
-  efficient: false
-  width: 1024
-  tau: 1024
-  eval_tau: 1024
-  steps_per_level: ${sampling.steps_per_level}
-  sampling_method: sdlm
-  diffusion_loss_weight: 1.0
-  ce_loss_weight: 1.0
-  sampling_eps: 0.0001
-attention:
-  context_type: block_causal
-  block_type: full
-  match_inference: false
-eval:
-  batch_size: 16
-  perplexity: true
-  perplexity_batch_size: 8
-optim:
-  weight_decay: 0.1
-  optimizer: AdamW
-  lr: 0.0002
-  beta1: 0.9
-  beta2: 0.95
-  eps: 1.0e-08
-  warmup: 10000
-  grad_clip: 1.0
-  scheduler: cosine
-experiment:
-  name: MDLM
-  wandb_project: Hybrid-SDLM-ALIGNED
-model:
-  name: HDLM
-  type: ddit
-  hidden_size: 768
-  cond_dim: 128
-  length: 1024
-  n_blocks: 12
-  n_heads: 12
-  dropout: 0.1
-  scale_by_sigma: false
-  transformer_sigma_conditioning: false
-  hybrid_sigma_embedding: false
-  post_process_logits: false
-  use_timestep_embedding: false
-model_type: epsilon_hybrid
-
-```
-
-## Usage
-
-```python
-from our.hf_utils import smart_model_loader
-
-# Load the model
-model, config, device, accelerator, metaschedule = smart_model_loader(
-    "hdlm-group/hdlm-base-epsilon-0.0",
-    model_type="epsilon_hybrid"
-)
-
-# Use the model for text generation
-# (Add specific usage examples based on your model's capabilities)
-```
-
-## Training Details
-
-This model was trained using the research-diffcodegen framework.
-
-## Citation
-
-If you use this model in your research, please cite the original paper and this implementation.
-
-## License
-
-This model is released under the MIT License.
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
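The new README says only that the checkpoint was pushed via `PyTorchModelHubMixin`. A minimal loading sketch, assuming a hypothetical `HDLMWrapper` class — the real HDLM module (a 12-block DDiT transformer per the removed config) lives in the research-diffcodegen codebase and is not part of this commit:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class HDLMWrapper(nn.Module, PyTorchModelHubMixin):
    # Hypothetical stand-in for the actual HDLM architecture; the init
    # kwargs mirror the hidden_size/length values from the removed config.
    def __init__(self, hidden_size: int = 768, length: int = 1024):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)


if __name__ == "__main__":
    # from_pretrained fetches config.json and model.safetensors from the Hub
    # and instantiates the class with the stored init kwargs.
    model = HDLMWrapper.from_pretrained("hdlm-group/hdlm-base-epsilon-0.0")
```

The mixin serializes init kwargs to `config.json` on `push_to_hub`, so loading only round-trips cleanly with the class the authors actually pushed.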
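The removed config's `optim` block (AdamW, `lr: 0.0002`, `warmup: 10000`, `scheduler: cosine`) together with `n_iters: 1250000` implies a warmup-then-cosine learning-rate curve; a sketch under the assumption of linear warmup and decay to zero:

```python
import math

def lr_at(step: int, base_lr: float = 2e-4,
          warmup: int = 10_000, total: int = 1_250_000) -> float:
    """Linear warmup then cosine decay, using the optim/training values
    from the removed config (the decay-to-zero floor is an assumption)."""
    if step < warmup:
        return base_lr * step / warmup          # ramp 0 -> base_lr
    progress = (step - warmup) / (total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, it peaks at `base_lr` when warmup ends, and follows a half-cosine down to 0 at the final iteration.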
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:66dc8dda33246c7abf785c17a5a966c09c40de0ed82cbe686a541b76ba843396
+oid sha256:e26f295d748bf81c34c3352bd5ba46d8b2ba18d743a016206384f4de87f8648a
 size 648995496
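Both versions of `model.safetensors` are git-lfs pointer files: the `oid sha256:` field is the SHA-256 digest of the actual 648995496-byte weights blob. A quick way to check a downloaded copy against the new pointer (helper name is illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256, matching the git-lfs `oid` field."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints don't need to fit in RAM.
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# After this commit, sha256_of("model.safetensors") should equal
# "e26f295d748bf81c34c3352bd5ba46d8b2ba18d743a016206384f4de87f8648a"
```

A mismatch means the download is truncated or you fetched the pointer file itself rather than the LFS object.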