Masked Diffusion Language Models with Frequency-Informed Training
Paper: arXiv 2509.05056
🎤 Oral Presentation at BabyLM Workshop @ EMNLP 2025
This model is a Masked Diffusion Language Model (MDLM) trained with a Bimodal Gaussian noise schedule and frequency-informed masking for the BabyLM Challenge 2025.
The diffusion-based training objective combines a bimodal Gaussian noise schedule with frequency-informed token masking.
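As an illustrative sketch only (the component means, weights, and frequency scaling below are placeholder assumptions, not the paper's values), a bimodal Gaussian mask-rate schedule and frequency-informed masking could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mask_rate(n, mu1=0.25, mu2=0.75, sigma=0.1, p=0.5):
    """Draw per-example mask rates from a two-component (bimodal) Gaussian
    mixture, clipped to (0, 1). mu1, mu2, sigma, and p are illustrative
    placeholders, not the paper's hyperparameters."""
    comp = rng.random(n) < p
    rates = np.where(comp,
                     rng.normal(mu1, sigma, n),
                     rng.normal(mu2, sigma, n))
    return np.clip(rates, 0.01, 0.99)

def frequency_informed_mask(token_ids, freqs, mask_rate):
    """Mask tokens with probability scaled by inverse corpus frequency,
    so rarer tokens are masked more often -- one plausible reading of
    'frequency-informed masking', not the repo's exact rule."""
    inv = 1.0 / (freqs[token_ids] + 1e-6)
    probs = mask_rate * inv / inv.mean()
    return rng.random(len(token_ids)) < np.clip(probs, 0.0, 1.0)
```

In a training loop, `sample_mask_rate` would set the overall corruption level for each sequence, and `frequency_informed_mask` would decide which positions get replaced by the mask token.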
Performance on BabyLM Challenge zero-shot tasks:
| Task | Score |
|---|---|
| BLiMP | 78.2 |
| BLiMP Supplement | 73.6 |
| EWoK | 52.5 |
| COMPS | 56.6 |
| Entity Tracking | 39.7 |
```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# Loading the model requires the custom modeling code:
# https://github.com/DespoinaKK/babylm-diffusion
```
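To show what decoding with a masked diffusion LM generally involves, here is a generic reverse-diffusion step that unmasks the most confident positions first. The function name and interface are hypothetical; the repository's actual sampler may differ:

```python
import torch

def denoise_step(logits, ids, mask_token_id, num_to_unmask):
    """One generic MDLM decoding step: commit the model's most confident
    predictions at masked positions, leave the rest masked for later steps.
    This is an illustrative sketch, not the repo's exact implementation."""
    masked = ids == mask_token_id
    probs = torch.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    # Exclude already-unmasked positions from the confidence ranking.
    conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
    k = min(num_to_unmask, int(masked.sum()))
    if k > 0:
        top = conf.topk(k).indices
        ids = ids.clone()
        ids[top] = pred[top]
    return ids
```

Iterating this step from a fully masked sequence down to zero masked positions yields a sample; schedules differ in how many positions are unmasked per step.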
If you use this model, please cite:
TBA
Based on work from: