GPT-2 30M - TinyStories

A GPT-2-style model with roughly 30M parameters excluding the embedding layers (~49.5M in total), trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.

Model Details

| Parameter          | Value                     |
|--------------------|---------------------------|
| Parameters         | ~49.5M (incl. embeddings) |
| Vocabulary         | 50,257 (GPT-2 tiktoken)   |
| Context Length     | 512                       |
| Embedding Dim      | 384                       |
| Attention Heads    | 6                         |
| Transformer Layers | 6                         |
| Dropout            | 0.1                       |
| Activation         | GELU                      |

Architecture: Token + positional embeddings → Dropout → 6x Transformer blocks (pre-norm, residual connections) → LayerNorm → Linear output
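
The table and the architecture line above pin down the shape of the network. Below is a minimal PyTorch sketch of that layout; the class and attribute names are illustrative rather than the repository's, and the 4x MLP width and untied output head are assumptions, which together happen to land the parameter count at roughly 49.5M as reported.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm block: LayerNorm -> attention -> residual, then LayerNorm -> MLP -> residual."""
    def __init__(self, d_model=384, n_heads=6, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x MLP width is an assumption
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class GPT2Small(nn.Module):
    def __init__(self, vocab_size=50257, ctx_len=512, d_model=384, n_layers=6, dropout=0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(ctx_len, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)  # untied output projection (assumption)

    def forward(self, idx):
        B, T = idx.shape
        x = self.drop(self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device)))
        # Causal mask: True marks positions a token is not allowed to attend to.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1)
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))  # (B, T, vocab_size) logits

print(sum(p.numel() for p in GPT2Small().parameters()) / 1e6)  # ~49.5 (millions), matching the table
```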

Training

| Metric           | Value                 |
|------------------|-----------------------|
| Dataset          | TinyStoriesV2 (cleaned) |
| Epochs           | 6                     |
| Batch Size       | 64                    |
| Learning Rate    | 5e-4                  |
| Final Train Loss | 1.346                 |
| Final Val Loss   | 1.272                 |
| Final Perplexity | 3.57                  |
| Training Time    | ~50 minutes           |
| Hardware         | NVIDIA H100 80GB      |
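
The card does not spell out the optimizer or data pipeline. As a rough sketch only, a next-token-prediction loop consistent with these hyperparameters could look like the following; the AdamW choice and the loader shapes are assumptions, not the repository's exact recipe.

```python
import math
import torch
import torch.nn.functional as F

def train(model, train_loader, val_loader, device="cuda", epochs=6, lr=5e-4):
    """Hypothetical loop matching the table: 6 epochs, lr 5e-4, batches of 64 sequences of 512 tokens."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # optimizer choice is an assumption

    for epoch in range(1, epochs + 1):
        model.train()
        for x, y in train_loader:                 # x, y: (64, 512) token ids, y shifted right by one
            x, y = x.to(device), y.to(device)
            logits = model(x)                     # (B, T, vocab_size)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # End-of-epoch validation loss and perplexity
        model.eval()
        total, batches = 0.0, 0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)
                total += F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1)).item()
                batches += 1
        val_loss = total / batches
        print(f"epoch {epoch}: val loss {val_loss:.3f}, perplexity {math.exp(val_loss):.2f}")
```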

Loss Curve

| Epoch | Train Loss | Val Loss | Perplexity |
|-------|------------|----------|------------|
| 1     | 2.140      | 1.547    | 4.70       |
| 2     | 1.541      | 1.406    | 4.08       |
| 3     | 1.446      | 1.349    | 3.85       |
| 4     | 1.399      | 1.313    | 3.72       |
| 5     | 1.367      | 1.288    | 3.62       |
| 6     | 1.346      | 1.272    | 3.57       |
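
The perplexity column matches exp of the validation loss, e.g. for the final epoch:

```python
import math
print(math.exp(1.272))  # 3.568..., reported as 3.57
```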

Usage

This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.

Setup

```bash
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral

# Install the project's dependencies with uv
uv sync
```
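
The exact loading API is defined in the repository. Purely as an illustration, greedy generation with a GPT-2-style checkpoint and the tiktoken GPT-2 encoding usually looks something like the sketch below; the import path, class name, and checkpoint filename are placeholders, not the repository's real API.

```python
# Illustration only: `model.GPT2Small` and "checkpoint.pt" are placeholder names;
# see the repository for the actual module, class, and checkpoint layout.
import torch
import tiktoken
from model import GPT2Small  # hypothetical import

enc = tiktoken.get_encoding("gpt2")              # same GPT-2 vocabulary used for training
model = GPT2Small()
model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
model.eval()

idx = torch.tensor([enc.encode("Once upon a time")])

with torch.no_grad():
    for _ in range(100):                         # greedy decoding, one token at a time
        logits = model(idx[:, -512:])            # stay within the 512-token context window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```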

See the GitHub repository for usage examples and the full API reference.

Limitations

- Trained only on TinyStories: generates simple children's stories, not general text
- No instruction tuning: does not follow prompts or answer questions
- Small model: limited coherence over long sequences
- English only

Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
