GPT-2 30M - TinyStories

A GPT-2-style model with roughly 30M parameters excluding the embedding layers (~49.5M in total), trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.

Model Details

| Parameter          | Value                     |
|--------------------|---------------------------|
| Parameters         | ~49.5M (incl. embeddings) |
| Vocabulary         | 50,257 (GPT-2 tiktoken)   |
| Context Length     | 512                       |
| Embedding Dim      | 384                       |
| Attention Heads    | 6                         |
| Transformer Layers | 6                         |
| Dropout            | 0.1                       |
| Activation         | GELU                      |

Architecture: Token + positional embeddings → Dropout → 6x Transformer blocks (pre-norm, residual connections) → LayerNorm → Linear output
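
The table and the architecture line above pin down the shape of the network. Below is a minimal PyTorch sketch of that layout; the class and attribute names are illustrative rather than the repository's, and the 4x MLP width and untied output head are assumptions, which together happen to land the parameter count at roughly 49.5M as reported.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm block: LayerNorm -> attention -> residual, then LayerNorm -> MLP -> residual."""
    def __init__(self, d_model=384, n_heads=6, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x MLP width is an assumption
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class GPT2Small(nn.Module):
    def __init__(self, vocab_size=50257, ctx_len=512, d_model=384, n_layers=6, dropout=0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(ctx_len, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)  # untied output projection (assumption)

    def forward(self, idx):
        B, T = idx.shape
        x = self.drop(self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device)))
        # Causal mask: True marks positions a token is not allowed to attend to.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1)
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))  # (B, T, vocab_size) logits

print(sum(p.numel() for p in GPT2Small().parameters()) / 1e6)  # ~49.5 (millions), matching the table
```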

Training

| Metric           | Value                 |
|------------------|-----------------------|
| Dataset          | TinyStoriesV2 (cleaned) |
| Epochs           | 6                     |
| Batch Size       | 64                    |
| Learning Rate    | 5e-4                  |
| Final Train Loss | 1.346                 |
| Final Val Loss   | 1.272                 |
| Final Perplexity | 3.57                  |
| Training Time    | ~50 minutes           |
| Hardware         | NVIDIA H100 80GB      |
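
The card does not spell out the optimizer or data pipeline. As a rough sketch only, a next-token-prediction loop consistent with these hyperparameters could look like the following; the AdamW choice and the loader shapes are assumptions, not the repository's exact recipe.

```python
import math
import torch
import torch.nn.functional as F

def train(model, train_loader, val_loader, device="cuda", epochs=6, lr=5e-4):
    """Hypothetical loop matching the table: 6 epochs, lr 5e-4, batches of 64 sequences of 512 tokens."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # optimizer choice is an assumption

    for epoch in range(1, epochs + 1):
        model.train()
        for x, y in train_loader:                 # x, y: (64, 512) token ids, y shifted right by one
            x, y = x.to(device), y.to(device)
            logits = model(x)                     # (B, T, vocab_size)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # End-of-epoch validation loss and perplexity
        model.eval()
        total, batches = 0.0, 0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)
                total += F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1)).item()
                batches += 1
        val_loss = total / batches
        print(f"epoch {epoch}: val loss {val_loss:.3f}, perplexity {math.exp(val_loss):.2f}")
```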

Loss Curve

| Epoch | Train Loss | Val Loss | Perplexity |
|-------|------------|----------|------------|
| 1     | 2.140      | 1.547    | 4.70       |
| 2     | 1.541      | 1.406    | 4.08       |
| 3     | 1.446      | 1.349    | 3.85       |
| 4     | 1.399      | 1.313    | 3.72       |
| 5     | 1.367      | 1.288    | 3.62       |
| 6     | 1.346      | 1.272    | 3.57       |
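
The perplexity column matches exp of the validation loss, e.g. for the final epoch:

```python
import math
print(math.exp(1.272))  # 3.568..., reported as 3.57
```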

Usage

This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.

Setup

```bash
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral

# Install the project's dependencies with uv
uv sync
```
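
The exact loading API is defined in the repository. Purely as an illustration, greedy generation with a GPT-2-style checkpoint and the tiktoken GPT-2 encoding usually looks something like the sketch below; the import path, class name, and checkpoint filename are placeholders, not the repository's real API.

```python
# Illustration only: `model.GPT2Small` and "checkpoint.pt" are placeholder names;
# see the repository for the actual module, class, and checkpoint layout.
import torch
import tiktoken
from model import GPT2Small  # hypothetical import

enc = tiktoken.get_encoding("gpt2")              # same GPT-2 vocabulary used for training
model = GPT2Small()
model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
model.eval()

idx = torch.tensor([enc.encode("Once upon a time")])

with torch.no_grad():
    for _ in range(100):                         # greedy decoding, one token at a time
        logits = model(idx[:, -512:])            # stay within the 512-token context window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```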

See the GitHub repository for usage examples and the full API reference.

Limitations

- Trained only on TinyStories: generates simple children's stories, not general text
- No instruction tuning: does not follow prompts or answer questions
- Small model: limited coherence over long sequences
- English only

Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
