Llara

Introduction

Llara1.1 is a 124M-parameter autoregressive language model (33M more parameters than Llara1.0) trained from scratch on English web text. It follows the GPT-2 Small architecture and is trained entirely from random initialisation: no pretrained weights, no distillation, no fine-tuning of an existing model. It does, however, reuse GPT-2's BPE tokenizer, more or less as is.

The name Llara is original and unrelated to LLaMA or LoRA.

Note: The model is still undertrained according to the Chinchilla scaling laws (Hoffmann et al., 2022).
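
To put a rough number on "undertrained", the sketch below compares the tokens seen during training with a Chinchilla-style target. It assumes the common "about 20 tokens per parameter" reading of the Chinchilla result, and it assumes that tokens seen equals the 131M tokens listed under Model Details multiplied by the 1.1 epochs; neither assumption is stated on this card.

```python
# Figures taken from this model card.
n_params = 124_000_000        # ~124M parameters
dataset_tokens = 131_000_000  # "131M tokens" from Model Details
epochs = 1.1

# Assumption: the widely quoted ~20 tokens-per-parameter rule of thumb.
TOKENS_PER_PARAM = 20

tokens_seen = dataset_tokens * epochs
target_tokens = n_params * TOKENS_PER_PARAM
fraction = tokens_seen / target_tokens

print(f"tokens seen:   {tokens_seen / 1e6:.1f}M")
print(f"target tokens: {target_tokens / 1e9:.2f}B")
print(f"fraction of target: {fraction:.1%}")
```

Under these assumptions the model has seen roughly 6% of the rule-of-thumb token budget, which is consistent with the note above.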


Improvements

  • Increased context length to 512 tokens
  • Better and cleaner training data
  • Able to form coherent sentences even with a limit of 20 generated tokens
  • Better GPT config

Model Details

Property          Value
Architecture      GPT-2 (decoder-only transformer)
Parameters        ~124.0M
Context length    512 tokens
Embedding dim     -
Layers            12
Attention heads   12
Vocabulary        50,257 (GPT-2 BPE)
Training data     FineWeb (HuggingFaceFW/fineweb) + custom dataset
Training docs     131M tokens
Epochs            1.1
Precision         fp16
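
As a sanity check on the ~124.0M figure, the parameter count of a GPT-2 style model can be worked out by hand from the values above. This is a sketch that assumes an embedding dimension of 768 (the GPT-2 Small default; the table leaves it blank), a 4x MLP expansion, and output weights tied to the token embeddings. None of these are confirmed by this card.

```python
def gpt2_param_count(vocab_size, n_ctx, d_model, n_layer):
    """Count parameters of a GPT-2 style decoder with tied output embeddings."""
    token_emb = vocab_size * d_model           # token embedding matrix
    pos_emb = n_ctx * d_model                  # learned position embeddings
    layer_norm = 2 * d_model                   # scale + bias
    attn_qkv = d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
    attn_out = d_model * d_model + d_model     # attention output projection
    d_ff = 4 * d_model                         # assumed 4x MLP expansion
    mlp_in = d_model * d_ff + d_ff
    mlp_out = d_ff * d_model + d_model
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    # The final layer norm is added once after the last block.
    return token_emb + pos_emb + n_layer * per_layer + layer_norm


# d_model=768 is an assumption; the other values come from the table.
total = gpt2_param_count(vocab_size=50_257, n_ctx=512, d_model=768, n_layer=12)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints: 124,046,592 parameters (~124.0M)
```

The number of attention heads does not appear in the formula because the heads split the same projection matrices rather than adding new ones. With a 1,024-token context the same function gives the familiar GPT-2 Small count, so the shorter 512-token context accounts for the small difference.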

Usage

from transformers import GPT2LMHeadModel, AutoTokenizer, pipeline

model = GPT2LMHeadModel.from_pretrained("helloadhavan/llara1.1-100M-base")
tokenizer = AutoTokenizer.from_pretrained("helloadhavan/llara1.1-100M-base")

gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

output = gen(
    "Once upon a time",
    max_new_tokens=20,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)

print(output[0]["generated_text"])
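
Because the context length is 512 tokens, a long prompt plus the newly generated tokens can exceed what the model was trained on. The helper below is a minimal sketch of one way to handle this: it keeps only the most recent prompt tokens that fit. The name fit_prompt is hypothetical and not part of Transformers or this repository.

```python
from typing import List

CONTEXT_LENGTH = 512  # from the Model Details table


def fit_prompt(
    token_ids: List[int],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[int]:
    """Keep the last tokens of a prompt so prompt + generation fits the context."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    # Slicing from the end keeps the text closest to where generation starts.
    return token_ids[-budget:]


# Example with dummy token ids: a 600-token prompt is cut down to 492 tokens.
trimmed = fit_prompt(list(range(600)), max_new_tokens=20)
print(len(trimmed))
```

With the tokenizer loaded above, the ids could come from tokenizer("some long text")["input_ids"], and tokenizer.decode(trimmed) would turn the shortened list back into a prompt string for the pipeline.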

Limitations

  • Llara is trained on English web text only and performs poorly on other languages.
  • Like all autoregressive LMs trained on web data, it may reproduce biases, factual errors, or inappropriate content present in the training corpus.
  • It is a research model trained from scratch and is not instruction-tuned or aligned — it should not be used in production or user-facing applications without further fine-tuning and safety work.
  • At 124M parameters and 2M training documents, it is significantly smaller and less trained than models like GPT-2 (which saw 40GB of text). Outputs may be incoherent on complex prompts.

Intended Use

Llara is intended for:

  • Research and experimentation with small language models
  • Learning how GPT-style models are trained from scratch
  • A base for fine-tuning on downstream tasks

Training Framework

Trained with the Hugging Face Transformers Trainer on a single GPU.


License

Apache 2.0

Note: I am an AI hobbyist, not an AI engineer.