What If LLMs Could Imagine Consequences?
Making decoder-only transformers predict state consequences instead of tokens.
Three approaches to converting a standard LLM into a world model that predicts "what happens next" given a state and an action, similar to JEPA but for language models.
| File | Description | GPU Time |
|---|---|---|
| `jepa_llm_prototypes.ipynb` | All three options in one notebook; best for comparing | ~30 min |
| `jepa_option1_sentence_encoder.ipynb` | Simplest approach using pre-trained sentence embeddings | ~10 min |
| `jepa_option2_llm_hidden_states.ipynb` | Uses GPT-2 hidden states as state space | ~15 min |
Normal LLM: tokens → transformer → next token
JEPA-style: (state, action) → transformer → next state embedding
Instead of predicting words, the model predicts what the world looks like after an action.
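To make this concrete, here is a minimal sketch of the JEPA-style training objective. The class and variable names are illustrative, not the notebooks' actual API: a frozen text encoder supplies embeddings, and a small trainable predictor maps a (state, action) pair to the next-state embedding.

```python
# Minimal sketch of the JEPA-style objective (illustrative names, not the
# repo's actual API). A frozen encoder maps text to vectors; the predictor
# learns (state, action) -> next-state embedding.
import torch
import torch.nn as nn

EMB_DIM = 384  # e.g. the all-MiniLM-L6-v2 embedding size

class NextStatePredictor(nn.Module):
    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        # Concatenate state and action embeddings, regress the next-state embedding.
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 512),
            nn.GELU(),
            nn.Linear(512, emb_dim),
        )

    def forward(self, state_emb, action_emb):
        return self.net(torch.cat([state_emb, action_emb], dim=-1))

predictor = NextStatePredictor()
opt = torch.optim.AdamW(predictor.parameters(), lr=1e-3)

# Stand-ins for frozen-encoder outputs: encoder(state), encoder(action),
# and encoder(next_state) as the regression target.
state_emb = torch.randn(8, EMB_DIM)
action_emb = torch.randn(8, EMB_DIM)
next_state_emb = torch.randn(8, EMB_DIM)

pred = predictor(state_emb, action_emb)
# Cosine loss keeps the objective in embedding space rather than token space.
loss = 1 - nn.functional.cosine_similarity(pred, next_state_emb, dim=-1).mean()
loss.backward()
opt.step()
```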
- **Option 1: Sentence Encoder (Simplest)** uses pre-trained `all-MiniLM-L6-v2` embeddings
- **Option 2: LLM Hidden States (Medium)** uses GPT-2 hidden states as the state space
- **Option 3: Autoencoder (Most Powerful)** learns its own compressed latent state space

A sketch of Option 1 follows below.
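For Option 1, the moving parts are small enough to sketch end to end. Assuming the `sentence-transformers` package (the helper and head below are illustrative, not the notebook's exact code), a frozen `all-MiniLM-L6-v2` encoder defines the state space and only the prediction head is trained:

```python
# Sketch of Option 1: frozen sentence embeddings + trainable prediction head.
# Helper names are illustrative, not the notebook's exact code.
from sentence_transformers import SentenceTransformer
import torch
import torch.nn as nn

# Pin to CPU so the sketch is self-contained on any machine.
encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")  # frozen; 384-dim

def embed(texts):
    # convert_to_tensor=True returns torch tensors directly.
    return encoder.encode(texts, convert_to_tensor=True)

predictor = nn.Sequential(  # the only trainable component
    nn.Linear(2 * 384, 512),
    nn.GELU(),
    nn.Linear(512, 384),
)

s = embed(["Document is in draft status with 2 sections"])
a = embed(["User submits for review"])
target = embed(["Document is pending review"])

pred = predictor(torch.cat([s, a], dim=-1))
loss = 1 - nn.functional.cosine_similarity(pred, target, dim=-1).mean()
loss.backward()  # gradients flow only into the predictor; the encoder stays frozen
```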
```python
# Input
state = "Document is in draft status with 2 sections"
action = "User submits for review"

# Model predicts
next_state = "Document is pending review"  # via embedding similarity
```
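The "via embedding similarity" step can be as simple as nearest-neighbor decoding: score a set of candidate next-state descriptions against the predicted embedding and return the closest one. A sketch, reusing the hypothetical `embed` helper and `predictor` from the Option 1 example above:

```python
# Nearest-neighbor decoding of the predicted next state (continues the
# Option 1 sketch above; `embed`, `predictor`, `s`, `a` are defined there).
import torch

candidates = [
    "Document is pending review",
    "Document is in draft status with 2 sections",
    "Document is published",
]
cand_embs = embed(candidates)                    # (num_candidates, 384)
pred_emb = predictor(torch.cat([s, a], dim=-1))  # predicted next-state embedding

scores = torch.nn.functional.cosine_similarity(pred_emb, cand_embs, dim=-1)
next_state = candidates[scores.argmax().item()]  # after training: "Document is pending review"
```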
All dependencies install automatically in the notebooks.
Experimental code; have fun breaking it.
Coauthors: Writer Agent & OpenCode