🔄 In a Training Loop

Stefano Fiorucci PRO

anakin87


AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to the Haystack LLM framework 🏗️

Recent Activity

liked a Space 7 days ago

mii-llm/Post-Training-Challenge

updated a dataset 8 days ago

anakin87/doom-defend-the-center-100k-oracle-nofwd

published a dataset 8 days ago

anakin87/doom-defend-the-center-100k-oracle-nofwd



Posts 29

Post

3428

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again against stronger opponents (0-25% random moves), sampling at temperature 1.25 to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
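
The opponent schedule in steps 4 and 5 can be sketched in plain Python. This is only an illustration, not the actual environment code: the post does not say how the opponent picks its non-random moves, so a minimax player is assumed here, and the per-game uniform sampling of the random-move rate is also an assumption. All function names (`opponent_move`, `sample_random_prob`, and so on) are hypothetical and do not come from Verifiers.

```python
import random
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None (board is a 9-char string)."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def other(player):
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board, player):
    """Best score `player` (to move) can force: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    return max(-minimax(board[:m] + player + board[m + 1:], other(player))
               for m in moves)


def best_move(board, player):
    """Assumed 'strong' policy: pick a minimax-optimal move."""
    return max(legal_moves(board),
               key=lambda m: -minimax(board[:m] + player + board[m + 1:],
                                      other(player)))


def opponent_move(board, player, random_prob, rng):
    """With probability `random_prob` play a random legal move, else the best one."""
    if rng.random() < random_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)


def sample_random_prob(low, high, rng):
    """Assumption: draw one random-move rate per game, uniformly from the stage's range."""
    return rng.uniform(low, high)


# Stage ranges taken from the post: 20-70% random moves, then 0-25%.
STAGES = {"rl_stage_1": (0.20, 0.70), "rl_stage_2": (0.00, 0.25)}
```

A training loop would call `sample_random_prob(*STAGES["rl_stage_1"], rng)` at the start of each game and pass the result to `opponent_move` on every opponent turn. Lowering the range in the second stage gives the model fewer free wins, so it has to learn to draw or win against near-optimal play.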

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

Articles 4

Article

32

Exploring Environments Hub: Your Language Model needs better (open) environments to learn


Collections 5


Spaces 8

Emma 5

Unofficial demo of a small Italian model

Phi 3.5 Mini ITA

Chat with an Italian Small Model

Mr. Tic Tac Toe

Play Tic Tac Toe against a small RL tuned model

Gemma 3 270m IT

Chat with Gemma 3 270m IT

Fact Checking rocks!

Fact checking baseline. Dense retrieval + textual entailment

Gemma 2 2B Neogenesis ITA

Chat with an Italian Small Model

Models 22

anakin87/emma-5

Text Generation • Updated 21 days ago • 31

anakin87/LFM2-2.6B-mr-tictactoe

Text Generation • 3B • Updated Apr 5 • 9 • 1

anakin87/LFM2-2.6B-ttt-rl-2

Text Generation • Updated Apr 5 • 2

anakin87/LFM2-2.6B-ttt-rl-merged

Text Generation • 3B • Updated Apr 5 • 4

anakin87/LFM2-2.6B-ttt-rl

Text Generation • Updated Apr 5 • 3

anakin87/LFM2-2.6B-ttt-sft

Text Generation • 3B • Updated Apr 5 • 8

anakin87/Phi-3.5-mini-ITA

Text Generation • 4B • Updated Mar 24 • 5.02k • 13

anakin87/Qwen3-0.6B-alphabet-sort-grpo

0.6B • Updated Sep 4, 2025

anakin87/gemma-2-2b-ita-sft

Text Generation • 3B • Updated Jun 29, 2025 • 3

anakin87/electra-italian-xxl-cased-squad-it

Question Answering • 0.1B • Updated Jun 29, 2025 • 20 • 8

Datasets 13

anakin87/doom-defend-the-center-100k-oracle-nofwd

Viewer • Updated 8 days ago • 1 • 66 • 1

anakin87/doom-defend-the-center-100k-oracle

Viewer • Updated 10 days ago • 1 • 32

anakin87/tictactoe-filtered

Viewer • Updated Apr 5 • 174 • 22

anakin87/tictactoe

Viewer • Updated Apr 5 • 200 • 28

anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval

Viewer • Updated Sep 4, 2025 • 15 • 12

anakin87/Qwen3-0.6B-alphabet-sort-eval

Viewer • Updated Sep 4, 2025 • 15 • 22

anakin87/events-scheduling

Viewer • Updated Apr 26, 2025 • 600 • 72 • 3

anakin87/evol-dpo-ita-reranked

Viewer • Updated Jan 14, 2025 • 19.8k • 43 • 5

anakin87/gemma-vs-gemma-preferences

Viewer • Updated Jan 14, 2025 • 24.7k • 15

anakin87/fine-instructions-ita-70k

Viewer • Updated Jan 14, 2025 • 69.9k • 67 • 4
