
In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87


AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

liked a Space 2 minutes ago

HuggingFaceTB/trl-distillation-trainer

replied to their post 1 day ago

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM post-training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated question-answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensive, carefully prepared data.

But what are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (an open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by Liquid AI) into a Tic-tac-toe master
  🔸 Build the game environment
  🔸 Use it to generate synthetic data for an SFT warm-up
  🔸 Train with group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---
🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

reacted to their post with 😎 1 day ago

🌀 Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn. I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video. You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂

🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---
🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
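The posts above mention GRPO, verifiable rewards, and group-based Reinforcement Learning in a Tic-tac-toe environment. As a rough illustration only (this is not code from the course, which builds its environments with the Verifiers library), here is a minimal Python sketch of the two core pieces, assuming a hypothetical board encoding (a 9-character string, row-major, with "X", "O", and "."). `win_reward` is a hypothetical verifiable reward (1.0 if the model's side has three in a row), and `group_relative_advantages` normalizes each sampled completion's reward against the other completions for the same prompt, which is the central idea of GRPO.

```python
import statistics

# All 8 winning lines on a 3x3 board, indexed 0..8 in row-major order.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def win_reward(board: str, player: str) -> float:
    """Hypothetical verifiable reward: 1.0 if `player` has three in a row, else 0.0."""
    won = any(all(board[i] == player for i in line) for line in WIN_LINES)
    return 1.0 if won else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward minus the group mean, divided by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical final boards reached by 4 completions sampled for the same prompt,
# with the model playing "X".
boards = ["XXXOO....", "XO.XO..O.", "XOXOXO...", "X..OX.O.X"]
rewards = [win_reward(b, "X") for b in boards]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Completions that beat their group's average get a positive advantage and are reinforced, while the others are pushed down. When every completion in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal. Real implementations differ in details such as sample vs. population standard deviation, or skipping the std scaling altogether.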


anakin87's datasets (11)

anakin87/tictactoe-filtered: updated 7 days ago • 174 rows • 25 downloads

anakin87/tictactoe: updated 7 days ago • 200 rows • 23 downloads

anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval: updated Sep 4, 2025 • 15 rows • 7 downloads

anakin87/Qwen3-0.6B-alphabet-sort-eval: updated Sep 4, 2025 • 15 rows • 19 downloads

anakin87/events-scheduling: updated Apr 26, 2025 • 600 rows • 103 downloads • 2 likes

anakin87/evol-dpo-ita-reranked: updated Jan 14, 2025 • 19.8k rows • 17 downloads • 5 likes

anakin87/gemma-vs-gemma-preferences: updated Jan 14, 2025 • 24.7k rows • 8 downloads

anakin87/fine-instructions-ita-70k: updated Jan 14, 2025 • 69.9k rows • 20 downloads • 4 likes

anakin87/FineTome-single-turn-dedup: updated Jan 11, 2025 • 83.3k rows • 10 downloads

anakin87/tulu-3-sft-mixture-with-language: updated Dec 11, 2024 • 939k rows • 40 downloads

anakin87/medrag-pubmed-chunk: updated Feb 25, 2024 • 15.4k rows • 44 downloads