In a Training Loop
Stefano Fiorucci
PRO
anakin87
173 followers · 87 following
theanakin87
anakin87
stefano-fiorucci
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to the Haystack LLM framework
Recent Activity
Liked a Space 2 minutes ago: HuggingFaceTB/trl-distillation-trainer
Replied to their post 1 day ago

I just published a free course on Reinforcement Learning Environments for Language Models!
COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice? And how do you build them effectively?

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn
- Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
- How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
- Common patterns: how to build single-turn, multi-turn, and tool-use environments
- Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  - Build the game Environment
  - Use it to generate synthetic data for SFT warm-up
  - Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---
Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
Reacted to their post 1 day ago

Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn. I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video. You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master.

Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---
LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course
Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
anakin87's datasets (11)
anakin87/tictactoe-filtered · Updated 7 days ago · 174 · 25
anakin87/tictactoe · Updated 7 days ago · 200 · 23
anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval · Updated Sep 4, 2025 · 15 · 7
anakin87/Qwen3-0.6B-alphabet-sort-eval · Updated Sep 4, 2025 · 15 · 19
anakin87/events-scheduling · Updated Apr 26, 2025 · 600 · 103 · 2
anakin87/evol-dpo-ita-reranked · Updated Jan 14, 2025 · 19.8k · 17 · 5
anakin87/gemma-vs-gemma-preferences · Updated Jan 14, 2025 · 24.7k · 8
anakin87/fine-instructions-ita-70k · Updated Jan 14, 2025 · 69.9k · 20 · 4
anakin87/FineTome-single-turn-dedup · Updated Jan 11, 2025 · 83.3k · 10
anakin87/tulu-3-sft-mixture-with-language · Updated Dec 11, 2024 · 939k · 40
anakin87/medrag-pubmed-chunk · Updated Feb 25, 2024 · 15.4k · 44