ICYMI, you can fine-tune open LLMs using Claude Code
just tell it: "Fine-tune Qwen3-0.6B on open-r1/codeforces-cots"
and Claude submits a real training job on HF GPUs using TRL.
it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation

when it's done, your model is on the Hub, ready to use
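Under the hood, the submitted job is a plain TRL training script. A minimal sketch of what the generated code roughly looks like, assuming default hyperparameters and the dataset's default subset (the job Claude actually submits may configure more):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# dataset from the prompt; using the default subset/split is an assumption
dataset = load_dataset("open-r1/codeforces-cots", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen3-0.6b-codeforces-cots",
        push_to_hub=True,  # so the finished model lands on the Hub
    ),
)
trainer.train()
```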
The latest TRL release comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
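A hedged sketch of how a couple of these surface in the GRPO API, assuming the new losses are selected through GRPOConfig's loss_type and colocated vLLM generation through vllm_mode (check the release notes for the exact option names):

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# toy reward: favor shorter completions (a stand-in for real reasoning rewards)
def reward_short(completions, **kwargs):
    return [-float(len(c)) for c in completions]

args = GRPOConfig(
    output_dir="qwen3-0.6b-grpo",
    loss_type="cispo",     # assumption: CISPO is selected via loss_type
    use_vllm=True,         # generate rollouts with vLLM...
    vllm_mode="colocate",  # ...sharing the training GPUs instead of a server
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=reward_short,
    args=args,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```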
The LLM by @karpathy is officially in the library, and we wrote a blog post covering how we ported the model, how it differs from the original, and how to run or train it.
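Once a model is in the library, trying it is a one-liner with the pipeline API. A minimal sketch; the Hub id below is a hypothetical placeholder, so check the blog post for the actual checkpoint name:

```python
from transformers import pipeline

# "karpathy/model-name" is a placeholder id, not the real checkpoint:
# see the blog post for the actual Hub repo
generator = pipeline("text-generation", model="karpathy/model-name")
print(generator("The capital of France is", max_new_tokens=20)[0]["generated_text"])
```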
fine-tuning a 14B model with TRL + SFT on a free Colab (T4 GPU)? thanks to the latest TRL optimizations, you actually can! sharing a new notebook showing how to do it
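Fitting 14B parameters into a T4's ~15 GB almost certainly means quantized base weights plus LoRA adapters. A rough sketch of that recipe; the model choice and exact settings here are assumptions, so see the notebook for what's actually used:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit weights so the 14B model fits in T4 memory (fp16 compute: T4 has no bf16)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

trainer = SFTTrainer(
    model="Qwen/Qwen3-14B",  # hypothetical pick of a 14B model
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    args=SFTConfig(
        output_dir="qwen3-14b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,  # trade compute for memory
        model_init_kwargs={"quantization_config": bnb},
    ),
    # train only small LoRA adapters instead of the full 14B weights
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```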
Gave a smol 🤗 intro to Agents using smolagents last Monday! Sharing the slides in case you're curious. They serve as a gentle first step into the Agents Course we developed at @huggingface 🫶🫶
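If you'd rather start from code than slides, a hello-world smolagents agent looks roughly like this, assuming a recent smolagents version (which backend model InferenceClientModel defaults to is up to the library):

```python
from smolagents import CodeAgent, InferenceClientModel

# a CodeAgent writes and executes Python to answer the question;
# add_base_tools bundles in the default toolbox (e.g. web search)
model = InferenceClientModel()
agent = CodeAgent(tools=[], model=model, add_base_tools=True)

agent.run("How many seconds are in a leap year?")
```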
Sharing the slides from yesterday's talk about "Fine-Tuning with TRL" from the @TogetherAgent x @huggingface workshop we hosted in our Paris office!