I Built a RAG System That Listens to Live BBC News and Answers Questions About "What Happened 10 Minutes Ago" 8 days ago • 13
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7 • 259
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11 • 92
We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️ Jul 8 • 45
I Built a RAG System That Listens to Live BBC News and Answers Questions About "What Happened 10 Minutes Ago" 8 days ago • 13
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7 • 259
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11 • 92
We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️ Jul 8 • 45