On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15 • 8 • 6
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 180 • 21
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning Paper • 2505.14362 • Published May 20 • 3 • 2
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187 • 6