ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23, 2025 • 29
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning Paper • 2506.03136 • Published Jun 3, 2025 • 25
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10, 2025 • 4
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24, 2025 • 22
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion Paper • 2502.04235 • Published Feb 6, 2025 • 23