MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 11 days ago • 180
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published Jan 29 • 10
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14, 2025 • 17
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 42
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads Paper • 2602.09443 • Published Feb 10 • 59
Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability Paper • 2602.02477 • Published Feb 2 • 11
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 42
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 42
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published Nov 28, 2025 • 24
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134