InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Paper • 2601.14209 • Published 5 days ago • 5
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 9 days ago • 29
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 12 days ago • 38
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published 13 days ago • 45
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 11 days ago • 82
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 12 days ago • 140
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published 15 days ago • 50
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published 11 days ago • 58
OpenTinker: Separating Concerns in Agentic Reinforcement Learning Paper • 2601.07376 • Published 13 days ago • 6
Dr. Zero: Self-Evolving Search Agents without Training Data Paper • 2601.07055 • Published 13 days ago • 20
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts Paper • 2601.05110 • Published 17 days ago • 29
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era Paper • 2601.07526 • Published 13 days ago • 21
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 16 days ago • 79
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 16 days ago • 205
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 20 days ago • 102
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 20 days ago • 60
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation Paper • 2601.02256 • Published 20 days ago • 33