Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
Paper • 2509.23250
Natural Language Processing
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards