Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28 • 5 • 2
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28 • 8 • 1
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents Paper • 2506.00618 • Published May 31 • 1 • 2