Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28 • 5
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published Sep 29 • 4
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published Oct 9 • 22
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation Paper • 2509.14760 • Published Sep 18 • 53
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs Paper • 2407.10058 • Published Jul 14, 2024 • 31