Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Article • Feb 11 • 94
Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation Paper • 2407.12223 • Published Jul 17, 2024 • 2