Human-LLM Hybrid Preferences (Collection) • Resources for hybrid-preferences research, where we learn how to route preference instances to human vs. AI feedback • 4 items • Updated 1 day ago
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text (Paper) • 2601.22975 • Published Jan 30
M-RewardBench (ACL 2025) (Collection) • Evaluating Reward Models in Multilingual Settings • 2 items • Updated Feb 10
Reply: Hey, thanks! I'll open a PR on this. In the meantime, here's the link: https://huggingface.co/spaces/filbench/filbench-leaderboard