Phare LLM benchmark V2: Reasoning models don't guarantee better security • Article • 25 days ago
Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs • Article • May 7, 2025
RealPerformance, A Dataset of Language Model Business Compliance Issues • Article • Jul 21, 2025
LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs • Article • Jul 2, 2025
RealHarm: A Collection of Real-World Language Model Application Failures • Paper • 2504.10277 • Published Apr 14, 2025
The Big Benchmarks Collection • Collection • Gathering benchmark spaces on the Hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • Paper • 2404.14619 • Published Apr 22, 2024