Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion Paper • 2606.14885 • Published 7 days ago • 8
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence Paper • 2605.26340 • Published 25 days ago • 36
Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback Paper • 2606.06113 • Published 15 days ago • 15
MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection Paper • 2605.30288 • Published 21 days ago • 23
ClawBench Collection Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12
ClawBench — Browser Agent Benchmark Suite Collection Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12 • 1