Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Paper • 2606.14885 • Published 7 days ago • 8

upvoted 2 papers 6 days ago

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Paper • 2605.26340 • Published 25 days ago • 36

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

Paper • 2606.06113 • Published 15 days ago • 15

updated a dataset 8 days ago

TIGER-Lab/ClawBench

Viewer • Updated 8 days ago • 283 • 617

upvoted a paper 15 days ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Paper • 2605.30288 • Published 21 days ago • 23

updated a Space 25 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 4 datasets 25 days ago

TIGER-Lab/ClawBenchV2Trace

Updated 25 days ago • 9.92k

NAIL-Group/ClawBenchV2Trace

Updated 25 days ago • 3.82k

NAIL-Group/ClawBenchV1Trace

Updated 25 days ago • 7.05k

NAIL-Group/ClawBench

Viewer • Updated 25 days ago • 153 • 241 • 2

commented a paper about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

upvoted a paper about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

New activity in huggingface/HuggingDiscussions about 1 month ago

[FEEDBACK] Daily Papers

#32 opened about 2 years ago by

submitted a paper to Daily Papers about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

published a Space about 1 month ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

updated a Space about 1 month ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12 • 1