AssistantBench

AssistantBench aims to evaluate the ability of web agents to assist with real and time-consuming tasks. For more information, please check out our paper or the official website. To download AssistantBench, press here.

AssistantBench Leaderboard

Model Name | Accuracy | Answer rate | Precision | EM | Accuracy (easy) | Accuracy (medium) | Accuracy (hard) | Base Model | Organization
- | 28.30 | 94.5 | 28.8 | 10.5 | 67.8 | 48.5 | 15.5 | gpt-4o, o1-preview | MSR AI Frontiers

Making a New Submission

To make a new submission, upload a predictions file. Our scoring function can be found here. We support JSONL files with the following format:

{"id": "task_id_1", "answer": "Answer 1 from your model"}
{"id": "task_id_2", "answer": "Answer 2 from your model"}
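
As an illustration of the format above, here is a minimal sketch of how a predictions file could be produced and sanity-checked before uploading. The helper names `write_predictions` and `read_predictions`, and the file name in the usage note, are hypothetical and not part of the leaderboard's code. The only thing taken from this page is the assumption that each line is a JSON object with an "id" and an "answer" key.

```python
import json
from pathlib import Path


def write_predictions(predictions, path):
    """Write a {task_id: answer} mapping as JSONL, one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for task_id, answer in predictions.items():
            record = {"id": task_id, "answer": answer}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_predictions(path):
    """Read a JSONL predictions file and check each record has both keys."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            missing = {"id", "answer"} - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records
```

For example, `write_predictions({"task_id_1": "Answer 1 from your model"}, "predictions.jsonl")` would write a one-line file, and `read_predictions("predictions.jsonl")` would raise a `ValueError` if any line lacked one of the two keys. Whether the scoring function accepts extra keys or non-string answers is not stated here, so this sketch sticks to the two fields shown.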

We would like to thank the GAIA team for sharing the source code for their leaderboard, which we used as a template, and HuggingFace for hosting the leaderboard.