AssistantBench evaluates the ability of web agents to assist with realistic, time-consuming tasks.
For more information, please check out our paper or the official website.
To download AssistantBench, press here.
To make a new submission, upload a predictions file. Our scoring function can be found here. Predictions must be in JSONL format, with one JSON object per line containing an "id" field and an "answer" field:
{"id": "task_id_1", "answer": "Answer 1 from your model"}
{"id": "task_id_2", "answer": "Answer 2 from your model"}
We would like to thank the GAIA team for sharing the source code for their leaderboard, which we used as a template, and HuggingFace for hosting the leaderboard.