📊 MMEB LEADERBOARD (VLM2Vec)

Introduction

We introduce a novel benchmark, MMEB-V1 (Massive Multimodal Embedding Benchmark), which includes 36 datasets spanning four meta-task categories: classification, visual question answering, retrieval, and visual grounding. MMEB provides a comprehensive framework for training and evaluating embedding models across various combinations of text and image modalities. All tasks are reformulated as ranking tasks, where the model follows instructions, processes a query, and selects the correct target from a set of candidates. The query and target can be an image, text, or a combination of both. MMEB-V1 is divided into 20 in-distribution datasets, which can be used for training, and 16 out-of-distribution datasets, reserved for evaluation.
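The ranking formulation can be made concrete with a short sketch. This is a hedged illustration, not the official evaluation code: the `embed` helper in the usage comments is hypothetical, and the scoring simply orders candidates by cosine similarity between the query embedding and each candidate embedding.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query_emb: np.ndarray, candidate_embs: list[np.ndarray]) -> list[int]:
    # Return candidate indices ordered from most to least similar to the query.
    scores = [cosine(query_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical usage; `embed` stands in for any instruction-following multimodal embedder:
# query_emb = embed(instruction="Find the caption that matches the image.", image=query_image)
# candidate_embs = [embed(text=t) for t in candidate_texts]
# prediction = candidate_texts[rank_candidates(query_emb, candidate_embs)[0]]
```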

Building upon MMEB-V1, MMEB-V2 expands the evaluation scope to include five new tasks: four video-based tasks — Video Retrieval, Moment Retrieval, Video Classification, and Video Question Answering — and one task focused on visual documents, Visual Document Retrieval. This comprehensive suite enables robust evaluation of multimodal embedding models across static, temporal, and structured visual data settings.

⚠️ Your attention please:
We have fixed the errors found in ViDoSeek-page and MMLongBench-page datasets. Detailed information about the issues can be viewed HERE.
‼️ CALL TO ACTION: Please verify that your model's scores are accurate by following the instructions below. ‼️

Scores submitted before the fix have been renamed to ViDoSeek-page-before-fix and MMLongBench-page-before-fix. The current Overall and Visdoc-Overall scores now use the fixed versions of these two datasets (i.e., ViDoSeek-page-fixed and MMLongBench-page-fixed), which means models submitted before the fix may now rank lower.

Here is the list of models affected by this fix: File. If your model is on this list, please visit the VisDoc leaderboard and check whether your model is missing scores for ViDoSeek-page-fixed and MMLongBench-page-fixed. If so, re-evaluate your model on these two datasets and submit the updated scores to ensure an accurate ranking on the leaderboard.

In the generated report sheet, please double-check that ViDoSeek-page and MMLongBench-page have been renamed to ViDoSeek-page-fixed and MMLongBench-page-fixed, respectively.
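A quick check along the following lines can catch a stale report before submission. It is a minimal sketch, assuming the report sheet is exported as a CSV whose header row lists the dataset names; the file name report.csv and that layout are assumptions, not a documented format.

```python
import csv

EXPECTED = {"ViDoSeek-page-fixed", "MMLongBench-page-fixed"}
STALE = {"ViDoSeek-page", "MMLongBench-page"}

# "report.csv" is a placeholder path; point it at your exported report sheet.
with open("report.csv", newline="") as f:
    header = set(next(csv.reader(f)))  # assumes the first row names the datasets

if STALE & header or not EXPECTED <= header:
    print("Re-evaluation needed.",
          "Stale columns:", STALE & header,
          "Missing columns:", EXPECTED - header)
else:
    print("Report sheet uses the fixed dataset names.")
```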

Please let us know if you have any questions or concerns, or if there is an issue with your model's scores. We appreciate your understanding and cooperation in maintaining the integrity of the benchmark.

🔥 What's NEW:
  • [2026-01] ⚠️ The issues found in the ViDoSeek-page and MMLongBench-page datasets have been fixed.
  • [2025-11] The leaderboard rankings can now be downloaded directly in CSV/JSON format; scroll to the bottom of this page and click the download button (a loading sketch follows this list).
  • [2025-06] MMEB-V2 released!
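The downloaded ranking file can be inspected offline. The sketch below assumes a CSV export named mmeb_leaderboard.csv with column labels matching the table headers shown further down (Models, Overall, Image-Overall, Video-Overall, Visdoc-Overall); both the file name and the exact labels are assumptions about the export format.

```python
import pandas as pd

# "mmeb_leaderboard.csv" is a placeholder name for the downloaded CSV export.
df = pd.read_csv("mmeb_leaderboard.csv")

# Sort by the Overall score and show the top entries; column labels are assumed
# to match the leaderboard table headers.
top = df.sort_values("Overall", ascending=False).head(10)
print(top[["Models", "Overall", "Image-Overall", "Video-Overall", "Visdoc-Overall"]])
```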

| 📈Overview | Github | 📖MMEB-V2/VLM2Vec-V2 Paper | 📖MMEB-V1/VLM2Vec-V1 Paper | 🤗Hugging Face | Discord |

MMEB: Massive Multimodal Embedding Benchmark

Models are ranked by their Overall score.

| Rank | Models | Model Size (B) | Date | Overall | Image-Overall | Video-Overall | Visdoc-Overall |
|------|--------|----------------|------|---------|---------------|---------------|----------------|
| 10 |  | unknown | 2026-01-06 | 77.82 | 80.12 | 67.15 | 82.36 |