---
license: cc-by-nc-4.0
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
tags:
- video
- audio
- multimodal
---

# [Vidi: Large Multimodal Models for Video Understanding and Editing](https://arxiv.org/pdf/2504.15681)

Homepage: [https://bytedance.github.io/vidi-website/](https://bytedance.github.io/vidi-website/)

GitHub: [https://github.com/bytedance/vidi](https://github.com/bytedance/vidi)

Demo: [https://vidi.byteintl.com/](https://vidi.byteintl.com/)

> We introduce Vidi, a family of Large Multimodal Models (LMMs) for a wide range of video understanding and editing (VUE) scenarios. The first release focuses on temporal retrieval (TR), i.e., identifying the time ranges in input videos corresponding to a given text query.

This model is the first release for temporal retrieval.

The inference and evaluation code is available at [https://github.com/bytedance/vidi](https://github.com/bytedance/vidi).
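
To make the temporal retrieval task concrete, here is a minimal sketch of how a set of predicted time ranges could be compared with reference ranges using interval overlap (intersection over union). It is an illustration only: the `(start_seconds, end_seconds)` representation and the helper names `merge_ranges`, `total_length`, `intersection_length`, and `temporal_iou` are assumptions made for this example, not part of the Vidi codebase, and this card does not state which metric the official evaluation code uses.

```python
from typing import List, Tuple

# Hypothetical representation: one time range as (start_seconds, end_seconds).
Range = Tuple[float, float]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort ranges by start time and merge any that overlap or touch."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def total_length(ranges: List[Range]) -> float:
    """Total duration covered by the ranges, counting overlaps once."""
    return sum(end - start for start, end in merge_ranges(ranges))


def intersection_length(a: List[Range], b: List[Range]) -> float:
    """Total duration covered by both sets of ranges."""
    total = 0.0
    for a_start, a_end in merge_ranges(a):
        for b_start, b_end in merge_ranges(b):
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def temporal_iou(predicted: List[Range], reference: List[Range]) -> float:
    """Intersection over union of two sets of time ranges (0.0 if both are empty)."""
    inter = intersection_length(predicted, reference)
    union = total_length(predicted) + total_length(reference) - inter
    return inter / union if union > 0 else 0.0


# Made-up example values, in seconds.
predicted = [(12.0, 20.0), (45.0, 50.0)]
reference = [(10.0, 18.0), (44.0, 52.0)]
print(temporal_iou(predicted, reference))
```

In this made-up example the two sets share 11 seconds out of an 18-second union, so the score is about 0.61. Merging ranges first keeps overlapping predictions from being counted twice.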

## Citation

If you find Vidi useful for your research and applications, please cite using this BibTeX:

```
@article{Vidi2025vidi2,
    title={Vidi2: Large Multimodal Models for Video
            Understanding and Creation},
    author={Vidi Team, Celong Liu, Chia-Wen Kuo, Chuang Huang,
            Dawei Du, Fan Chen, Guang Chen, Haoji Zhang,
            Haojun Zhao, Lingxi Zhang, Lu Guo, Lusha Li,
            Longyin Wen, Qihang Fan, Qingyu Chen, Rachel Deng,
            Sijie Zhu, Stuart Siew, Tong Jin, Weiyan Tao,
            Wen Zhong, Xiaohui Shen, Xin Gu, Zhenfang Chen, Zuhua Lin},
    journal={arXiv preprint arXiv:2511.19529},
    year={2025}
}

@article{Vidi2025vidi,
    title={Vidi: Large Multimodal Models for Video
            Understanding and Editing},
    author={Vidi Team, Celong Liu, Chia-Wen Kuo, Dawei Du,
            Fan Chen, Guang Chen, Jiamin Yuan, Lingxi Zhang,
            Lu Guo, Lusha Li, Longyin Wen, Qingyu Chen,
            Rachel Deng, Sijie Zhu, Stuart Siew, Tong Jin,
            Wei Lu, Wen Zhong, Xiaohui Shen, Xin Gu, Xing Mei,
            Xueqiong Qu, Zhenfang Chen},
    journal={arXiv preprint arXiv:2504.15681},
    year={2025}
}
```