VLMs - a hg2wzh Collection

VLMs

Updated Apr 25

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published Sep 18, 2024 • 78

Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published Dec 11, 2024 • 48

AIDC-AI/Ovis2-2B
Image-Text-to-Text • 2B • Updated Aug 15 • 3.37k • 59

DAMO-NLP-SG/VideoLLaMA3-2B
Video-Text-to-Text • 2B • Updated Sep 3 • 4.15k • 15

AIDC-AI/Ovis2-16B
Image-Text-to-Text • 16B • Updated Aug 15 • 10.6k • 101

microsoft/Phi-4-multimodal-instruct
Automatic Speech Recognition • 6B • Updated May 1 • 388k • 1.55k

StarJiaxing/R1-Omni-0.5B
1B • Updated Mar 24 • 53 • 82

Skywork/Skywork-R1V2-38B
Image-Text-to-Text • 38B • Updated Jun 10 • 115 • 126