hg2wzh's Collections

VLMs

updated Apr 25

  • Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

    Paper • 2409.12191 • Published Sep 18, 2024 • 78

  • Multimodal Latent Language Modeling with Next-Token Diffusion

    Paper • 2412.08635 • Published Dec 11, 2024 • 48

  • AIDC-AI/Ovis2-2B

    Image-Text-to-Text • 2B • Updated Aug 15 • 3.37k • 59

  • DAMO-NLP-SG/VideoLLaMA3-2B

    Video-Text-to-Text • 2B • Updated Sep 3 • 4.15k • 15

  • AIDC-AI/Ovis2-16B

    Image-Text-to-Text • 16B • Updated Aug 15 • 10.6k • 101

  • microsoft/Phi-4-multimodal-instruct

    Automatic Speech Recognition • 6B • Updated May 1 • 388k • 1.55k

  • StarJiaxing/R1-Omni-0.5B

    1B • Updated Mar 24 • 53 • 82

  • Skywork/Skywork-R1V2-38B

    Image-Text-to-Text • 38B • Updated Jun 10 • 115 • 126