metadata
title: Advanced Multilingual Image Describer
emoji: 🌍
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false

🌍 Advanced Multilingual Image Describer

No translation APIs • Native multilingual support • Latest vision-language models

🚀 Features

  • Direct multilingual captioning - No separate translation step
  • Latest models - LLaVA 1.5, Qwen-VL, Moondream 2
  • 10+ languages - Native support for English, Chinese, Amharic, Spanish, French, German, Arabic, and more
  • Fast & efficient - Optimized for Hugging Face Spaces
  • Clean interface - Simple and intuitive

🤖 Supported Models

LLaVA 1.5 (7B)

  • Languages: English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, Arabic
  • Best for: High-quality detailed descriptions
  • Size: 7 billion parameters

Qwen-VL-Chat

  • Languages: English, Chinese, Japanese, Korean, French, German, Spanish, Russian
  • Best for: Conversational responses
  • Size: 9.6 billion parameters

Moondream 2

  • Languages: English, Spanish, French, German
  • Best for: Fast inference, smaller size
  • Size: about 1.9 billion parameters
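
The model list above can be captured as a small lookup table, which makes it easy to show only the models that list a given output language. This is a minimal sketch, not the app's actual code: `ModelInfo`, `MODELS`, and `models_for_language` are hypothetical names, and the language lists are copied from the entries above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """One entry of the supported-models list (hypothetical structure)."""
    name: str
    best_for: str
    languages: tuple[str, ...]


# Data copied from the "Supported Models" section of this README.
MODELS = (
    ModelInfo(
        "LLaVA 1.5 (7B)",
        "High-quality detailed descriptions",
        ("English", "Chinese", "Spanish", "French", "German",
         "Italian", "Russian", "Japanese", "Korean", "Arabic"),
    ),
    ModelInfo(
        "Qwen-VL-Chat",
        "Conversational responses",
        ("English", "Chinese", "Japanese", "Korean",
         "French", "German", "Spanish", "Russian"),
    ),
    ModelInfo(
        "Moondream 2",
        "Fast inference, smaller size",
        ("English", "Spanish", "French", "German"),
    ),
)


def models_for_language(language: str) -> list[str]:
    """Return the names of models whose list includes `language` (case-insensitive)."""
    wanted = language.strip().lower()
    return [
        m.name
        for m in MODELS
        if wanted in (lang.lower() for lang in m.languages)
    ]
```

For example, `models_for_language("Japanese")` returns LLaVA 1.5 and Qwen-VL-Chat. A language that none of the three entries lists, such as Amharic, returns an empty list, so a real sidebar would need a fallback for that case.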

🌐 How It Works

  1. Select a model from the sidebar
  2. Choose the output language
  3. Upload an image (JPG, PNG, WebP)
  4. Click "Generate Description"
  5. Receive a description written directly in the selected language
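
The steps above can be sketched as a small pipeline: check the upload type, build a prompt that asks for the target language directly, and hand both to the selected model. This is an illustrative sketch under assumptions, not the code in `app.py`: the function names, the prompt wording, and the `generate` callable (standing in for whichever vision-language model is selected) are all hypothetical, and accepting `.jpeg` alongside `.jpg` is an assumption.

```python
from pathlib import Path
from typing import Callable

# Upload formats named in step 3; ".jpeg" is assumed to count as JPG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_supported_image(filename: str) -> bool:
    """Check the file extension against the accepted upload formats."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def build_prompt(language: str) -> str:
    """Ask the model to answer in the target language, with no translation step."""
    return f"Describe this image in detail. Answer only in {language}."


def describe_image(
    filename: str,
    image_bytes: bytes,
    language: str,
    generate: Callable[[bytes, str], str],
) -> str:
    """Run one description request.

    `generate` is a hypothetical stand-in for the selected model:
    it takes the raw image bytes and a prompt and returns text.
    """
    if not is_supported_image(filename):
        raise ValueError(f"Unsupported file type: {filename}")
    return generate(image_bytes, build_prompt(language)).strip()
```

Passing the model in as a callable keeps the flow identical for every model; only the `generate` function would differ between them.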

⚡ Performance

  • Inference time: 2-10 seconds per image
  • Memory usage: ~8-16 GB RAM
  • Quality: Human-like descriptions
  • Languages: Native output (not translated)
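
Inference time varies with the model and hardware, so it can be worth measuring on your own Space. A minimal sketch using only the standard library; `timed` is a hypothetical helper, not part of the app.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Call `fn` once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

Wrapping the model call, for example `timed(lambda: model_call(image))` with your own `model_call`, gives a per-image figure to compare against the range above.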

πŸ› οΈ Technical Details

  • Framework: Streamlit + Transformers
  • Models: Latest vision-language models from Hugging Face
  • Deployment: Hugging Face Spaces (CPU/GPU)
  • Code: Pure Python, no external APIs

📋 File Structure