Spaces:
Running
Running
A newer version of the Streamlit SDK is available:
1.52.2
metadata
title: Advanced Multilingual Image Describer
emoji: π
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.0
app_file: app.py
pinned: false
π Advanced Multilingual Image Describer
No translation APIs β’ Native multilingual support β’ Latest vision-language models
π Features
- Direct multilingual captioning - No separate translation step
- Latest models - LLaVA 1.5, Qwen-VL, Moondream 2
- 10+ languages - Native support for English, Chinese, Amharic, Spanish, French, German, Arabic, and more
- Fast & efficient - Optimized for Hugging Face Spaces
- Clean interface - Simple and intuitive
π€ Supported Models
LLaVA 1.5 (7B)
- Languages: English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, Arabic
- Best for: High-quality detailed descriptions
- Size: 7 billion parameters
Qwen-VL-Chat
- Languages: English, Chinese, Japanese, Korean, French, German, Spanish, Russian
- Best for: Conversational responses
- Size: 9.6 billion parameters
Moondream 2
- Languages: English, Spanish, French, German
- Best for: Fast inference, smaller size
- Size: 1.4 billion parameters
π How It Works
- Select a model from the sidebar
- Choose language for output
- Upload an image (JPG, PNG, WebP)
- Click "Generate Description"
- Get native description in selected language
β‘ Performance
- Inference time: 2-10 seconds
- Memory usage: ~8-16GB RAM
- Quality: Human-like descriptions
- Languages: Native output (not translated)
π οΈ Technical Details
- Framework: Streamlit + Transformers
- Models: Latest vision-language models from Hugging Face
- Deployment: Hugging Face Spaces (CPU/GPU)
- Code: Pure Python, no external APIs