---
title: Advanced Multilingual Image Describer
emoji: 🌍
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---
# 🌍 Advanced Multilingual Image Describer
**No translation APIs • Native multilingual support • Latest vision-language models**
## 🚀 Features
- **Direct multilingual captioning** - No separate translation step
- **Latest models** - LLaVA 1.5, Qwen-VL-Chat, Moondream 2
- **10+ languages** - Native support for English, Chinese, Amharic, Spanish, French, German, Arabic, and more
- **Fast & efficient** - Optimized for Hugging Face Spaces
- **Clean interface** - Simple and intuitive
## 🤖 Supported Models
### LLaVA 1.5 (7B)
- **Languages**: English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, Arabic
- **Best for**: High-quality detailed descriptions
- **Size**: 7 billion parameters
### Qwen-VL-Chat
- **Languages**: English, Chinese, Japanese, Korean, French, German, Spanish, Russian
- **Best for**: Conversational responses
- **Size**: 9.6 billion parameters
### Moondream 2
- **Languages**: English, Spanish, French, German
- **Best for**: Fast inference, smaller size
- **Size**: 1.4 billion parameters
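
For reference, below is a minimal sketch of how the LLaVA 1.5 checkpoint could be loaded with Transformers. The model ID and `float16` precision are illustrative assumptions, not necessarily the Space's exact configuration; Qwen-VL-Chat and Moondream 2 typically load through `AutoModelForCausalLM` with `trust_remote_code=True` instead.

```python
# Hypothetical loading sketch for the "LLaVA 1.5 (7B)" option.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint ID

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # ~14 GB for 7B weights; use float32 on CPU-only Spaces
    low_cpu_mem_usage=True,
)
```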
## 🌐 How It Works
1. **Select a model** from the sidebar
2. **Choose language** for output
3. **Upload an image** (JPG, PNG, WebP)
4. **Click "Generate Description"**
5. **Get a native description** in the selected language (see the sketch below)
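
Under the hood, steps 2-5 boil down to a single prompt that names the target language, so the model writes in that language directly instead of translating an English caption afterwards. The snippet below is a minimal end-to-end sketch using LLaVA 1.5; the checkpoint ID, prompt wording, and generation settings are assumptions rather than the Space's exact code.

```python
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"          # assumed checkpoint, as above
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

image = Image.open("photo.jpg")                # step 3: the uploaded image
language = "Amharic"                           # step 2: the selected output language

# Steps 4-5: the prompt itself names the target language, so the model
# answers in it natively rather than translating an English caption.
prompt = f"USER: <image>\nDescribe this image in {language}. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption.split("ASSISTANT:")[-1].strip())  # the native-language description
```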
## ⚡ Performance
- **Inference time**: 2-10 seconds per image
- **Memory usage**: ~8-16 GB RAM, depending on the selected model
- **Quality**: Human-like descriptions
- **Languages**: Native output (not translated)
## 🛠️ Technical Details
- **Framework**: Streamlit + Transformers
- **Models**: Latest vision-language models from Hugging Face
- **Deployment**: Hugging Face Spaces (CPU/GPU)
- **Code**: Pure Python, no external APIs
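
For orientation, here is a hypothetical sketch of how app.py could wire these pieces together in Streamlit. The widget labels, model/language lists, and the `describe()` stub are assumptions, not the Space's actual code; in the real app the stub would run the generation shown in "How It Works" above.

```python
import streamlit as st
from PIL import Image

MODELS = ["LLaVA 1.5 (7B)", "Qwen-VL-Chat", "Moondream 2"]
LANGUAGES = ["English", "Chinese", "Amharic", "Spanish", "French", "German", "Arabic"]

def describe(image: Image.Image, model_name: str, language: str) -> str:
    # Stub: replace with the vision-language model call sketched earlier.
    return f"[{model_name} description in {language}]"

st.title("🌍 Advanced Multilingual Image Describer")
model_name = st.sidebar.selectbox("Model", MODELS)          # step 1
language = st.sidebar.selectbox("Output language", LANGUAGES)  # step 2
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png", "webp"])

if uploaded is not None and st.button("Generate Description"):
    image = Image.open(uploaded).convert("RGB")
    st.image(image)
    with st.spinner("Generating..."):
        st.write(describe(image, model_name, language))
```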
## 📋 File Structure