---
title: Advanced Multilingual Image Describer
emoji: 🌍
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: "1.32.0"
app_file: app.py
pinned: false
---
# 🌍 Advanced Multilingual Image Describer
**No translation APIs • Native multilingual support • Latest vision-language models**
## 🚀 Features
- **Direct multilingual captioning** - No separate translation step
- **Latest models** - LLaVA 1.5, Qwen-VL-Chat, Moondream 2
- **10+ languages** - Native support for English, Chinese, Amharic, Spanish, French, German, Arabic, and more
- **Fast & efficient** - Optimized for Hugging Face Spaces
- **Clean interface** - Simple and intuitive
## 🤖 Supported Models
### LLaVA 1.5 (7B)
- **Languages**: English, Chinese, Spanish, French, German, Italian, Russian, Japanese, Korean, Arabic
- **Best for**: High-quality detailed descriptions
- **Size**: 7 billion parameters
### Qwen-VL-Chat
- **Languages**: English, Chinese, Japanese, Korean, French, German, Spanish, Russian
- **Best for**: Conversational responses
- **Size**: 9.6 billion parameters
### Moondream 2
- **Languages**: English, Spanish, French, German
- **Best for**: Fast inference, smaller size
- **Size**: 1.4 billion parameters
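
For reference, below is a minimal sketch of how the LLaVA 1.5 checkpoint could be loaded with Transformers. The model ID and `float16` precision are illustrative assumptions, not necessarily the Space's exact configuration; Qwen-VL-Chat and Moondream 2 typically load through `AutoModelForCausalLM` with `trust_remote_code=True` instead.

```python
# Hypothetical loading sketch for the "LLaVA 1.5 (7B)" option.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint ID

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # ~14 GB for 7B weights; use float32 on CPU-only Spaces
    low_cpu_mem_usage=True,
)
```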
## 🌐 How It Works
1. **Select a model** from the sidebar
2. **Choose language** for output
3. **Upload an image** (JPG, PNG, WebP)
4. **Click "Generate Description"**
5. **Get a native description** in the selected language (see the sketch below)
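
Under the hood, steps 2-5 boil down to a single prompt that names the target language, so the model writes in that language directly instead of translating an English caption afterwards. The snippet below is a minimal end-to-end sketch using LLaVA 1.5; the checkpoint ID, prompt wording, and generation settings are assumptions rather than the Space's exact code.

```python
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"          # assumed checkpoint, as above
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

image = Image.open("photo.jpg")                # step 3: the uploaded image
language = "Amharic"                           # step 2: the selected output language

# Steps 4-5: the prompt itself names the target language, so the model
# answers in it natively rather than translating an English caption.
prompt = f"USER: <image>\nDescribe this image in {language}. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption.split("ASSISTANT:")[-1].strip())  # the native-language description
```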
## ⚡ Performance
- **Inference time**: 2-10 seconds per image
- **Memory usage**: ~8-16 GB RAM, depending on the selected model
- **Quality**: Human-like descriptions
- **Languages**: Native output (not translated)
## 🛠️ Technical Details
- **Framework**: Streamlit + Transformers
- **Models**: Latest vision-language models from Hugging Face
- **Deployment**: Hugging Face Spaces (CPU/GPU)
- **Code**: Pure Python, no external APIs
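
For orientation, here is a hypothetical sketch of how app.py could wire these pieces together in Streamlit. The widget labels, model/language lists, and the `describe()` stub are assumptions, not the Space's actual code; in the real app the stub would run the generation shown in "How It Works" above.

```python
import streamlit as st
from PIL import Image

MODELS = ["LLaVA 1.5 (7B)", "Qwen-VL-Chat", "Moondream 2"]
LANGUAGES = ["English", "Chinese", "Amharic", "Spanish", "French", "German", "Arabic"]

def describe(image: Image.Image, model_name: str, language: str) -> str:
    # Stub: replace with the vision-language model call sketched earlier.
    return f"[{model_name} description in {language}]"

st.title("🌍 Advanced Multilingual Image Describer")
model_name = st.sidebar.selectbox("Model", MODELS)          # step 1
language = st.sidebar.selectbox("Output language", LANGUAGES)  # step 2
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png", "webp"])

if uploaded is not None and st.button("Generate Description"):
    image = Image.open(uploaded).convert("RGB")
    st.image(image)
    with st.spinner("Generating..."):
        st.write(describe(image, model_name, language))
```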
## 📋 File Structure