# nvila-walk-50samples

Fine-tuned NVILA-Lite-2B model for blind-assistance navigation.
## Model Details

- Base Model: Efficient-Large-Model/NVILA-Lite-2B
- Training Dataset Size: 50 samples
  - Train: 35
  - Validation: 7
  - Test: 8
- Training Date: 2025-12-28
- Run Name: NVILA-2B-Walk-50samples-20251228_211018
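The 35/7/8 split above can be reproduced with a simple shuffle-and-slice; this is an illustrative sketch (the seed and helper name are assumptions, not the card's actual preprocessing):

```python
import random

def split_dataset(samples, n_train=35, n_val=7, n_test=8, seed=0):
    # Shuffle once, then slice into train/val/test partitions.
    # The seed is an assumption; only the 35/7/8 sizes come from the card.
    assert n_train + n_val + n_test == len(samples)
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```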
## Task

Given visual input from the user's forward perspective, the model generates exactly one short sentence to guide a visually impaired user by:
- Identifying critical obstacles or landmarks
- Describing locations using clock directions (12 o'clock is straight ahead)
- Including relevant details (size, material, distance)
- Giving one clear action
- Prioritizing immediate safety
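The clock-direction convention above (12 o'clock is straight ahead) can be sketched as a small helper that maps a horizontal angle to a clock hour; this is purely illustrative and not part of the model or its training pipeline:

```python
def angle_to_clock(angle_deg: float) -> int:
    """Map a horizontal angle (degrees, 0 = straight ahead,
    positive = to the user's right) to a clock direction,
    where 12 o'clock means directly ahead."""
    hour = round((angle_deg % 360) / 30) % 12
    return 12 if hour == 0 else hour
```

For example, an obstacle roughly 30 degrees to the right is reported at 1 o'clock, matching the sample output below.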
## Example Output

"At 1 o'clock direction there is a tree, be careful to avoid it."
## Usage

```python
from llava.model.builder import load_pretrained_model

model_path = "blind-assist/nvila-walk-50samples"
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path)
```
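Before running inference, the task instruction needs to be assembled into a prompt alongside the image placeholder. A minimal sketch, assuming a generic `<image>` token layout (the exact chat template and image token depend on the NVILA checkpoint, so treat the wording and layout here as assumptions):

```python
INSTRUCTION = (
    "You are guiding a visually impaired user. Given the image from the "
    "user's forward perspective, reply with exactly one short sentence: "
    "name the critical obstacle or landmark, give its clock direction "
    "(12 o'clock is straight ahead), add a relevant detail such as size, "
    "material, or distance, and state one clear action."
)

def build_prompt(image_token: str = "<image>") -> str:
    # Hypothetical prompt layout: image placeholder first, then the
    # instruction. Match this to the checkpoint's actual chat template.
    return f"{image_token}\n{INSTRUCTION}"
```

The resulting string would then be tokenized and passed to the model together with the processed image.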
## Training Configuration

- Effective Batch Size: 8
- GPUs: 1
- Precision: BF16
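On a single GPU, the effective batch size of 8 is typically reached via gradient accumulation: effective batch = per-device batch × accumulation steps × number of GPUs. The per-device batch and accumulation steps below are assumptions; only the effective size (8) and GPU count (1) come from the card.

```python
per_device_batch = 2   # assumed; not stated on the card
grad_accum_steps = 4   # assumed; not stated on the card
num_gpus = 1           # from the card

# Effective batch = per-device batch x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
```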