nvila-walk-50samples

An NVILA-Lite-2B model fine-tuned to provide navigation assistance for blind and visually impaired users.

Model Details

  • Base Model: Efficient-Large-Model/NVILA-Lite-2B
  • Training Dataset Size: 50 samples
    • Train: 35
    • Validation: 7
    • Test: 8
  • Training Date: 2025-12-28
  • Run Name: NVILA-2B-Walk-50samples-20251228_211018
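The train/validation/test counts above partition all 50 samples; a quick illustrative check (the variable names are mine, not from the training code):

```python
total = 50
train, val, test = 35, 7, 8

# The three splits must account for every sample.
assert train + val + test == total

# Fractions: 0.70 / 0.14 / 0.16, i.e. roughly a 70/15/15 split.
print(train / total, val / total, test / total)
```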

Task

Given visual input from a user's forward perspective, generate exactly one short sentence to guide a visually impaired user by:

  • Identifying critical obstacles or landmarks
  • Describing locations using clock directions (12 o'clock is straight ahead)
  • Including relevant details (size, material, distance)
  • Giving one clear action
  • Prioritizing immediate safety
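To illustrate the clock-direction convention above (this sketch is not part of the training pipeline; the function name and the 0°-straight-ahead bearing convention are assumptions), a forward-relative bearing can be mapped to a clock position like this:

```python
def bearing_to_clock(bearing_deg: float) -> int:
    """Map a forward-relative bearing (0 deg = straight ahead,
    positive = clockwise/right) to the nearest clock position,
    where 12 o'clock is straight ahead."""
    hour = round((bearing_deg % 360) / 30) % 12
    return 12 if hour == 0 else hour

# A tree 30 degrees to the right sits at 1 o'clock, as in the example output.
print(bearing_to_clock(30))   # 1
print(bearing_to_clock(0))    # 12
print(bearing_to_clock(-90))  # 9 (directly left)
```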

Example Output

"At 1 o'clock direction there is a tree, be careful to avoid it."

Usage

from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "blind-assist/nvila-walk-50samples"
# Argument order varies between LLaVA/VILA releases; check
# llava/model/builder.py in your checkout if this call fails.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, get_model_name_from_path(model_path))
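Since the task requires exactly one short sentence using clock directions, a minimal post-check on the generated text can catch malformed outputs. This is a hypothetical helper (not part of the model repository), shown against the example output from this card:

```python
import re

def is_valid_guidance(text: str) -> bool:
    """Heuristic check: the output is a single sentence and mentions
    a clock direction such as "1 o'clock" or "12 o'clock"."""
    text = text.strip()
    one_sentence = text.endswith(".") and text.count(".") <= 1
    has_clock = re.search(r"\b(1[0-2]|[1-9]) o'clock\b", text) is not None
    return one_sentence and has_clock

print(is_valid_guidance(
    "At 1 o'clock direction there is a tree, be careful to avoid it."))  # True
print(is_valid_guidance("Hello there."))  # False (no clock direction)
```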

Training Configuration

  • Batch Size: 8 (effective)
  • GPUs: 1
  • Precision: BF16
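An effective batch size of 8 on a single GPU implies gradient accumulation whenever the per-device micro-batch is smaller. The per-device batch and accumulation steps below are assumptions for illustration, not values recorded for this run; only the GPU count and effective batch size come from the configuration above:

```python
per_device_batch = 2   # assumed per-device micro-batch
grad_accum_steps = 4   # assumed gradient-accumulation steps
num_gpus = 1           # from the training configuration

# Effective batch = micro-batch x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 8, matching the reported effective batch size
```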