nvila-walk-130samples

Fine-tuned NVILA-Lite-2B model for blind-assistance navigation.

Model Details

  • Base Model: Efficient-Large-Model/NVILA-Lite-2B
  • Training Dataset Size: 130 samples
    • Train: 91
    • Validation: 19
    • Test: 20
  • Training Date: 2025-12-28
  • Run Name: NVILA-2B-Walk-130samples-20251228_200126

Task

Given visual input from the user's forward-facing perspective, the model generates exactly one short sentence to guide a visually impaired user by:

  • Identifying critical obstacles or landmarks
  • Describing locations using clock directions (12 o'clock is straight ahead)
  • Including relevant details (size, material, distance)
  • Giving one clear action
  • Prioritizing immediate safety
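The clock-direction convention above (12 o'clock is straight ahead) can be sketched as a small helper that maps a horizontal angle to a clock hour. This is an illustrative assumption for readers, not part of the model's pipeline: the function name and the angle convention (degrees from straight ahead, positive to the user's right) are hypothetical.

```python
def angle_to_clock(angle_deg: float) -> int:
    """Map a horizontal angle to a clock direction.

    0 degrees = straight ahead (12 o'clock); positive angles rotate
    clockwise (to the user's right); each clock hour spans 30 degrees.
    """
    hour = round(angle_deg / 30.0) % 12
    return 12 if hour == 0 else hour

print(angle_to_clock(0))    # -> 12 (straight ahead)
print(angle_to_clock(35))   # -> 1  (slightly right)
print(angle_to_clock(-40))  # -> 11 (slightly left)
print(angle_to_clock(90))   # -> 3  (directly right)
```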

Example Output

"At 1 o'clock direction there is a tree, be careful to avoid it."

Usage

from llava.model.builder import load_pretrained_model

model_path = "blind-assist/nvila-walk-130samples"
# Note: depending on the llava/VILA repo version, load_pretrained_model may
# also require model_base and model_name arguments; check the builder's
# signature in your installed version.
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path)

Training Configuration

  • Batch Size: 8 (effective)
  • GPUs: 1
  • Precision: BF16
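The card reports an effective batch size of 8 on a single GPU; how that splits between per-device batch size and gradient-accumulation steps is not stated. A minimal sketch of the standard relation, with hypothetical example values:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Effective batch = per-device batch x gradient-accumulation steps x GPUs."""
    return per_device * grad_accum * num_gpus

# e.g. a per-device batch of 2 with 4 accumulation steps on 1 GPU
# (assumed values, not from the card) yields the reported effective batch:
print(effective_batch_size(2, 4, 1))  # -> 8
```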