SafeVLA HJ-Reachability Checkpoints

Feasibility-gated PPO checkpoints with a Hamilton-Jacobi (HJ) reachability cost critic, trained on the Safety-CHORES benchmark.

Checkpoints

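| Checkpoint | Task | Cost Type | Steps | Eval SR | Eval CC |
|---|---|---|---|---|---|
| hj_binary_pickup_204K.pt | PickupType | Binary (+25/-1) | 204K | 0.906 | 0.25 |
| hj_vlm_rawadv_pickup_462K.pt | PickupType | VLM (rubrics) | 462K | 0.818 | 0.52 |
| hj_vlm_fetch_310K.pt | FetchType | VLM (rubrics) | 310K | 0.515 | 4.79 |

SR is the success rate (higher is better) and CC is the cumulative cost (lower is better), both measured at evaluation.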

Comparison with Baselines

| Method | Pickup SR | Pickup CC | Fetch SR | Fetch CC |
|---|---|---|---|---|
| Lagrangian (ISA, paper) | 0.875 | 0.25 | 0.637 | 8.08 |
| HJ-Binary | 0.906 | 0.25 | - | - |
| HJ-VLM (ours) | 0.818 | 0.52 | 0.515 | 4.79 |
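
The relative differences between rows can be checked directly from the reported numbers. The script below is only arithmetic on the FetchType columns of the comparison table; the helper name `relative_change` is introduced here for illustration and is not part of any released code.

```python
def relative_change(new: float, baseline: float) -> float:
    """Signed fractional change of `new` relative to `baseline`."""
    return (new - baseline) / baseline


# Values copied from the comparison table (FetchType columns).
lagrangian = {"fetch_sr": 0.637, "fetch_cc": 8.08}
hj_vlm = {"fetch_sr": 0.515, "fetch_cc": 4.79}

cc_change = relative_change(hj_vlm["fetch_cc"], lagrangian["fetch_cc"])
sr_change = relative_change(hj_vlm["fetch_sr"], lagrangian["fetch_sr"])

print(f"Fetch CC change: {cc_change:+.1%}")  # -40.7%
print(f"Fetch SR change: {sr_change:+.1%}")  # -19.2%
```

On FetchType, the lower cumulative cost comes together with a lower success rate, so the two columns are best read as a trade-off rather than separately.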

Architecture

  • Base model: SPOC-DINOv2 (56M trainable params)
  • Cost critic: Separate transformer with HJ max-based Bellman backup
  • VLM cost scorer: Qwen3-VL-2B-Instruct (rubrics-based, 5 safety dimensions)
  • Feasibility gate: Hard binary constraint via cost value function
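
A minimal sketch of how a max-based HJ backup and a hard feasibility gate might fit together is given below. It is an illustration under stated assumptions, not the training code behind these checkpoints: the discounted target `(1 - gamma) * c + gamma * max(c, V_c(s'))` is one common form from the reachability literature, the feasibility threshold of 0 is assumed, and the function names `hj_backup_target` and `gated_advantage` are hypothetical.

```python
def hj_backup_target(cost: float, next_value: float, gamma: float = 0.99) -> float:
    """Discounted max-based target: (1 - gamma) * c + gamma * max(c, V_c(s')).

    Unlike a sum-based Bellman backup, this tracks the worst cost along the
    future trajectory instead of the accumulated cost.
    """
    return (1.0 - gamma) * cost + gamma * max(cost, next_value)


def gated_advantage(reward_adv: float, cost_adv: float, cost_value: float,
                    threshold: float = 0.0) -> float:
    """Hard binary gate on the cost value function (assumed convention).

    Feasible states (V_c <= threshold) optimize reward only; infeasible
    states ignore reward and push toward lower cost.
    """
    feasible = cost_value <= threshold
    return reward_adv if feasible else -cost_adv


# Toy trajectory with per-step costs: safe = -1, unsafe = +25.
costs = [-1.0, -1.0, 25.0, -1.0]
values = [0.0] * (len(costs) + 1)
values[-1] = -1.0  # assumed safe bootstrap value after the last step

# Sweep backward so each step sees the value of its successor.
for t in reversed(range(len(costs))):
    values[t] = hj_backup_target(costs[t], values[t + 1])

for t, v in enumerate(values[:-1]):
    print(t, round(v, 3), "feasible" if v <= 0.0 else "infeasible")
```

In this toy run the steps before the unsafe event inherit a large positive value and are gated as infeasible, while the step after it stays at -1 and remains feasible.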

Key Findings

  • Binary +25/-1 costs cause extreme V_c predictions, which lead to aggressive closure of the feasibility gate
  • VLM-calibrated costs (safe = -1, unsafe in [0, 25]) provide a smoother cost landscape
  • FetchType benefits most from HJ gating: CC drops 41% relative to the Lagrangian baseline (8.08 to 4.79), although SR also drops (0.637 to 0.515)
  • Cost advantage normalization must be removed to obtain correct safety-recovery gradients
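
One plausible reading of the last finding is that zero-mean normalization discards the sign of the cost advantage. The example below uses a hypothetical batch, not data from these runs, and the usual PPO-style normalization is assumed. It shows a batch in which every action raises expected cost, yet some actions look cost-reducing once normalized.

```python
import statistics


def normalize(advantages):
    """PPO-style normalization: subtract the mean, divide by the std."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + 1e-8) for a in advantages]


# Hypothetical batch: every action increases expected cost (all positive).
raw_cost_adv = [5.0, 10.0, 15.0]
norm_cost_adv = normalize(raw_cost_adv)

raw_signs = [a > 0 for a in raw_cost_adv]
norm_signs = [a > 0 for a in norm_cost_adv]

print(raw_signs)   # [True, True, True]
print(norm_signs)  # [False, False, True]
```

With raw advantages, the gradient discourages all three actions. After normalization, the first action receives a negative cost advantage and would be encouraged, even though it still increases cost.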