SafeVLA HJ-Reachability Checkpoints
Feasibility-gated PPO checkpoints with a Hamilton-Jacobi (HJ) reachability cost critic, trained on the Safety-CHORES benchmark. In the tables below, SR is success rate and CC is cumulative cost.
Checkpoints
| Checkpoint | Task | Cost Type | Steps | Eval SR | Eval CC |
|---|---|---|---|---|---|
| hj_binary_pickup_204K.pt | PickupType | Binary (+25/-1) | 204K | 0.906 | 0.25 |
| hj_vlm_rawadv_pickup_462K.pt | PickupType | VLM (rubrics) | 462K | 0.818 | 0.52 |
| hj_vlm_fetch_310K.pt | FetchType | VLM (rubrics) | 310K | 0.515 | 4.79 |
Comparison with Baselines
| Method | Pickup SR | Pickup CC | Fetch SR | Fetch CC |
|---|---|---|---|---|
| Lagrangian (ISA, paper) | 0.875 | 0.25 | 0.637 | 8.08 |
| HJ-Binary | 0.906 | 0.25 | - | - |
| HJ-VLM (ours) | 0.818 | 0.52 | 0.515 | 4.79 |
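
The relative differences between rows of the table above can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `relative_change` is hypothetical and the inputs are copied from the table.

```python
def relative_change(baseline, value):
    """Fractional change of `value` relative to `baseline` (negative means lower)."""
    return (value - baseline) / baseline


# FetchType: HJ-VLM vs. the Lagrangian baseline (values from the table above).
fetch_cc_change = relative_change(8.08, 4.79)    # cumulative cost
fetch_sr_change = relative_change(0.637, 0.515)  # success rate

# PickupType: HJ-Binary vs. the Lagrangian baseline.
pickup_sr_change = relative_change(0.875, 0.906)

print(f"Fetch CC change:  {fetch_cc_change:+.1%}")
print(f"Fetch SR change:  {fetch_sr_change:+.1%}")
print(f"Pickup SR change: {pickup_sr_change:+.1%}")
```

Reporting the SR change next to the CC change makes the trade-off on FetchType explicit: lower cost comes with a lower success rate.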
Architecture
- Base model: SPOC-DINOv2 (56M trainable params)
- Cost critic: Separate transformer with HJ max-based Bellman backup
- VLM cost scorer: Qwen3-VL-2B-Instruct (rubrics-based, 5 safety dimensions)
- Feasibility gate: Hard binary constraint via cost value function
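
The cost critic and the gate can be illustrated with a small sketch. It assumes the discounted HJ reachability backup commonly used in the literature, `V_c(s) = (1 - gamma) * c(s) + gamma * max(c(s), V_c(s'))`, and a gate that treats a state as feasible when `V_c(s) <= 0`. The function names, the threshold, and the trajectory-level target computation are illustrative assumptions, not the training code behind these checkpoints.

```python
def hj_targets(costs, bootstrap_value, gamma=0.99):
    """Backward pass computing max-based (HJ) cost targets for one trajectory.

    Assumed backup: target_t = (1 - gamma) * c_t + gamma * max(c_t, target_{t+1}).
    `bootstrap_value` stands in for the critic's estimate after the last step.
    """
    targets = [0.0] * len(costs)
    next_value = bootstrap_value
    for t in reversed(range(len(costs))):
        targets[t] = (1.0 - gamma) * costs[t] + gamma * max(costs[t], next_value)
        next_value = targets[t]
    return targets


def gated_advantage(reward_adv, cost_adv, cost_value, threshold=0.0):
    """Hard binary gate (assumed form): use the reward advantage in feasible
    states, and the negated cost advantage in infeasible ones."""
    if cost_value <= threshold:
        return reward_adv
    return -cost_adv


# One unsafe step (+25) in an otherwise safe trajectory (-1 per step).
# gamma=1.0 removes discounting so the max backup is easy to read.
example_targets = hj_targets([-1, -1, 25, -1], bootstrap_value=-1.0, gamma=1.0)
print(example_targets)
```

With the undiscounted backup, a single +25 cost propagates unchanged to every earlier step of the trajectory, so under this assumed gate all of those states would be treated as infeasible.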
Key Findings
- Binary costs (+25/-1) push the cost critic's V_c predictions to extremes, which closes the feasibility gate aggressively
- VLM-calibrated costs (safe=-1, unsafe=[0,25]) provide a smoother cost landscape
- FetchType benefits most from HJ gating: CC drops 41% relative to the Lagrangian baseline (8.08 to 4.79), although SR also drops (0.637 to 0.515)
- Cost advantage normalization must be removed for correct safety recovery gradients
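
One plausible reading of the last finding is that mean-centering can flip the sign of cost advantages. The sketch below assumes standard per-batch normalization (subtract the mean, divide by the standard deviation); the numbers are made up for illustration and are not taken from training logs.

```python
from statistics import mean, pstdev


def normalize(advantages, eps=1e-8):
    """Standard per-batch advantage normalization: zero mean, unit variance."""
    mu = mean(advantages)
    sigma = pstdev(advantages)
    return [(a - mu) / (sigma + eps) for a in advantages]


# Hypothetical batch in which every action increases expected cost.
raw_cost_adv = [1.0, 2.0, 4.0, 5.0]
normalized_cost_adv = normalize(raw_cost_adv)

# Entries whose sign changed after normalization.
sign_flipped = [
    raw for raw, norm in zip(raw_cost_adv, normalized_cost_adv)
    if (raw > 0) != (norm > 0)
]
print(sign_flipped)
```

In this batch every raw cost advantage is positive, yet after normalization the two smallest become negative. A policy update driven by the normalized values would treat those cost-increasing actions as cost-reducing, which would be consistent with the finding that raw cost advantages are needed.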