
Does not work with DGX Spark

#13
by sotaaa - opened

I tried following https://build.nvidia.com/spark/sglang/instructions to run an SGLang Docker container on DGX Spark, then launched the model using the command provided in the model card. Got this error:

launch_server.py: error: argument --reasoning-parser: invalid choice: 'nano_v3' (choose from 'deepseek-r1', 'deepseek-v3', 'glm45', 'gpt-oss', 'kimi', 'qwen3', 'qwen3-thinking', 'minimax', 'minimax-append-think', 'step3')
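
For reference, the failing invocation looks roughly like this (a sketch; the exact flags from the model card are assumed here):

$ python3 -m sglang.launch_server \
    --model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    --reasoning-parser nano_v3
    # --reasoning-parser nano_v3 is the flag this SGLang build rejects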

Yes, it would be awesome to have it work on DGX Spark.

Same issue here on DGX Spark.

Environment:

  • DGX Spark, GB10 GPU, CUDA 13.0.1
  • Model downloaded: /home/data/models/nemotron-3-nano-bf16 (all 13 safetensors verified)

Tried:

  1. lmsysorg/sglang:spark → nano_v3 parser not available (same error as OP)
  2. nvcr.io/nvidia/vllm:25.11-py3 → AttributeError: 'NemotronHConfig' object has no attribute 'rms_norm_eps'

Question:
Which exact docker image + tag supports Nemotron-3-Nano on DGX Spark today?
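
In case it helps others triage, here's one way to list which reasoning parsers a given SGLang image actually ships with (a sketch; assumes the image lets you invoke launch_server's --help directly):

$ docker run --rm lmsysorg/sglang:spark \
    python3 -m sglang.launch_server --help 2>&1 | grep -A 2 'reasoning-parser'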

To try it on DGX Spark now, follow these steps:

  • Via lmstudio.ai (https://lmstudio.ai/) I tried the Q4_K_M (24.5GB) GGUF and everything "just worked".
  • Running a few queries produced reasonable results at about 65 tok/sec, which is higher than what I get (53 tok/sec) on a MacBook M3 Pro with the MLX variant.
  • It is able to use the js-sandbox tool built into lmstudio.ai, even though we did not train on that specific tool.

I did not measure accuracy, and this GGUF isn't an "official" NVIDIA GGUF. Many thanks to the OSS community for providing these!
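
If you want to hit it programmatically, LM Studio also exposes an OpenAI-compatible local server (default port 1234). A quick smoke test could look like the following (the model identifier is an assumption; check what LM Studio reports for the loaded GGUF):

$ curl http://localhost:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "nemotron-3-nano", "messages": [{"role": "user", "content": "Hello"}]}'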


NVIDIA org

Hi @sotaaa @PhotosGrafus

We have two options confirmed for DGX Spark. We're looking into SGLang support.

The vLLM path is a little tricky, as users need to build the Docker image themselves. We'll keep you posted with the latest information about Nemotron 3 Nano for DGX Spark.

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ # Decrease max_jobs if you face an OOM issue during the build
$ DOCKER_BUILDKIT=1 docker build \
    --build-arg max_jobs=32 \
    --build-arg RUN_WHEEL_CHECK=false \
    --build-arg CUDA_VERSION=13.0.1 \
    --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 \
    --build-arg torch_cuda_arch_list='12.1' \
    --platform "linux/arm64" \
    --tag <docker-image-tag-name> \
    --target vllm-openai \
    --progress plain \
    -f docker/Dockerfile \
    .
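
Once the image is built, launching the server could look roughly like this (a sketch, not an official command; the mount path and serve flags are assumptions, adjust them to your setup):

$ docker run --gpus all --rm -p 8000:8000 \
    -v /home/data/models:/models \
    <docker-image-tag-name> \
    --model /models/nemotron-3-nano-bf16 \
    --dtype bfloat16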

@suhara Thank you for the build instructions.

I need to run the BF16 full precision model (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16), not quantized.

Will this custom vLLM build support BF16 on DGX Spark?
