Does not work with DGX Spark
Tried following https://build.nvidia.com/spark/sglang/instructions to run an SGLang Docker container on DGX Spark, then ran the model using the command provided in the model card. Got this error:
```
launch_server.py: error: argument --reasoning-parser: invalid choice: 'nano_v3' (choose from 'deepseek-r1', 'deepseek-v3', 'glm45', 'gpt-oss', 'kimi', 'qwen3', 'qwen3-thinking', 'minimax', 'minimax-append-think', 'step3')
```
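The image itself seems otherwise fine; presumably the server would start with the unsupported flag dropped, at the cost of unparsed reasoning output. A minimal sketch of that workaround (model id taken from the BF16 card; host/port are my own defaults, not from the instructions):

```bash
# Hypothetical workaround: launch without --reasoning-parser so argument parsing
# succeeds; reasoning traces will then arrive unparsed inside the response text.
python3 -m sglang.launch_server \
  --model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --host 0.0.0.0 \
  --port 30000
```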
Yes, it would be awesome to have it work on DGX Spark.
Same issue here on DGX Spark.
Environment:
- DGX Spark, GB10 GPU, CUDA 13.0.1
- Model downloaded: /home/data/models/nemotron-3-nano-bf16 (all 13 safetensors verified)
Tried:
- lmsysorg/sglang:spark → nano_v3 parser not available (same error as OP)
- nvcr.io/nvidia/vllm:25.11-py3 → AttributeError: 'NemotronHConfig' object has no attribute 'rms_norm_eps'
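In case it helps triage the vLLM failure, here is a quick check of whether the downloaded config even defines the attribute vLLM is reading (local path as in my environment above):

```bash
# Does the downloaded checkpoint's config define rms_norm_eps at all?
# The 25.11 image's NemotronHConfig raises AttributeError when vLLM reads it.
grep -n "rms_norm_eps" /home/data/models/nemotron-3-nano-bf16/config.json \
  || echo "rms_norm_eps not present in config.json"
```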
Question:
Which exact docker image + tag supports Nemotron-3-Nano on DGX Spark today?
To try it on DGX Spark right now, the following worked for me:
- Via LM Studio (https://lmstudio.ai/) I tried the Q4_K_M (24.5 GB) GGUF and everything "just worked".
- A few test queries produced reasonable results at about 65 tok/sec, which is higher than the 53 tok/sec I get on a MacBook M3 Pro with the MLX variant.
- It is able to use LM Studio's built-in js-sandbox tool, even though we did not train on that specific tool.
I did not measure any accuracies, and this GGUF isn't an "official" NVIDIA GGUF. Many thanks to the OSS community for providing these!
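If you'd rather use llama.cpp directly instead of LM Studio, the equivalent should look something like this (the GGUF filename is illustrative; point it at whichever community quant you downloaded):

```bash
# Serve the community GGUF with llama.cpp's built-in server;
# -ngl 99 offloads all layers to the GPU.
./llama-server -m ./Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf --port 8080 -ngl 99
```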
We have two options confirmed for DGX Spark, and we're looking into SGLang support:
- (1) Llama.cpp (used as the backend for LM Studio, as @okuchaiev mentioned above)
- (2) vLLM
  - https://github.com/zhenghax/recipes/blob/main/NVIDIA/Nemotron-3-Nano-30B-A3B.md#run-docker-container-on-dgx-spark
  - The Docker build command is currently missing from that documentation.

The vLLM path is a bit tricky since users need to build the Docker image themselves. We'll keep you posted with the latest information about Nemotron 3 Nano on DGX Spark.
vLLM v0.12.0 or newer is needed. Pull v0.12.0 or later, then build the Docker image with the following command:
```bash
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
# Decrease max_jobs if you face an OOM issue during the build
$ DOCKER_BUILDKIT=1 docker build \
    --build-arg max_jobs=32 \
    --build-arg RUN_WHEEL_CHECK=false \
    --build-arg CUDA_VERSION=13.0.1 \
    --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 \
    --build-arg torch_cuda_arch_list='12.1' \
    --platform "linux/arm64" \
    --tag <docker-image-tag-name> \
    --target vllm-openai \
    --progress plain \
    -f docker/Dockerfile \
    .
```
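Once the image is built, serving the model looks roughly like this (a sketch, not from the recipe: arguments after the image name are passed straight to vLLM's OpenAI-compatible server):

```bash
# Serve the BF16 checkpoint over vLLM's OpenAI-compatible API (default port 8000).
docker run --rm --gpus all -p 8000:8000 \
  <docker-image-tag-name> \
  --model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --dtype bfloat16
```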
@suhara Thank you for the build instructions.
I need to run the full-precision BF16 model (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16), not a quantized version.
Will this custom vLLM build support BF16 on DGX Spark?
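Once it's up, this is the smoke test I'd run against the endpoint (default port assumed):

```bash
# Minimal request against vLLM's OpenAI-compatible chat endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```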
