
Does not work with DGX Spark

#13
by sotaaa - opened

I tried following https://build.nvidia.com/spark/sglang/instructions to run an SGLang Docker container on DGX Spark, then launched the model using the command provided in the model card. Got this error:

launch_server.py: error: argument --reasoning-parser: invalid choice: 'nano_v3' (choose from 'deepseek-r1', 'deepseek-v3', 'glm45', 'gpt-oss', 'kimi', 'qwen3', 'qwen3-thinking', 'minimax', 'minimax-append-think', 'step3')
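
For reference, the failing invocation looks roughly like this (a sketch; the exact flags from the model card are assumed here):

$ python3 -m sglang.launch_server \
    --model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    --reasoning-parser nano_v3
    # --reasoning-parser nano_v3 is the flag this SGLang build rejects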

Yes, it would be awesome to have it work on DGX Spark.

Same issue here on DGX Spark.

Environment:

  • DGX Spark, GB10 GPU, CUDA 13.0.1
  • Model downloaded: /home/data/models/nemotron-3-nano-bf16 (all 13 safetensors verified)

Tried:

  1. lmsysorg/sglang:spark → nano_v3 parser not available (same error as OP)
  2. nvcr.io/nvidia/vllm:25.11-py3 → AttributeError: 'NemotronHConfig' object has no attribute 'rms_norm_eps'

Question:
Which exact docker image + tag supports Nemotron-3-Nano on DGX Spark today?
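
In case it helps others triage, here's one way to list which reasoning parsers a given SGLang image actually ships with (a sketch; assumes the image lets you invoke launch_server's --help directly):

$ docker run --rm lmsysorg/sglang:spark \
    python3 -m sglang.launch_server --help 2>&1 | grep -A 2 'reasoning-parser'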

To try it on DGX Spark now, follow these steps:

  • Via lmstudio.ai (https://lmstudio.ai/) I tried the Q4_K_M (24.5GB) GGUF and everything "just worked".
  • Running a few queries produced reasonable results at about 65 tok/sec, which is higher than what I get (53 tok/sec) on a MacBook M3 Pro with the MLX variant.
  • It is able to use the js-sandbox tool built into lmstudio.ai, even though we did not train on that specific tool.

I did not measure accuracy, and this GGUF isn't an "official" NVIDIA GGUF. Many thanks to the OSS community for providing these!
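
If you want to hit it programmatically, LM Studio also exposes an OpenAI-compatible local server (default port 1234). A quick smoke test could look like the following (the model identifier is an assumption; check what LM Studio reports for the loaded GGUF):

$ curl http://localhost:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "nemotron-3-nano", "messages": [{"role": "user", "content": "Hello"}]}'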


NVIDIA org

Hi @sotaaa @PhotosGrafus

We have two options confirmed for DGX Spark. We're looking into SGLang support.

The vLLM path is a little tricky, as users need to build the Docker image themselves. We'll keep you posted with the latest information about Nemotron 3 Nano for DGX Spark.

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ # Decrease max_jobs if you face an OOM issue during the build
$ DOCKER_BUILDKIT=1 docker build \
    --build-arg max_jobs=32 \
    --build-arg RUN_WHEEL_CHECK=false \
    --build-arg CUDA_VERSION=13.0.1 \
    --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 \
    --build-arg torch_cuda_arch_list='12.1' \
    --platform "linux/arm64" \
    --tag <docker-image-tag-name> \
    --target vllm-openai \
    --progress plain \
    -f docker/Dockerfile \
    .
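
Once the image is built, launching the server could look roughly like this (a sketch, not an official command; the mount path and serve flags are assumptions, adjust them to your setup):

$ docker run --gpus all --rm -p 8000:8000 \
    -v /home/data/models:/models \
    <docker-image-tag-name> \
    --model /models/nemotron-3-nano-bf16 \
    --dtype bfloat16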

@suhara Thank you for the build instructions.

I need to run the BF16 full precision model (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16), not quantized.

Will this custom vLLM build support BF16 on DGX Spark?
