Libraries

How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with llama-cpp-python:

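# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf",
)

# Run a single-turn chat completion and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])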
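
Reasoning models tend to produce long replies, so streaming the text as it is generated can make the call above more responsive. The sketch below assumes llama-cpp-python's OpenAI-style streaming format, where `stream=True` yields dicts whose `choices[0]["delta"]` may carry a `content` key. `iter_text` and `stream_reply` are hypothetical helper names, not part of the library.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_text(chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style streaming chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # The first chunk usually carries only the role, so "content" may be absent.
        text = delta.get("content")
        if text:
            yield text


def stream_reply(llm: Any, prompt: str) -> str:
    """Print a reply as it is generated and return the full text.

    `llm` is assumed to be the Llama object created above.
    """
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for text in iter_text(chunks):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Calling `stream_reply(llm, "What is the capital of France?")` would print the reply incrementally and return it as a single string.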

Local Apps

llama.cpp

How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

vLLM

How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
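
The same OpenAI-compatible request can be issued from Python. This is a minimal sketch using only the standard library; `build_chat_request` and `send_chat_request` are hypothetical helper names, and the default base URL assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=300):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `send_chat_request(build_chat_request("What is the capital of France?"))` mirrors the curl call. llama-server listens on port 8080 by default, so pass `base_url="http://localhost:8080"` when targeting it instead.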

Use Docker

docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Ollama
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Ollama:
```
ollama run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
```

Unsloth Studio

How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting

Docker Model Runner
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Docker Model Runner:
```
docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
```

Lemonade

How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.DeepSeek-R1-Distill-Qwen-14B-GGUF-Q4_K_M

List all available models

lemonade list

DeepSeek-R1-Distill-Qwen-14B

DeepSeek-R1-Distill-Qwen-14B is a reasoning-focused large language model distilled from the DeepSeek-R1 system into a Qwen2.5-14B backbone. It is optimized for structured reasoning, step-by-step problem solving, and instruction-following across complex analytical tasks.

The model is designed to deliver strong logical consistency and improved reasoning efficiency while maintaining the conversational and multilingual strengths of the Qwen architecture. It is suitable for research, experimentation, and production environments requiring reliable reasoning and long-form generation.

Model Overview

Model Name: DeepSeek-R1-Distill-Qwen-14B
Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
Architecture: Decoder-only Transformer
Parameter Count: 14 billion (reported as 15B in the GGUF metadata)
Context Window: Implementation dependent (set at load time, e.g. with -c in llama.cpp or n_ctx in llama-cpp-python)
Modalities: Text
Primary Languages: English, Chinese
Developer: DeepSeek AI
License: MIT
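
Since the context window depends on the runtime, it is typically chosen when the model is loaded. The sketch below assumes llama-cpp-python, whose `Llama.from_pretrained` forwards extra keyword arguments such as `n_ctx` and `n_gpu_layers` to the `Llama` constructor. The numeric values are illustrative assumptions, not recommendations from the model authors, and `LOAD_OPTIONS` and `load_model` are hypothetical names.

```python
# Illustrative load options; the numeric values are assumptions to tune per machine.
LOAD_OPTIONS = {
    "repo_id": "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    "filename": "DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf",
    "n_ctx": 8192,       # context window in tokens; larger values need more memory
    "n_gpu_layers": -1,  # -1 offloads all layers to the GPU; use 0 for CPU-only
    "verbose": False,
}


def load_model(options=LOAD_OPTIONS):
    """Download (if needed) and load the GGUF file with the given options."""
    # Imported lazily so this snippet can be read without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(**options)
```

Long reasoning traces consume context quickly, so a larger `n_ctx` may be worth the extra memory if replies are being cut off.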

Quantization Details

Q4_K_M

Approx. 71% size reduction (8.37 GB)
Significant size reduction for efficient deployment
Lower memory requirements for CPU and limited-VRAM GPUs
Faster inference and token generation
Slight reduction in reasoning precision for complex multi-step problems

Q5_K_M

Approx. 66% size reduction (9.79 GB)
Higher fidelity to the original model
Improved reasoning stability and coherence
Larger memory footprint than Q4 variants
Recommended when performance is prioritized over minimal resource usage
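
The size figures above can be sanity-checked with a little arithmetic. The baseline size is not stated in this card, so the sketch below works backwards from each file size and stated reduction to the baseline they imply. The function names are hypothetical.

```python
def size_reduction_pct(quantized_gb: float, baseline_gb: float) -> float:
    """Percentage by which the quantized file is smaller than the baseline."""
    return (1.0 - quantized_gb / baseline_gb) * 100.0


def implied_baseline_gb(quantized_gb: float, reduction_pct: float) -> float:
    """Baseline size implied by a quantized size and a stated reduction."""
    return quantized_gb / (1.0 - reduction_pct / 100.0)


# File sizes and reductions as listed above.
for name, size_gb, reduction in [("Q4_K_M", 8.37, 71), ("Q5_K_M", 9.79, 66)]:
    baseline = implied_baseline_gb(size_gb, reduction)
    print(f"{name}: implied baseline ~{baseline:.1f} GB")
```

Both variants imply a baseline near 29 GB, which is roughly what a 16-bit copy of a model this size would occupy (assuming about 2 bytes per parameter).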

Training Overview

Pretraining

The underlying base model is trained on a large multilingual corpus including web data, code, structured documents, and academic material. Training emphasizes language understanding, long-range context modeling, and knowledge representation.

Reasoning Distillation

This model is further refined through knowledge distillation from a stronger reasoning model (DeepSeek-R1). Distillation focuses on transferring:

Step-by-step problem solving strategies
Logical decomposition of complex tasks
Structured reasoning traces
Improved mathematical and analytical performance
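
In practice, R1-distilled models usually emit their reasoning trace wrapped in `<think>...</think>` tags before the final answer. The helper below is a minimal sketch for separating the two, assuming that tag convention holds for the output being parsed. `split_reasoning` is a hypothetical name.

```python
import re
from typing import Tuple

# R1-style models are assumed to wrap their reasoning in <think>...</think>.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model response."""
    match = _THINK_BLOCK.search(text)
    if match is None:
        # Some chat templates pre-fill the opening tag, so only "</think>" appears.
        head, sep, tail = text.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        # No tags at all: treat the whole response as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Keeping the trace separate makes it easy to log the reasoning for inspection while showing only the answer to end users.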

Beyond these distillation targets, the model's key design priorities include:

High-quality step-by-step reasoning
Strong logical consistency across multi-stage problems
Reliable instruction following
Efficient reasoning with reduced model size
Stable multi-turn conversational behavior
Structured and interpretable outputs

Core Capabilities

Advanced reasoning: Performs multi-step logical analysis and structured problem solving.
Instruction adherence: Executes complex prompts and detailed task specifications.
Extended context processing: Maintains coherence across long inputs and multi-turn interactions.
Multilingual interaction: Supports multiple languages with strong English and Chinese performance.
Structured output generation: Produces organized responses such as stepwise solutions, lists, and formatted data.
Conversational consistency: Maintains logical continuity across dialogue sessions.

Example Usage

llama.cpp

./llama-cli \
  -m DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf \
  -p "Explain how gradient descent works step by step."
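
When the CLI is driven from a script, it can help to assemble the arguments programmatically. The sketch below builds an argument list for `llama-cli` using the `-m`, `-p`, `-n`, `-c`, and `--temp` flags. The default values are illustrative assumptions (0.6 follows the temperature range suggested in the upstream DeepSeek-R1 model card), and `build_llama_cli_args` is a hypothetical helper.

```python
import shlex
from typing import List


def build_llama_cli_args(
    model_path: str,
    prompt: str,
    n_predict: int = 1024,
    ctx_size: int = 8192,
    temperature: float = 0.6,
    binary: str = "./llama-cli",
) -> List[str]:
    """Assemble an argument list for llama-cli (suitable for subprocess.run)."""
    return [
        binary,
        "-m", model_path,            # path to the GGUF file
        "-p", prompt,                # prompt text
        "-n", str(n_predict),        # maximum number of tokens to generate
        "-c", str(ctx_size),         # context window in tokens
        "--temp", str(temperature),  # sampling temperature
    ]


args = build_llama_cli_args(
    "DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf",
    "Explain how gradient descent works step by step.",
)
print(shlex.join(args))  # shell-quoted command line for copy and paste
```

Passing `args` to `subprocess.run(args, check=True)` would launch the binary without going through a shell, which avoids quoting problems in the prompt.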

Recommended Use Cases

Mathematical reasoning and problem solving
Scientific and technical explanation
Research assistance and analysis
Programming and algorithm design
Educational tutoring and step-by-step instruction
Long-form structured content generation

Acknowledgments

These quantized models are based on the original work by the deepseek-ai development team.

Special thanks to:

The deepseek-ai team for developing and releasing the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B model.
Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.

Contact

For inquiries or support, contact us at support@sandlogic.com or visit our website.

GGUF Metadata

Model size: 15B params
Architecture: qwen2
Available quantizations: 4-bit, 5-bit
