Instructions to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
- Ollama
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Ollama:
ollama run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
- Unsloth Studio
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF to start chatting
- Docker Model Runner
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Docker Model Runner:
docker model run hf.co/SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
- Lemonade
How to use SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.DeepSeek-R1-Distill-Qwen-14B-GGUF-Q4_K_M
List all available models
lemonade list
DeepSeek-R1-Distill-Qwen-14B
DeepSeek-R1-Distill-Qwen-14B is a reasoning-focused large language model distilled from the DeepSeek-R1 system into a Qwen2.5-14B backbone. It is optimized for structured reasoning, step-by-step problem solving, and instruction-following across complex analytical tasks.
The model is designed to deliver strong logical consistency and improved reasoning efficiency while maintaining the conversational and multilingual strengths of the Qwen architecture. It is suitable for research, experimentation, and production environments requiring reliable reasoning and long-form generation.
Model Overview
- Model Name: DeepSeek-R1-Distill-Qwen-14B
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- Architecture: Decoder-only Transformer
- Parameter Count: 14 Billion
- Context Window: Implementation dependent
- Modalities: Text
- Primary Languages: English, Chinese
- Developer: DeepSeek AI
- License: MIT
Quantization Details
Q4_K_M
- Approx. 71% size reduction (8.37 GB file)
- Significant size reduction for efficient deployment
- Lower memory requirements for CPU and limited-VRAM GPUs
- Faster inference and token generation
- Slight reduction in reasoning precision for complex multi-step problems
Q5_K_M
- Approx. 66% size reduction (9.79 GB file)
- Higher fidelity to the original model
- Improved reasoning stability and coherence
- Larger memory footprint than Q4 variants
- Recommended when performance is prioritized over minimal resource usage
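As a rough sanity check on the figures above, the sketch below back-computes the unquantized size implied by each variant and picks the largest variant that fits a given memory budget. The sizes and percentages are the ones quoted in this card; the helper names are hypothetical, and the `headroom_gb` allowance for KV cache and runtime buffers is an illustrative assumption rather than a measured value.

```python
# File sizes and reduction percentages as quoted in this model card.
VARIANTS = {
    "Q4_K_M": {"size_gb": 8.37, "reduction": 0.71},
    "Q5_K_M": {"size_gb": 9.79, "reduction": 0.66},
}


def implied_original_gb(size_gb: float, reduction: float) -> float:
    """Back-compute the unquantized size from a quantized size and its reduction."""
    return size_gb / (1.0 - reduction)


def pick_variant(available_gb: float, headroom_gb: float = 2.0):
    """Return the largest variant whose file plus headroom fits, or None.

    headroom_gb is an assumed allowance for KV cache and runtime buffers.
    """
    fitting = [
        (info["size_gb"], name)
        for name, info in VARIANTS.items()
        if info["size_gb"] + headroom_gb <= available_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, info in VARIANTS.items():
        original = implied_original_gb(info["size_gb"], info["reduction"])
        print(f"{name}: implied original size ~{original:.1f} GB")
    print("Suggested variant for 12 GB:", pick_variant(available_gb=12.0))
```

Both variants imply an unquantized size of roughly 29 GB, so the two quoted percentages are consistent with each other. Actual memory use also depends on context length and backend, so the headroom value should be tuned per deployment.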
Training Overview
Pretraining
The underlying base model is trained on a large multilingual corpus including web data, code, structured documents, and academic material. Training emphasizes language understanding, long-range context modeling, and knowledge representation.
Reasoning Distillation
This model is further refined through knowledge distillation from a stronger reasoning model (DeepSeek-R1). Distillation focuses on transferring:
- Step-by-step problem solving strategies
- Logical decomposition of complex tasks
- Structured reasoning traces
- Improved mathematical and analytical performance
Design priorities for the distilled model include:
- High-quality step-by-step reasoning
- Strong logical consistency across multi-stage problems
- Reliable instruction following
- Efficient reasoning with reduced model size
- Stable multi-turn conversational behavior
- Structured and interpretable outputs
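When consuming the model's output programmatically, it can help to separate the reasoning trace from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as the upstream DeepSeek-R1 family does; this card does not state the output format, so treat the tag name and the `split_reasoning` helper as assumptions to verify against real output.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in upstream DeepSeek-R1.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    If no <think> block is found, reasoning is empty and the whole
    text is returned as the answer.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4, times 3 is 12.</think>\nThe result is 12."
    reasoning, answer = split_reasoning(raw)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Keeping the two parts separate makes it possible to log the trace for inspection while showing only the answer to end users.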
Core Capabilities
- Advanced reasoning: Performs multi-step logical analysis and structured problem solving.
- Instruction adherence: Executes complex prompts and detailed task specifications.
- Extended context processing: Maintains coherence across long inputs and multi-turn interactions.
- Multilingual interaction: Supports multiple languages with strong English and Chinese performance.
- Structured output generation: Produces organized responses such as stepwise solutions, lists, and formatted data.
- Conversational consistency: Maintains logical continuity across dialogue sessions.
Example Usage
llama.cpp
./llama-cli \
-m DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf \
-p "Explain how gradient descent works step by step."
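The same prompt can be sent from Python with llama-cpp-python. The sketch below reuses the `Llama.from_pretrained` and `create_chat_completion` calls shown in the library instructions for this repository. The `ask` and `extract_reply` helper names are hypothetical, and `n_ctx=4096` is an assumed context size chosen to leave room for long reasoning output; adjust it to the available memory.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """Load the Q4_K_M file from the Hub and answer a single prompt."""
    # Imported lazily so extract_reply can be used without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Qwen-14B-GGUF",
        filename="DeepSeek-R1-Distill-Qwen-14B_Q4_K_M.gguf",
        n_ctx=4096,  # assumed context size, not a value from this card
    )
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
    )
    return extract_reply(response)


# Example call (downloads the GGUF file on first use):
# print(ask("Explain how gradient descent works step by step."))
```

Reloading the model on every call is slow; in a longer-running program the `Llama` instance would normally be created once and reused.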
Recommended Use Cases
- Mathematical reasoning and problem solving
- Scientific and technical explanation
- Research assistance and analysis
- Programming and algorithm design
- Educational tutoring and step-by-step instruction
- Long-form structured content generation
Acknowledgments
These quantized models are based on the original work by the deepseek-ai development team.
Special thanks to:
The deepseek-ai team for developing and releasing the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B model.
Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our Website.