huihui-ai/dots.llm1.inst
This version exists only to allow rednote-hilab/dots.llm1.inst to be loaded locally with transformers. Only the local import issue has been fixed; nothing else is changed.
Usage
Copy the four files into the local model directory (./rednote-hilab/dots.llm1.inst in the example), then run the following program.
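import sys
import os
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

# Local directory that holds the model weights and the copied files
MODEL_ID = "./rednote-hilab/dots.llm1.inst"
# Make the model directory importable so the two local modules below resolve
sys.path.append(os.path.abspath(MODEL_ID))
from configuration_dots1 import Dots1Config
from modeling_dots1 import Dots1ForCausalLM

# Register the local classes under the "dots1" model type
AutoConfig.register("dots1", Dots1Config)
AutoModelForCausalLM.register(Dots1Config, Dots1ForCausalLM)

config = AutoConfig.from_pretrained(MODEL_ID)
print(config)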
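
# 4-bit quantization through bitsandbytes, computing in bfloat16
quant_config_4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    llm_int8_enable_fp32_cpu_offload=True,  # allow modules offloaded to CPU to stay in fp32
)

# Load with the local model class; device_map="auto" spreads layers across available devices
model = Dots1ForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quant_config_4,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)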
print(model)
print(model.config)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
# Decode the full sequence: the prompt followed by up to 100 new tokens
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
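
The program above depends on the model directory containing the Python modules it imports. As a minimal sketch, a pre-flight check can confirm those files are present before the directory is added to sys.path. The helper name add_model_dir_to_path and the REQUIRED_MODULES tuple are hypothetical; the tuple lists only the two modules the program imports, since this page does not name all four files.

```python
import sys
from pathlib import Path

# Assumption: only the two modules imported by the program above are checked.
REQUIRED_MODULES = ("configuration_dots1.py", "modeling_dots1.py")


def add_model_dir_to_path(model_dir, required=REQUIRED_MODULES):
    """Verify the local modules exist, then make model_dir importable."""
    root = Path(model_dir).resolve()
    # Collect every required file that is absent from the directory.
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    path = str(root)
    # Append only once so repeated calls do not grow sys.path.
    if path not in sys.path:
        sys.path.append(path)
    return path
```

Calling add_model_dir_to_path("./rednote-hilab/dots.llm1.inst") in place of the bare sys.path.append line would fail early with a readable error if the files were not copied, instead of a later ImportError.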
Model tree for huihui-ai/dots.llm1.inst
Base model: rednote-hilab/dots.llm1.base
Finetuned: rednote-hilab/dots.llm1.inst