Instructions to use konantech/Konan-LLM-OND with libraries, inference providers, notebooks, and local apps.

Libraries

How to use konantech/Konan-LLM-OND with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="konantech/Konan-LLM-OND")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("konantech/Konan-LLM-OND")
model = AutoModelForCausalLM.from_pretrained("konantech/Konan-LLM-OND")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, dropping special tokens such as EOS
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))


vLLM

How to use konantech/Konan-LLM-OND with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "konantech/Konan-LLM-OND"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "konantech/Konan-LLM-OND",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
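The same OpenAI-compatible request can also be sent from Python. The sketch below uses only the standard library (`json`, `urllib.request`). The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the code assumes the server returns the standard chat-completions field `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the vLLM server above running, `send_chat_request(build_chat_request("http://localhost:8000", "konantech/Konan-LLM-OND", "What is the capital of France?"))` should return the reply text. The SGLang server below listens on port 30000 instead.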

Use Docker

docker model run hf.co/konantech/Konan-LLM-OND

SGLang

How to use konantech/Konan-LLM-OND with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "konantech/Konan-LLM-OND" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "konantech/Konan-LLM-OND",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "konantech/Konan-LLM-OND" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "konantech/Konan-LLM-OND",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use konantech/Konan-LLM-OND with Docker Model Runner:
```
docker model run hf.co/konantech/Konan-LLM-OND
```

Errors in eval

by amphora - opened Jul 22, 2025

Discussion

amphora

Jul 22, 2025

Hi, I'm Guijin, the author of KMMLU.

KMMLU is, by design, a four-option MCQA benchmark, which means random guessing alone yields about 25%; a model, however weak, should not score meaningfully below that.
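For a four-option benchmark, 25% is the expected accuracy of random guessing rather than a hard floor, and a random guesser's score also fluctuates with the number of questions. A minimal sketch of that band, using the normal approximation to the binomial. The question count and the z = 3 cutoff are illustrative assumptions, not KMMLU's actual test size or any official threshold.

```python
import math


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing on single-answer MCQA."""
    return 1.0 / num_options


def chance_band(num_options: int, num_questions: int, z: float = 3.0) -> tuple[float, float]:
    """Approximate range of accuracies a pure random guesser would land in.

    Normal approximation to the binomial: p +/- z * sqrt(p * (1 - p) / n),
    clipped to [0, 1].
    """
    p = chance_accuracy(num_options)
    se = math.sqrt(p * (1.0 - p) / num_questions)
    return max(0.0, p - z * se), min(1.0, p + z * se)
```

For example, `chance_band(4, 1000)` gives roughly (0.209, 0.291). On 1,000 questions, a reported score far below about 21% is therefore very unlikely to come from guessing alone and more likely points to an evaluation or answer-parsing problem.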

While the README acknowledges that the Qwen3 scores may contain errors, we consider it problematic to report such scores and advise fixing them if possible.

If the error persists, please contact us so that we can try to resolve it together.

Sang-Geun

Konan Technology org Jul 23, 2025

Thank you for your comment.

You’re right: KMMLU is a four-option MCQA benchmark, so scores shouldn’t fall below 25%.

After checking the logs and model card, I found that we evaluated it only in generative mode using the "kmmlu_direct" task in lm-evaluation-harness. This wasn’t clearly stated.

MMLU was run with the same setup (copied from "kmmlu_direct"), so we didn’t use the "mmlu_generative" task there either.

We’ll update the model card as soon as possible. In the meantime, we’ll add a note to avoid confusion.
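A simplified sketch of why the evaluation mode matters. In log-likelihood (multiple-choice) scoring, the prediction is the highest-scoring option, so it is always one of A to D. In generative scoring, the free-form output must match the answer, and anything that does not match counts as wrong. The function names and the strict exact-match rule below are assumptions for illustration, not the actual code or filters of lm-evaluation-harness's "kmmlu_direct" task.

```python
def score_loglikelihood(option_logprobs: dict[str, float], gold: str) -> int:
    """Multiple-choice scoring: predict the option with the highest log-likelihood.

    The prediction is always one of the options, so a model with no knowledge
    averages about 1 / (number of options).
    """
    prediction = max(option_logprobs, key=option_logprobs.get)
    return int(prediction == gold)


def score_generative_exact(generated: str, gold: str) -> int:
    """Generative scoring with strict exact match on the whitespace-stripped output.

    Any output that is not exactly the gold letter (e.g. a full sentence) scores 0,
    so accuracy is no longer anchored near the chance level.
    """
    return int(generated.strip() == gold)
```

A model that answers with a full sentence such as "정답은 B입니다." is marked wrong under this strict rule even when B is correct, which is one way generative-mode accuracy can drop below the 25% chance level.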

momo

Konan Technology org Jul 30, 2025

This comment has been hidden (marked as Resolved)

momo changed discussion status to closed Jul 30, 2025

momo changed discussion status to open Jul 30, 2025

momo

Konan Technology org Jul 30, 2025

We would like to inform you that the reevaluation of the model has been completed.

Upon review, we identified the following issues in the previous evaluation:

The inst model was evaluated under a 5-shot setting.
The answers were not properly preprocessed prior to evaluation.
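The thread does not say what preprocessing was applied in the re-evaluation. As a hypothetical illustration only, a normalizer that extracts the first standalone option letter from a free-form answer before comparing it with the gold label could look like this (`extract_choice`, `score_with_preprocessing`, and the regex are assumptions):

```python
import re
from typing import Optional

# Hypothetical pattern: an uppercase A-D not attached to other Latin letters.
_CHOICE_RE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_choice(generated: str) -> Optional[str]:
    """Return the first standalone A-D letter in a generated answer, or None."""
    match = _CHOICE_RE.search(generated)
    return match.group(1) if match else None


def score_with_preprocessing(generated: str, gold: str) -> int:
    """Score 1 if the extracted choice matches the gold letter, else 0."""
    return int(extract_choice(generated) == gold)
```

This first-letter heuristic is deliberately simple. An answer such as "A is wrong; the answer is C" would be extracted as A, so a real pipeline would need a stricter answer format or pattern.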

These oversights led to inaccuracies in the KMMLU scores. We sincerely apologize for any confusion this may have caused.
