errors in eval
Hi, I'm guijin, the author of KMMLU.
KMMLU is, by design, a four-option MCQA benchmark, which implies that a model's minimum performance, no matter how weak the model is, should be around 25%.
While the README acknowledges that there may be errors in the scores for Qwen3, we find it problematic to report such scores, and we advise fixing them if possible.
If the error persists, please contact us so that we can try to resolve it together.
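For reference, the 25% floor is simply the expected accuracy of uniform random guessing over four options. A minimal simulation is sketched below; the function name, question count, and seed are illustrative and not part of KMMLU or any evaluation harness.

```python
import random


def random_guess_accuracy(num_questions: int, num_options: int = 4, seed: int = 0) -> float:
    """Estimate the accuracy of uniform random guessing on an MCQA benchmark."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_questions):
        gold = rng.randrange(num_options)   # index of the correct option
        guess = rng.randrange(num_options)  # the "model" picks an option at random
        correct += int(gold == guess)
    return correct / num_questions


acc = random_guess_accuracy(100_000)
print(f"random-guess accuracy: {acc:.3f}")  # should land close to 0.25
```

A reported score well below this level usually points to a scoring or parsing problem rather than to a model that is worse than chance.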
Thank you for your comment.
You’re right: KMMLU is a four-option MCQA benchmark, so scores shouldn’t fall below 25%.
After checking the logs and model card, I found that we evaluated it only in generative mode using the "kmmlu_direct" task in lm-evaluation-harness. This wasn’t clearly stated.
MMLU was run with the same setup (copied from "kmmlu_direct"), so we didn’t use the "mmlu_generative" task there either.
We’ll update the model card as soon as possible. In the meantime, we’ll add a note to avoid confusion.
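To illustrate why a generative setup can score below the 25% floor, here is a simplified sketch contrasting the two scoring styles. It is not the actual lm-evaluation-harness implementation of "kmmlu_direct"; the function names, log-probabilities, and generated strings are all made up for illustration.

```python
OPTIONS = ["A", "B", "C", "D"]


def score_by_ranking(option_logprobs, gold):
    """Multiple-choice style: pick the option with the highest log-likelihood.

    The prediction is always one of A-D, so random behavior still
    yields roughly 25% on a four-option benchmark.
    """
    best = max(range(len(OPTIONS)), key=lambda i: option_logprobs[i])
    return int(OPTIONS[best] == gold)


def score_by_exact_match(generated, gold):
    """Generative style: the raw generated string must equal the gold label."""
    return int(generated == gold)


# Hypothetical examples: same model "knowledge", two ways of scoring it.
examples = [
    {"gold": "B", "logprobs": [-2.1, -0.4, -3.0, -2.7], "generated": "B"},
    {"gold": "C", "logprobs": [-1.9, -2.2, -0.6, -2.5], "generated": "The answer is C."},
    {"gold": "A", "logprobs": [-0.3, -2.0, -2.4, -2.8], "generated": " A\n"},
    {"gold": "D", "logprobs": [-0.9, -1.5, -2.0, -1.2], "generated": "A"},
]

ranking_acc = sum(score_by_ranking(e["logprobs"], e["gold"]) for e in examples) / len(examples)
generative_acc = sum(score_by_exact_match(e["generated"], e["gold"]) for e in examples) / len(examples)

print(f"ranking accuracy:    {ranking_acc:.2f}")
print(f"generative accuracy: {generative_acc:.2f}")
```

On these made-up examples the ranking score is 0.75 while the exact-match score is 0.25, because two correct answers are wrapped in extra text or whitespace. With unparsed generations, nothing prevents exact-match accuracy from dropping below chance.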
We would like to inform you that the reevaluation of the model has been completed.
Upon review, we identified the following issues in the previous evaluation:
- The inst model was evaluated under a 5-shot setting.
- The answers were not properly preprocessed prior to evaluation.
These oversights led to inaccuracies in the KMMLU scores. We sincerely apologize for any confusion this may have caused.
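As a rough illustration of the preprocessing issue, the sketch below normalizes generated answers before comparing them with the gold labels. It assumes the gold labels are the letters A to D; the helper names and the regular expression are hypothetical and do not reproduce the filters used in our actual re-evaluation.

```python
import re
from typing import List, Optional

# A standalone capital A-D that is not part of a longer ASCII word.
# ASCII-only lookarounds are used so that a letter directly followed by
# Hangul (e.g. "B입니다") is still recognized.
_CHOICE_RE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def extract_choice(generated: str) -> Optional[str]:
    """Return the first standalone choice letter in the generation, or None.

    Limitation: a sentence starting with the English article "A" would be
    read as choice A, so real pipelines usually need stricter patterns.
    """
    match = _CHOICE_RE.search(generated.strip())
    return match.group(1) if match else None


def accuracy(generations: List[str], golds: List[str]) -> float:
    """Accuracy after answer extraction; unparseable outputs count as wrong."""
    hits = sum(extract_choice(g) == gold for g, gold in zip(generations, golds))
    return hits / len(golds)


generations = ["B", "The answer is C.", " A\n", "정답은 B입니다", "I don't know"]
golds = ["B", "C", "A", "B", "D"]

print(accuracy(generations, golds))
```

Here four of the five outputs are recovered as valid choices, whereas a raw string comparison would credit only the first one.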