---
license: apache-2.0
language:
- en
metrics:
- accuracy
tags:
- code
arxiv: 2407.10424
---

# CodeV: Empowering LLMs for HDL Generation through Multi-Level Summarization

CodeV is a series of open-source, instruction-tuned Large Language Models (LLMs) designed to generate high-quality HDL code, addressing the challenges that existing models face in this domain.

**(This repo is under development)**

## Models and Datasets

| Size | Base Model | CodeV |
| ---- | ---------- | ----- |
| 6.7B | [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) | [yang-z/CodeV-DS-6.7B](https://huggingface.co/yang-z/CodeV-DS-6.7B) |
| 7B | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | [yang-z/CodeV-CL-7B](https://huggingface.co/yang-z/CodeV-CL-7B) |
| 7B | [Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) | [yang-z/CodeV-QW-7B](https://huggingface.co/yang-z/CodeV-QW-7B) |
| 7B | [Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) | [yang-z/CodeV-QC-7B](https://huggingface.co/yang-z/CodeV-QC-7B) |
| 6.7B | [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) | [yang-z/CodeV-All-DSC](https://huggingface.co/yang-z/CodeV-All-DSC) |
| 7B | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | [yang-z/CodeV-All-CL](https://huggingface.co/yang-z/CodeV-All-CL) |
| 7B | [Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) | [yang-z/CodeV-All-CQ](https://huggingface.co/yang-z/CodeV-All-CQ) |
| 7B | [Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) | [yang-z/CodeV-All-QC](https://huggingface.co/yang-z/CodeV-All-QC) |

## Test

To test the Verilog generation capability of the released models, install the [VerilogEval](https://github.com/NVlabs/verilog-eval) and [RTLLM](https://github.com/hkust-zhiyao/rtllm) environments.

## Quick Start

```python
from transformers import pipeline
import torch

prompt = "FILL IN THE QUESTION"

generator = pipeline(
    model="CODEV",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# do_sample=False selects greedy decoding (the intent of temperature=0.0).
result = generator(prompt, max_length=2048, num_return_sequences=1, do_sample=False)
response = result[0]["generated_text"]
print("Response:", response)
```

### Usage Recommendations

1. **Chat task template**

   The chat task generates complete Verilog or Chisel code from a natural language description. The input is the description plus an optional module header; the output is the corresponding HDL code.

   ```
   [Natural Language Description]
   [Optional Module Header]
   ```

2. **FIM task template**

   The FIM (fill-in-the-middle) task generates the missing middle of a piece of code from its prefix and suffix. The input consists of the base model's special FIM markers (written generically as `[PRE]`, `[SUF]`, and `[MID]` below), a language tag, the prefix, and the suffix; the output is the missing middle snippet.

   ````
   [PRE]```[verilog/scala]
   {prefix}[SUF]{suffix}[MID]
   ````

We recommend using these templates during inference; the sketches below illustrate both tasks.
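For the chat template, a minimal sketch of a request is shown below. It mirrors the Quick Start setup; the `yang-z/CodeV-All-QC` checkpoint and the counter task are assumptions chosen for illustration, so substitute your own model and description:

```python
from transformers import pipeline
import torch

# Chat template: natural language description, then an optional module header.
# The counter task here is a hypothetical example.
prompt = (
    "Please write a Verilog module that implements a 4-bit counter "
    "with synchronous active-high reset.\n"
    "module counter(input clk, input rst, output reg [3:0] q);\n"
)

generator = pipeline(
    model="yang-z/CodeV-All-QC",  # any CodeV checkpoint from the table above
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
result = generator(prompt, max_new_tokens=512, do_sample=False, return_full_text=False)
print(result[0]["generated_text"])
```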
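For the FIM template, the concrete marker strings depend on the base model. The sketch below assumes the Qwen2.5-Coder-based checkpoint (`yang-z/CodeV-All-QC`), whose FIM tokens match the `fim.hbs` template used in the Twinny section later in this card; verify the tokens against your tokenizer before reusing it with another checkpoint:

```python
from transformers import pipeline
import torch

# Assumed FIM markers for the Qwen2.5-Coder family; DeepSeek-Coder- and
# CodeLlama-based checkpoints use different special tokens.
FIM_PRE, FIM_SUF, FIM_MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

# Hypothetical prefix/suffix of a partially written Verilog module.
prefix = (
    "module counter(input clk, input rst, output reg [3:0] q);\n"
    "  always @(posedge clk) begin\n"
)
suffix = "  end\nendmodule"

# Instantiate the template: [PRE]```[language]\n{prefix}[SUF]{suffix}[MID]
prompt = f"{FIM_PRE}```verilog\n{prefix}{FIM_SUF}{suffix}{FIM_MID}"

generator = pipeline(
    model="yang-z/CodeV-All-QC",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
result = generator(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)
# Keep only the completion up to a closing ``` fence, if the model emits one.
middle = result[0]["generated_text"].split("```")[0]
print(middle)
```

This is the same prompt string that Twinny sends through Ollama once the `fim.hbs` template below is installed.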
## Run CodeV-All Models with Twinny

The instructions below use `codev-all-qc` as an example; adjust accordingly for other models.

### Install Ollama

Refer to the [official documentation](https://github.com/ollama/ollama/tree/main/docs).

### Import a Model in Ollama

#### Create a Modelfile

Create a file named `Modelfile` with the following content:

```
FROM path/to/codev-all-qc

TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "```"
```

Replace `path/to/codev-all-qc` with the actual path to your model. You can also customize parameters (e.g., temperature); see the [Modelfile Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md) for details.

#### Import CodeV-All

Start the Ollama service:

```
ollama serve
```

Create the model:

```
ollama create codev-all-qc -f path/to/Modelfile
```

Replace `path/to/Modelfile` with the actual path to your Modelfile, then wait for the model creation process to complete.

### Twinny Setup

#### Install Twinny

Open VS Code and install Twinny from the Extensions Marketplace.

#### Configure Twinny

Open the FIM Configuration page and enter the settings. The model name should match the one used during `ollama create`. Modify the hostname according to your setup (if Ollama is running on a different node, use that node's IP address; for local use, use `0.0.0.0`). Click Save.

Go to Template Configuration and open the template editor. Open `fim.hbs`, replace its content with the following, and save:

```
<|fim_prefix|>```verilog\n{{{prefix}}}<|fim_suffix|>{{{suffix}}}<|fim_middle|>
```

Finally, ensure the Fim option is checked in the template settings. Note: you may need to re-enable this each time VS Code restarts.

#### Try FIM

You can now try FIM while writing code in VS Code. Note: the first time you trigger a completion, Ollama will load the model, which may cause a significant delay.

## Paper

**arXiv:** [2407.10424](https://arxiv.org/abs/2407.10424)

Please cite the paper if you use the models from CodeV:

```
@misc{zhao2025codevempoweringllmshdl,
      title={CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization},
      author={Yang Zhao and Di Huang and Chongxiao Li and Pengwei Jin and Muxin Song and Yinan Xu and Ziyuan Nan and Mingju Gao and Tianyun Ma and Lei Qi and Yansong Pan and Zhenxing Zhang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu},
      year={2025},
      eprint={2407.10424},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2407.10424},
}
```

## Acknowledgements

* [Magicoder](https://github.com/ise-uiuc/magicoder): Training code, original datasets, and data decontamination
* [DeepSeek-Coder](https://github.com/deepseek-ai/DeepSeek-Coder): Base model for CodeV-DeepSeek
* [CodeLlama](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/): Base model for CodeV-CodeLlama
* [CodeQwen](https://github.com/QwenLM/CodeQwen1.5): Base model for CodeV-CodeQwen