---
license: apache-2.0
language:
- en
metrics:
- accuracy
tags:
- code
arxiv: 2407.10424
---
# CodeV: Empowering LLMs for HDL Generation through Multi-Level Summarization
<img src="assets/overview_v20250413.png" style="zoom:50%;" />
CodeV is a series of open-source, instruction-tuned Large Language Models (LLMs) designed to generate high-quality HDL code, addressing the weaknesses of existing code LLMs in this domain. **(This repo is under development)**
## Models and Datasets
| Size | Base Model | CodeV |
| ---- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| 6.7B | [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) | [yang-z/CodeV-DS-6.7B](https://huggingface.co/yang-z/CodeV-DS-6.7B) |
| 7B | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | [yang-z/CodeV-CL-7B](https://huggingface.co/yang-z/CodeV-CL-7B) |
| 7B | [Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) | [yang-z/CodeV-QW-7B](https://huggingface.co/yang-z/CodeV-QW-7B) |
| 7B | [Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) | [yang-z/CodeV-QC-7B](https://huggingface.co/yang-z/CodeV-QC-7B) |
| 6.7B | [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) | [yang-z/CodeV-All-DSC](https://huggingface.co/yang-z/CodeV-All-DSC) |
| 7B | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | [yang-z/CodeV-All-CL](https://huggingface.co/yang-z/CodeV-All-CL) |
| 7B |[Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) | [yang-z/CodeV-All-CQ](https://huggingface.co/yang-z/CodeV-All-CQ) |
| 7B |[Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) | [yang-z/CodeV-All-QC](https://huggingface.co/yang-z/CodeV-All-QC) |
## Test
To evaluate the Verilog generation capability of these models, install the [VerilogEval](https://github.com/NVlabs/verilog-eval) and [RTLLM](https://github.com/hkust-zhiyao/rtllm) benchmark environments.
## Quick Start
```python
from transformers import pipeline
import torch

prompt = "FILL IN THE QUESTION"  # your natural-language description (and optional module header)

generator = pipeline(
    model="CODEV",  # replace with a model path from the table above, e.g. "yang-z/CodeV-QC-7B"
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Greedy decoding (the deterministic equivalent of temperature=0).
result = generator(prompt, max_length=2048, num_return_sequences=1, do_sample=False)
response = result[0]["generated_text"]
print("Response:", response)
```
### Usage Recommendations
1. The chat task template
The goal of the chat task is to generate complete Verilog or Chisel code from a natural-language description. The input is the description plus an optional module header; the output is the corresponding HDL code:
```
<LanguageTag>
[Natural Language Description]
[Optional Module Header]
```
2. The FIM task template
The goal of the FIM (fill-in-the-middle) task is to complete missing code: the model generates the middle of a file from its prefix and suffix. The input consists of the language tag, the prefix, the suffix, and the model-specific FIM markers (written `[PRE]`, `[SUF]`, and `[MID]` below); the output is the missing middle snippet:
````
[PRE]```[verilog/scala]
<LanguageTag>
{prefix}[SUF]{suffix}[MID]
````
We recommend using these templates during inference.
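For reference, here is a minimal Python sketch of prompt construction for both tasks. The helper functions and the example module are illustrative, not part of the official API; the FIM markers shown are the Qwen2.5-Coder-style markers used by `codev-all-qc` (see the `fim.hbs` template below) and differ for other base models.
```python
# Illustrative prompt builders for the two templates above.
# Assumptions: "<verilog>" is the language tag, and the FIM markers
# match Qwen2.5-Coder (other base models use different markers).

def chat_prompt(description: str, module_header: str = "") -> str:
    """Chat task: language tag, description, optional module header."""
    prompt = f"<verilog>\n{description}\n"
    if module_header:
        prompt += f"{module_header}\n"
    return prompt

def fim_prompt(prefix: str, suffix: str) -> str:
    """FIM task: the model generates the code between prefix and suffix."""
    return (
        "<|fim_prefix|>```verilog\n<verilog>"
        f"{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    )

# Example: ask the model to complete the body of a counter module.
print(fim_prompt(
    prefix="module counter(input clk, input rst, output reg [7:0] q);\n",
    suffix="endmodule\n",
))
```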
## Run CodeV-All Models with Twinny
The instructions below use `codev-all-qc` as an example. For other models, please make corresponding adjustments.
### Install Ollama
Refer to the [official documentation](https://github.com/ollama/ollama/tree/main/docs).
### Import a Model in Ollama
#### Create a Modelfile
Create a file named `Modelfile` and fill it with the following content:
```
FROM path/to/codev-all-qc
TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "```"
```
Replace `path/to/codev-all-qc` with the actual path to your model. You can also customize parameters (e.g., temperature). See the [Modelfile Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md) for details.
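For example, a Modelfile that also pins sampling parameters might look like the following (the parameter values are illustrative, not tuned recommendations):
```
FROM path/to/codev-all-qc
TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "```"
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
```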
#### Import CodeV-ALL
Start the Ollama service:
```
ollama serve
```
Create the model:
```
ollama create codev-all-qc -f path/to/Modelfile
```
Replace `path/to/Modelfile` with the actual path to your Modelfile, then wait for the model creation process to complete.
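After creation completes, you can sanity-check the import with standard Ollama commands:
```
ollama list              # the new model should be listed
ollama run codev-all-qc  # opens an interactive prompt against the model
```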
### Twinny Setup
#### Install Twinny
Open VS Code and install Twinny in the Extensions Marketplace.
<img src="./assets/image-20250912155617922.png" alt="image-20250912155617922" style="zoom: 35%;" />
#### Twinny Configuration
Open the FIM Configuration page.
<img src="./assets/7449b0e6ac2ff722339b7c74f37a8b0e.png" alt="7449b0e6ac2ff722339b7c74f37a8b0e" style="zoom:33%;" />
Enter the settings as shown below. The model name should match the one used during `ollama create`. Modify the hostname according to your setup (if Ollama is running on a different node, use that node’s IP address; for local use, use `0.0.0.0`). Click Save.
<img src="./assets/image-20250912160402939.png" alt="image-20250912160402939" style="zoom: 35%;" />
Go to Template Configuration and open the template editor.
<img src="./assets/image-20250912160957699.png" alt="image-20250912160957699" style="zoom: 35%;" />
Open `fim.hbs`, replace its content with the following, and save:
```
<|fim_prefix|>```verilog\n<verilog>{{{prefix}}}<|fim_suffix|>{{{suffix}}}<|fim_middle|>
```
<img src="./assets/image-20250912160901631.png" alt="image-20250912160901631" style="zoom: 33%;" />
Finally, ensure the Fim option is checked in the template settings. Note: you may need to re-enable this each time VS Code restarts.
<img src="./assets/bd1fc20b0075656ba4e5321523832e19.png" alt="bd1fc20b0075656ba4e5321523832e19" style="zoom:35%;" />
#### Try FIM
You can now try FIM while writing code in VS Code. Note: The first time you use completion, Ollama will load the model, which may cause a significant delay.
<img src="./assets/image-20250225124004805.png" alt="image-20250225124004805" style="zoom: 67%;" />
## Paper
**arXiv:** <https://arxiv.org/abs/2407.10424>
Please cite the paper if you use the models from CodeV.
```
@misc{zhao2025codevempoweringllmshdl,
title={CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization},
author={Yang Zhao and Di Huang and Chongxiao Li and Pengwei Jin and Muxin Song and Yinan Xu and Ziyuan Nan and Mingju Gao and Tianyun Ma and Lei Qi and Yansong Pan and Zhenxing Zhang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu},
year={2025},
eprint={2407.10424},
archivePrefix={arXiv},
primaryClass={cs.PL},
url={https://arxiv.org/abs/2407.10424},
}
```
## Acknowledgements
* [Magicoder](https://github.com/ise-uiuc/magicoder): Training code, original datasets and data decontamination
* [DeepSeek-Coder](https://github.com/deepseek-ai/DeepSeek-Coder): Base model for CodeV-DeepSeek
* [CodeLlama](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/): Base model for CodeV-CodeLlama
* [CodeQwen](https://github.com/QwenLM/CodeQwen1.5): Base model for CodeV-CodeQwen