Benchmark: DeepSeek V3 vs GPT-4o vs Claude for coding tasks
#117 · by xujfcn
I ran a comparison of DeepSeek V3 against GPT-4o and Claude Sonnet on 50 coding tasks (LeetCode medium/hard). Here are my findings:
| Model | Pass Rate | Avg Time | Cost/1K requests |
|---|---|---|---|
| DeepSeek V3 | 82% | 3.2s | $0.21 |
| GPT-4o | 85% | 2.8s | $6.25 |
| Claude Sonnet | 87% | 3.5s | $9.00 |
DeepSeek V3 is remarkably competitive: it trails the other two by only 3-5 points on pass rate while costing roughly 1/30th as much as GPT-4o and about 1/40th as much as Claude Sonnet per request.
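To make the cost gap concrete, you can normalize cost by pass rate to get cost per 1K *passing* solutions. A quick calculation from the table above:

```python
# Cost per 1K passing solutions = (cost per 1K requests) / pass rate,
# using the numbers from the benchmark table above.
results = {
    "DeepSeek V3": {"pass_rate": 0.82, "cost_per_1k": 0.21},
    "GPT-4o": {"pass_rate": 0.85, "cost_per_1k": 6.25},
    "Claude Sonnet": {"pass_rate": 0.87, "cost_per_1k": 9.00},
}

for model, r in results.items():
    cost_per_1k_passes = r["cost_per_1k"] / r["pass_rate"]
    print(f"{model}: ${cost_per_1k_passes:.2f} per 1K passing solutions")
```

Even after adjusting for its lower pass rate, DeepSeek V3 comes out around $0.26 per 1K passing solutions versus roughly $7.35 for GPT-4o and $10.34 for Claude Sonnet.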
Test setup: I used Crazyrouter to run all three models through the same OpenAI-compatible API.
```python
from openai import OpenAI

client = OpenAI(base_url="https://crazyrouter.com/v1", api_key="your-key")
messages = [{"role": "user", "content": "Solve this LeetCode problem: ..."}]

for model in ["deepseek-chat", "gpt-4o", "claude-sonnet-4-20250514"]:
    # Same request shape for all three models; only the model name changes.
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, response.choices[0].message.content)
```
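For scoring, each generated solution has to be run against the task's test cases. The post doesn't show the grading harness, but a minimal sketch of the idea might look like this (the `score_solution` helper and the example task are hypothetical, not the author's actual setup):

```python
# Hypothetical sketch: execute a model-generated solution and check it
# against known (input, expected-output) pairs. Not the author's harness.
def score_solution(code: str, tests: list, func_name: str) -> bool:
    ns = {}
    exec(code, ns)  # run the generated code in its own namespace
    fn = ns[func_name]
    # Solution passes only if every test case produces the expected output.
    return all(fn(*args) == expected for args, expected in tests)

# Example: a generated two_sum solution checked against one test case.
solution = (
    "def two_sum(nums, target):\n"
    "    seen = {}\n"
    "    for i, n in enumerate(nums):\n"
    "        if target - n in seen:\n"
    "            return [seen[target - n], i]\n"
    "        seen[n] = i\n"
)
print(score_solution(solution, [(([2, 7, 11, 15], 9), [0, 1])], "two_sum"))
```

Pass rate is then just the fraction of the 50 tasks for which this check returns `True`.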
Full comparison: Model Comparison Guide