Benchmark: DeepSeek V3 vs GPT-4o vs Claude for coding tasks

#117
by xujfcn - opened

I ran a comparison of DeepSeek V3 against GPT-4o and Claude Sonnet on 50 coding tasks (LeetCode medium/hard). Here are my findings:

| Model | Pass Rate | Avg Time | Cost / 1K requests |
|---|---|---|---|
| DeepSeek V3 | 82% | 3.2s | $0.21 |
| GPT-4o | 85% | 2.8s | $6.25 |
| Claude Sonnet | 87% | 3.5s | $9.00 |

DeepSeek V3 is remarkably competitive: it trails the leaders by only 3-5 points in pass rate while costing roughly 1/40th as much per request as Claude Sonnet.

Test setup: I used Crazyrouter to run all three models through the same OpenAI-compatible API.

```python
from openai import OpenAI

client = OpenAI(base_url="https://crazyrouter.com/v1", api_key="your-key")
messages = [{"role": "user", "content": "..."}]  # the coding-task prompt
for model in ["deepseek-chat", "gpt-4o", "claude-sonnet-4-20250514"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, response.choices[0].message.content[:80])
```

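For anyone reproducing this, a minimal sketch of how per-task results could be rolled up into the table above. This is illustrative, not the original harness: the pass/fail judging (executing each model's code against the LeetCode test cases) is assumed to happen upstream, and `summarize` and its inputs are names I made up.

```python
def summarize(results, cost_per_request):
    """Aggregate per-task results for one model.

    results: list of (passed: bool, seconds: float), one entry per task.
    cost_per_request: average API cost in dollars for a single task.
    """
    n = len(results)
    pass_rate = sum(1 for passed, _ in results if passed) / n
    avg_time = sum(seconds for _, seconds in results) / n
    return {
        "pass_rate": pass_rate,
        "avg_time": avg_time,
        "cost_per_1k": cost_per_request * 1000,  # scale to the table's unit
    }

# Toy example with 3 tasks; real runs would feed in all 50.
stats = summarize([(True, 3.0), (False, 3.4), (True, 3.2)],
                  cost_per_request=0.00021)
print(stats)
```

Timing each call with `time.perf_counter()` around `client.chat.completions.create(...)` gives the per-task seconds.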
Full comparison: Model Comparison Guide