Benchmark: DeepSeek V3 vs GPT-4o vs Claude for coding tasks
#117 opened 2 days ago
by
xujfcn
Add MMLU-Pro evaluation result (64.4)
1
#116 opened 29 days ago
by
burtenshaw
Add GSM8K evaluation result
1
#115 opened about 1 month ago
by
burtenshaw
Add GSM8K evaluation result (89.3%)
1
#114 opened about 1 month ago
by
burtenshaw
Add GSM8K evaluation result
1
#113 opened about 1 month ago
by
burtenshaw
Add GSM8K evaluation result
1
#112 opened about 1 month ago
by
burtenshaw
Production deployment considerations
3
#111 opened 2 months ago
by
Cagnicolas
dememe4301
1
#110 opened 3 months ago
by
kubilayarikan
Update inference/model.py
1
#109 opened 3 months ago
by
Crossberry
Update README.md
1
#107 opened 7 months ago
by
reactkick
Remove redundant code
1
#106 opened 8 months ago
by
GloomScythe
MTP Integration: Unexpectedly High Loss with Loaded Weights
1
#105 opened 8 months ago
by
parambole
add AIBOM
👍 1
1
#104 opened 9 months ago
by
RiccardoDav
Update tokenizer_config.json
1
#101 opened 10 months ago
by
Akshay47
DeepSeek V3 model Bad Cases Genuine User Open Reviews and Comments Collection
1
#99 opened 10 months ago
by
DeepNLP
Make config params float to avoid warning in Transformers
1
#97 opened 11 months ago
by
Rocketknight1
Point to latest checkpoint
1
#96 opened 11 months ago
by
victor
how to convert model to bf16
1
#95 opened 11 months ago
by
Saicy
Update README.md
1
#94 opened 12 months ago
by
Alirezaaa123456
Deepseek V3
1
#93 opened 12 months ago
by
cybercyb
[Q] shared_head weights of MTP
👀 5
1
#92 opened 12 months ago
by
huang11
fix for transformers 4.49 compatibility.
2
#91 opened 12 months ago
by
katuni4ka
Update README.md
1
#90 opened 12 months ago
by
baishihao
A Small Issue in the Code Implementation of Auxiliary-Loss-Free Load Balancing Expert Bias
1
#89 opened 12 months ago
by
liyang31163150
Fix generation with latest transformers
1
#88 opened about 1 year ago
by
kylesayrs
Add pipeline tag
1
#86 opened about 1 year ago
by
nielsr
Some of the safetensor files are not marked as safe
1
#85 opened about 1 year ago
by
tanmaylaud
Update README.md
1
#84 opened about 1 year ago
by
MTayira
ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float8_e4m3fn
1
#82 opened about 1 year ago
by
ajtakto
Update README.md
1
#81 opened about 1 year ago
by
deleted
Update README.md
1
#80 opened about 1 year ago
by
zhup
Update README.md
1
#79 opened about 1 year ago
by
zhup
chat
1
#77 opened about 1 year ago
by
rojithonline
DeepSeek-V3-lite naming conventions?
❤️ 1
7
#76 opened about 1 year ago
by
AlphaGaO
torch.distributed.DistNetworkError
1
#75 opened about 1 year ago
by
yu19920006607
remove reference to deprecated transformers code
3
#74 opened about 1 year ago
by
winglian
Update README.md
1
#73 opened about 1 year ago
by
SamimSaikia
DeepSeek R1 answer ChatGPT ??
😔 1
5
#72 opened about 1 year ago
by
valerebron
ValueError: Unrecognized configuration class <class 'transformers_modules.configuration_deepseek.DeepseekV3Config'> to build an AutoTokenizer.
12
#69 opened about 1 year ago
by
ajtakto
Parallelized script
1
#67 opened about 1 year ago
by
ajtakto
I am getting an error message while executing pip install -r requirements.txt
6
#64 opened about 1 year ago
by
yu19920006607
`aux_loss_alpha` should be 1e-4 instead of 1e-3?
#61 opened about 1 year ago
by
cuichenx
captcha not loading on edge
#60 opened about 1 year ago
by
leo-smi
Upload shreya.zip
#59 opened about 1 year ago
by
Msdthala
Upload IMG_20250111_184317.jpg
#58 opened about 1 year ago
by
Sajalhero
Auxiliary-loss-free expert routing
2
#56 opened about 1 year ago
by
qing9
AI Games
#55 opened about 1 year ago
by
ChickenUJHAYIUSGU
Upload IMG_0509 4.HEIC
#54 opened about 1 year ago
by
borhanrabbany