Models: 158

Active filters: llm-compressor

kaitchup/LFM2.5-1.2B-JP-W4A16-G128

0.6B • Updated Jan 9 • 5 • 1

kaitchup/LFM2.5-1.2B-JP-NVFP4

0.9B • Updated Jan 9 • 1

kaitchup/LFM2.5-1.2B-Instruct-NVFP4

0.9B • Updated 24 days ago • 79 • 2

kaitchup/LFM2.5-1.2B-Instruct-awq-asym

0.6B • Updated Jan 9 • 261 • 1

kaitchup/LFM2.5-1.2B-JP-awq-asym

0.6B • Updated Jan 9 • 10 • 1

xhxlb/IQuest-Coder-V1-40B-Instruct-int4

Text Generation • 6B • Updated Jan 8 • 2

EmbeddedLLM/Qwen3-VL-30B-A3B-Instruct.w4a16

Text Generation • 5B • Updated Jan 13 • 35

EmbeddedLLM/Qwen3-VL-30B-A3B-Thinking.w4a16

Text Generation • 5B • Updated Jan 13 • 468

inference-optimization/granite-4.0-h-tiny-FP8-block

Text Generation • 7B • Updated 28 days ago • 134

RedHatAI/granite-4.0-h-tiny-FP8-dynamic

Text Generation • 7B • Updated 28 days ago • 409

kaitchup/LFM2.5-1.2B-Thinking-AWQ-W4A16-ASYM

0.6B • Updated 29 days ago • 34

kaitchup/LFM2.5-1.2B-Thinking-FP8-Dynamic

1B • Updated 29 days ago • 191

kaitchup/LFM2.5-1.2B-Thinking-MXFP4

0.8B • Updated 29 days ago • 10

kaitchup/LFM2.5-1.2B-Thinking-NVFP4

0.9B • Updated 24 days ago • 69

kaitchup/LFM2.5-1.2B-Thinking-W4A16-G128

0.6B • Updated 29 days ago • 13

kaitchup/LFM2.5-1.2B-Thinking-autoround-W4A16

0.7B • Updated 29 days ago • 317

kaitchup/GLM-4.7-Flash-FP8-Dynamic

30B • Updated 28 days ago • 167

RedHatAI/Phi-4-reasoning-FP8-dynamic

Text Generation • 15B • Updated 11 days ago • 194

dtometzki/Qwen3-30B-A3B-awq-sym

Text Generation • 5B • Updated 23 days ago • 86

JongYeop/Llama-3.1-8B-Instruct-NVFP4-W4A4

5B • Updated 22 days ago • 10

JongYeop/Llama-3.1-70B-Instruct-NVFP4-W4A4

41B • Updated 18 days ago • 39

JongYeop/Llama-3.1-8B-Instruct-INT8-W8A8

8B • Updated 18 days ago • 21

JongYeop/Llama-3.1-8B-Instruct-INT8-W8A8-Dynamic-Per-Token

8B • Updated 17 days ago • 16

JongYeop/Llama-3.1-8B-Instruct-FP8-W8A8-Dynamic-Per-Token

8B • Updated 17 days ago • 14

rtj1/Qwen2.5-0.5B-AWQ-FP8-Dynamic

Text Generation • 0.6B • Updated 10 days ago • 21

rtj1/Qwen2.5-0.5B-AWQ-FP8-Block

Text Generation • 0.6B • Updated 10 days ago • 20

ludovicoYIN/MiniMax-M2-BF16

Text Generation • 229B • Updated 12 days ago • 87 • 1

vistralis/Qwen3-4B-FP8

Text Generation • 4B • Updated 13 days ago • 11

vistralis/Qwen3-4B-INT8

Text Generation • 4B • Updated 13 days ago • 20

vistralis/Qwen3-8B-INT8

Text Generation • 8B • Updated 13 days ago • 17