size GGUF

#1
by kalle07 - opened

Can you explain why? The Q8 and Q6 GGUFs have the same size?
The same goes for some of the others...

This is expected: GPT OSS was trained in MXFP4 using quantization-aware training, so llama.cpp does not convert those FP4 tensors to anything else. Doing so would only make the model worse, and in the case of Q6/Q8, larger without any benefit. For any GPT OSS based model we provide MXFP4 quants, which in my opinion are the only reasonable choice for those models. MXFP4 is superior in quality to any other quant provided for GPT OSS based models while being really small and fast to run. We probably shouldn't even provide any quants larger than MXFP4 for those models, as they are pointless.
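As an illustration of the size arithmetic (my own sketch, not from the model card): MXFP4 packs weights in blocks of 32, each weight stored as a 4-bit FP4 (E2M1) value with one shared 8-bit scale per block, so the effective cost is 4 + 8/32 = 4.25 bits per weight. Re-packing the same FP4 tensors at a higher-bit quant roughly doubles the file without adding information. The parameter count below is hypothetical, chosen only for scale:

```python
def bits_per_weight(block_size: int = 32, elem_bits: int = 4, scale_bits: int = 8) -> float:
    """Effective bits per weight for a block-scaled format like MXFP4."""
    return elem_bits + scale_bits / block_size

def payload_gib(n_params: float, bpw: float) -> float:
    """Approximate tensor payload in GiB (ignores GGUF metadata/overhead)."""
    return n_params * bpw / 8 / 2**30

n = 20e9  # hypothetical 20B-parameter model, purely for illustration
print(bits_per_weight())                          # 4.25 bpw for MXFP4
print(round(payload_gib(n, 4.25), 1))             # ~9.9 GiB in MXFP4
print(round(payload_gib(n, 8.5), 1))              # ~19.8 GiB at Q8_0-like cost
```

Q8_0 in llama.cpp similarly uses blocks of 32 (8-bit values plus an FP16 scale, about 8.5 bpw), which is why re-quantizing FP4 weights upward only inflates the file.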
