size GGUF

#1
by kalle07 - opened

Can you explain why? The Q8 and Q6 GGUFs have the same size?
The same goes for some of the others...

This is expected: GPT OSS was trained in MXFP4 using quantization-aware training, so llama.cpp does not convert those FP4 tensors to anything else. Doing so would only make the model worse, and in the case of Q6/Q8, larger without any benefit. For any GPT OSS based model we provide MXFP4 quants, which in my opinion are the only reasonable choice for those models. MXFP4 is superior in quality to any other quant provided for GPT OSS based models while being really small and fast to run. We probably shouldn't even provide any quants larger than MXFP4 for those models, as they are pointless.
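As an illustration of the size arithmetic (my own sketch, not from the model card): MXFP4 packs weights in blocks of 32, each weight stored as a 4-bit FP4 (E2M1) value with one shared 8-bit scale per block, so the effective cost is 4 + 8/32 = 4.25 bits per weight. Re-packing the same FP4 tensors at a higher-bit quant roughly doubles the file without adding information. The parameter count below is hypothetical, chosen only for scale:

```python
def bits_per_weight(block_size: int = 32, elem_bits: int = 4, scale_bits: int = 8) -> float:
    """Effective bits per weight for a block-scaled format like MXFP4."""
    return elem_bits + scale_bits / block_size

def payload_gib(n_params: float, bpw: float) -> float:
    """Approximate tensor payload in GiB (ignores GGUF metadata/overhead)."""
    return n_params * bpw / 8 / 2**30

n = 20e9  # hypothetical 20B-parameter model, purely for illustration
print(bits_per_weight())                          # 4.25 bpw for MXFP4
print(round(payload_gib(n, 4.25), 1))             # ~9.9 GiB in MXFP4
print(round(payload_gib(n, 8.5), 1))              # ~19.8 GiB at Q8_0-like cost
```

Q8_0 in llama.cpp similarly uses blocks of 32 (8-bit values plus an FP16 scale, about 8.5 bpw), which is why re-quantizing FP4 weights upward only inflates the file.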
