Great model: fast, smart, and open-minded

#2
by bobig - opened

I like this even better than similar Heretic models;
it seems to have no morals or ethics limiting its thinking.

What is your opinion about Heretic vs Derestricted?

I think Heretic works great on Qwens. GLM has some of those denials baked into its thinking, so the Derestricted approach might be cleaner. I don't know the technicals, but usually the metrics tell me a story:

Model                                 arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
GLM-4.5-Air-Derestricted-qx53g        0.402          0.431     0.378  0.687      0.382       0.769  0.699
GLM-4.5-Air-Derestricted-mxfp4        0.413          0.437     0.378  0.690      0.392       0.769  0.715
GLM-4.5-Air-REAP-82B-A12B-mxfp4       0.392          0.422     0.378  0.615      0.368       0.732  0.680
GLM-Steam-106B-A12B-v1-qx65g          0.430          0.461     0.378  0.681      0.398       0.771  0.715

Steam has good ARC because it's Drummer's RP engine--it better have good ARC, and the quant is also almost double the size.

You can see the REAP dip in hellaswag; no surprise, that's the missing experts. Other than that it can do the job, sort of, on simple things.

The qx53g is smaller than mxfp4 and matches it on performance.

The boolq at 0.378 is a GLM thing, but that piqa score is why it does great at coding.
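
In case anyone wants to pull the same columns themselves, here is a minimal sketch using EleutherAI's lm-evaluation-harness Python API. The task names match the table header above; the repo id is a placeholder, and I'm assuming a harness-style setup rather than the exact tooling used for these numbers:

```python
# Minimal sketch: run the seven benchmarks from the table above with
# lm-evaluation-harness (pip install lm-eval). The model id is a
# placeholder; MLX quants would need an MLX-aware runner instead.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=zai-org/GLM-4.5-Air",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "piqa", "winogrande",
    ],
)

# Print one accuracy per task, same order of information as the table.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```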

My AI assistants would probably say something similar; that's just my personal take on this :)

Oh, it's a good 'un 🥰

Thanks for the insight, this stuff is getting complicated: Qwen vs GLM morals & ethics.

Your gpt-oss-120b-heretic-v2-mxfp4-q8-hi-mlx is #1 for 60 GB models.

gpt-oss-120b-Derestricted-mxfp4-mlx has fewer restrictions, but 10% slower tokens.
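
If you want to sanity-check that speed gap yourself, mlx-lm's generate prints tokens-per-second when run verbose. A minimal sketch; the repo ids are my guesses from the model names above, and the prompt is arbitrary:

```python
# Minimal sketch: compare generation speed of two MLX quants.
# load/generate are the standard mlx-lm entry points; the repo ids
# below are assumptions based on the model names in this thread.
from mlx_lm import load, generate

for repo in (
    "nightmedia/gpt-oss-120b-heretic-v2-mxfp4-q8-hi-mlx",
    "nightmedia/gpt-oss-120b-Derestricted-mxfp4-mlx",
):
    model, tokenizer = load(repo)
    # verbose=True prints prompt and generation tokens-per-second
    generate(
        model, tokenizer,
        prompt="Explain mixture-of-experts routing in one paragraph.",
        max_tokens=256, verbose=True,
    )
```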

P-E-W mentioned that newer versions of Heretic might be using ideas from Derestricted, but apparently not yet: https://github.com/p-e-w/heretic/releases/tag/v1.1.0

bobig changed discussion status to closed

Yeah, when I tested the Heretic models, the v2 was better in metrics, ever so slightly. The gains over the stock model were impressive: just by removing those refusals, the model got smarter. I did not do anything special except preserve the attention layers at 8 bit and group size 32; otherwise the MLX encoding would stomp them down to group size 64 and the model gets stupid again. There is something to be said for fine-tuning a quant ;)
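
For the curious, the mechanism for this kind of per-layer override in mlx-lm is the quant_predicate hook on convert (available in recent releases). A sketch of how that looks; the repo id is a placeholder, the projection names vary by architecture, and this is not necessarily the exact recipe used here:

```python
# Sketch of per-layer quantization with mlx-lm's quant_predicate hook.
# The upstream repo id is a placeholder and the recipe is illustrative.
from mlx_lm import convert

def keep_attention_sharp(path, module, config):
    # Attention projections stay at 8 bit / group size 32 ...
    if any(name in path for name in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 8, "group_size": 32}
    # ... everything else falls through to the defaults passed below.
    return True

convert(
    "openai/gpt-oss-120b",           # placeholder upstream repo
    mlx_path="gpt-oss-120b-q8-hi-mlx",
    quantize=True,
    q_bits=4,
    q_group_size=64,                 # the default that would otherwise win
    quant_predicate=keep_attention_sharp,
)
```

Without the override, every layer gets the default group size, which is exactly the "stomp it down to 64" behavior described above.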

... and yes, you don't need 10 great models, just one good one that does 90% of the job. Since I am a programmer, that's my go-to ratio :)

Yes, yes, yes.
"The gains over the stock model were impressive, just by removing those refusals"
Bicycle better without training wheels!

"preserve attention to 8 bit and group size 32, which otherwise mlx encoding would stomp it down to group size 64"
Simple when you explain it. 529 models! Hope MLX starts noticing your work.
