🤝 Open to Collab


Danny

TheDrunkenSnail

AI & ML interests

None yet

Recent Activity


upvoted a changelog 13 days ago

Hugging Face Changelog

Filter Leaderboards by Model Size

May 20

• 134

upvoted a changelog 25 days ago

Hugging Face Changelog

Filter Models page by Base Models only

25 days ago

• 164

reacted to salma-remyx's post with 🔥 about 1 month ago

Post

11603

The space of possible improvements for your AI model is large while evaluation is costly.

So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."

The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.

Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in my code repo's commit history at VQASynth.

The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.

Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter.

So what's next? I'm using the model to synthesize a larger volume of preference data to fine-tune an open-weight coding model with DPO and LoRA.

Tuning Coding Agents via Implicit Preference Distillation

arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth

1 reply

upvoted an article 3 months ago

Article

TRL v1.0: Post-Training Library Built to Move with the Field

qgallouedec, stevhliu, pcuenq, sergiopaniego • Mar 31 • 56

reacted to DedeProGames's post with 🔥 3 months ago

Post

3060

🔥 GRM2 - The small one that surpasses the big ones.
What if a 3B-parameter model could beat a 32B-parameter model on every benchmark? We prove that it can.
GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks: the first 3B-parameter model to outperform Qwen3-32B on ALL benchmarks and to outperform o3-mini on almost all benchmarks.
🤗 Model: OrionLLM/GRM2-3b
The first 3B-parameter model to generate over 1,000 lines of code and to score 39.0 on xBench-DeepSearch-2510.

🚀 Chat with GRM:
https://huggingface.co/spaces/DedeProGames/GRM2-Chat

🏆 Download official GGUFs: OrionLLM/GRM2-3b-GGUF

upvoted a changelog 3 months ago

Hugging Face Changelog

Storage Buckets for Spaces

Mar 31

• 141

published 2 models 4 months ago

TheDrunkenSnail/Rhodia

Text Generation • 10B • Updated Dec 31, 2024 • 3

TheDrunkenSnail/Rhodia-Q4_K_M-GGUF

10B • Updated Dec 31, 2024 • 1

reacted to darkc0de's post with 🔥 4 months ago

Post

12188

1440GB of VRAM is incredibly satisfying 😁

17 replies

reacted to aiconta's post with 👀 7 months ago

Post

5440

Hello, who can help me set up a local LLM and RAG for my job? I can pay.

11 replies

liked a model 7 months ago

Sao10K/Lmao_life_updates

Updated Oct 23, 2025 • 52

liked a model 11 months ago

openai/gpt-oss-20b

Text Generation • 22B • Updated Aug 26, 2025 • 6.88M • 4.72k

reacted to AtAndDev's post with 🔥🚀 11 months ago

Post

694

Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near-SOTA results on LCB (LiveCodeBench) without reasoning.
Finally, people are understanding that reasoning isn't necessary for high benchmark scores...

Qwen ftw!

DECENTRALIZE DECENTRALIZE DECENTRALIZE

reacted to Wauplin's post with 🔥 11 months ago

Post

3685

Say hello to hf: a faster, friendlier Hugging Face CLI ✨

We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!

So... why this change?

Typing huggingface-cli constantly gets old fast. More importantly, the CLI’s command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.

We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?

The full rationale, implementation details, and migration notes are in the blog post: https://huggingface.co/blog/hf-cli

7 replies

liked a model 11 months ago

Kwaipilot/KAT-V1-40B

Text Generation • 41B • Updated Aug 23, 2025 • 24 • 118

reacted to AdinaY's post with 👍 11 months ago

Post

2705

KAT-V1 🔥 an LLM by Kuaishou that tackles overthinking by switching between reasoning and direct answers.

Kwaipilot/KAT-V1-40B

✨ 40B
✨ Step-SRPO: smarter reasoning control via RL
✨ MTP + Distillation: efficient training, lower cost

reacted to blaise-tk's post with 🚀 12 months ago

Post

4527

A few months ago, I shared that I was building something like "the Steam for open source apps" with @deeivihh...

🚀 Today, I’m excited to announce that Dione is now open source and live in public beta!

Our mission is simple: make it easier to discover, use, and contribute to open source applications.

🔗 GitHub: https://github.com/dioneapp/dioneapp
💬 Join the community: https://discord.gg/JDFJp33vrM

Want to give it a try? I’d love your feedback! 👀

reacted to drwlf's post with ❤️🤗 about 1 year ago

Post

5890

Having an insanely good medical LLM is pointless if it won’t answer your questions!

So we’ve made 2 notebooks for abliterating any model, so you end up with a good model that will actually help you!

The notebooks are built on @mlabonne’s abliteration logic and datasets!

Feel free to use them and happy training 😊

https://github.com/dralexlup/LLM-Abliteration

3 replies
