The space of possible improvements for your AI model is large while evaluation is costly.
So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."
The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.
Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in my code repo's commit history at VQASynth.
The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.
Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter.
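To make the recall-then-precision idea concrete, here is a minimal sketch of a two-stage ranker. It is not the code from the post or the LILO paper: the function names, the toy 2-d embeddings, and the `toy_preference` scorer (a stand-in for the trained Gaussian Process proxy) are all hypothetical, chosen only to illustrate the flow.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Paper = Tuple[str, Vector]  # (title, embedding); hypothetical representation


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    interest: Vector,
    papers: List[Paper],
    preference_score: Callable[[Vector], float],
    recall_k: int,
    top_n: int,
) -> List[str]:
    # Stage 1 (high recall): keep the recall_k papers most similar to the interest vector.
    shortlist = sorted(papers, key=lambda p: cosine(interest, p[1]), reverse=True)[:recall_k]
    # Stage 2 (high precision): reorder the shortlist by the preference model's score.
    reranked = sorted(shortlist, key=lambda p: preference_score(p[1]), reverse=True)
    return [title for title, _ in reranked[:top_n]]


# Toy data: 2-d embeddings invented for illustration only.
papers = [
    ("paper_a", [1.0, 0.0]),
    ("paper_b", [0.9, 0.1]),
    ("paper_c", [0.0, 1.0]),
    ("paper_d", [0.8, 0.6]),
]
interest = [1.0, 0.0]


def toy_preference(embedding: Vector) -> float:
    # Assumption: stands in for the GP proxy's predicted preference score.
    return embedding[1]


ranked = retrieve_then_rerank(interest, papers, toy_preference, recall_k=3, top_n=2)
print(ranked)
```

In this toy run, paper_c is dropped by the similarity stage even though the stand-in scorer would rate it highest, which shows why the two stages are not interchangeable: the preference filter only reorders what the semantic stage has already retrieved.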
So what's next? I'm using the model to synthesize a larger volume of preference data to fine-tune an open-weight coding model with DPO and LoRA.
Tuning Coding Agents via Implicit Preference Distillation
🔥 GRM2 - The small one that surpasses the big ones. What if a 3B-parameter model could beat a 32B-parameter model in every benchmark? We prove that it can. GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks. It is the first 3B-parameter model to outperform qwen3-32b in ALL benchmarks and to outperform o3-mini in almost all benchmarks. 🤗 Model: OrionLLM/GRM2-3b. It is also the first 3B-parameter model to generate over 1000 lines of code and achieve a score of 39.0 on xBench-DeepSearch-2510.
Qwen 3 Coder is a personal attack on K2, and I love it. It achieves near-SOTA on LCB without reasoning. Finally, people are understanding that reasoning isn't necessary for high benchmark scores...
Say hello to hf: a faster, friendlier Hugging Face CLI ✨
We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!
So... why this change?
Typing huggingface-cli constantly gets old fast. More importantly, the CLI's command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.
We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?