Salma Mayorquin PRO

salma-remyx

https://remyx.ai

smellslikeml

Post

Just trained a 2B coding model to rank candidate AI/ML research ideas against the implicit preferences in a code repository's merge history.

The training data comes from a Gaussian Process fit on the accumulated dispositions in VQASynth, where each PR against a deployed project yields a pairwise comparison between the preferred feature branch and the baseline at main.

The GP scores candidate papers to synthesize preference pairs, and DPO with LoRA bakes the ranking pipeline into the model's weights.
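
A minimal sketch of how scalar scores could be turned into preference pairs for DPO. The post does not say how pairs are selected, so the margin threshold, the dict layout, and the names `synthesize_pairs` and `example_scores` are all assumptions for illustration; the scores stand in for GP posterior means.

```python
from itertools import combinations

def synthesize_pairs(scores, min_margin=0.5):
    """Turn per-candidate scores into (chosen, rejected) preference pairs.

    scores: dict mapping a candidate id to a scalar score (a stand-in for
    a GP posterior mean). Pairs whose score gap is below min_margin are
    skipped, on the assumption that a near-tie carries little signal.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        gap = scores[a] - scores[b]
        if abs(gap) < min_margin:
            continue  # too close to call, drop the pair
        chosen, rejected = (a, b) if gap > 0 else (b, a)
        pairs.append({"chosen": chosen, "rejected": rejected, "margin": abs(gap)})
    return pairs

# Hypothetical scores for three candidate papers (not real data).
example_scores = {"paper_a": 1.2, "paper_b": 0.1, "paper_c": 0.9}
example_pairs = synthesize_pairs(example_scores)
```

With these made-up scores, `paper_a` vs `paper_c` is dropped as a near-tie, and `paper_b` ends up as the rejected side of both remaining pairs.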

After 1 epoch the model reaches 87.4% reward accuracy on the held-out eval split against 92.3% on training, consistent with learning the task without overfitting.
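
For readers new to the metric, here is a sketch of reward accuracy as it is usually defined for DPO. I am assuming the standard definition, since the post does not spell it out: the implicit reward of a response is beta times the gap between policy and reference log-probabilities, and accuracy is the fraction of pairs where the chosen response gets the higher reward. The value `beta=0.1` and the log-probabilities below are made up.

```python
def dpo_reward(policy_logp, ref_logp, beta=0.1):
    """Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

def reward_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the chosen response out-scores the rejected one.

    pairs: list of dicts holding summed sequence log-probs under the policy
    and the frozen reference model, for both chosen and rejected responses.
    """
    correct = 0
    for p in pairs:
        r_chosen = dpo_reward(p["policy_chosen"], p["ref_chosen"], beta)
        r_rejected = dpo_reward(p["policy_rejected"], p["ref_rejected"], beta)
        correct += int(r_chosen > r_rejected)
    return correct / len(pairs)

# Hypothetical log-probs for three eval pairs (illustrative only).
eval_pairs = [
    {"policy_chosen": -10.0, "ref_chosen": -12.0,
     "policy_rejected": -15.0, "ref_rejected": -14.0},
    {"policy_chosen": -20.0, "ref_chosen": -19.0,
     "policy_rejected": -18.0, "ref_rejected": -18.5},
    {"policy_chosen": -8.0, "ref_chosen": -9.0,
     "policy_rejected": -11.0, "ref_rejected": -10.0},
]
accuracy = reward_accuracy(eval_pairs)
```

Here the policy ranks two of the three pairs correctly, so `accuracy` is 2/3. Computing the same quantity on the training and eval splits gives the two numbers compared above.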

Now, I'm scaling the pipeline to thousands of repos for a generalization test.

Dataset: https://huggingface.co/datasets/remyxai/mhpd-dpo-v0
Model: https://huggingface.co/remyxai/mhpd-dpo-qwen3.5-2b-vqasynth
Substack: https://remyxai.substack.com/p/the-ai-pm

Post

The space of possible improvements for your AI model is large while evaluation is costly.

So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."

The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.

Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in the commit history of my VQASynth code repo.

The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.

Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter.
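
A rough sketch of that two-stage idea: shortlist by semantic relevance for recall, then order the shortlist by preference score for precision. The field names, the relevance cutoff, and the `rerank` helper are hypothetical and not taken from the actual pipeline.

```python
def rerank(candidates, top_k=3, min_relevance=0.5):
    """Shortlist by relevance, then sort the shortlist by preference score.

    candidates: list of dicts with an "id", a semantic "relevance" score,
    and a "preference" score (a stand-in for the GP proxy's output).
    """
    # Stage 1 (recall): keep anything semantically close enough.
    shortlist = [c for c in candidates if c["relevance"] >= min_relevance]
    # Stage 2 (precision): order by the learned preference score.
    shortlist.sort(key=lambda c: c["preference"], reverse=True)
    return [c["id"] for c in shortlist[:top_k]]

# Hypothetical candidates (illustrative numbers only).
papers = [
    {"id": "p1", "relevance": 0.9, "preference": 0.2},
    {"id": "p2", "relevance": 0.7, "preference": 1.5},
    {"id": "p3", "relevance": 0.3, "preference": 2.0},
    {"id": "p4", "relevance": 0.6, "preference": 0.8},
]
top_papers = rerank(papers, top_k=2)
```

In this toy example `p3` has the highest preference score but never reaches the second stage because it misses the relevance cutoff, while `p1` is the most relevant but drops out of the top two after re-ranking.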

So what's next? I'm using the model to synthesize a larger volume of preference data to finetune an open-weight coding model with DPO and LoRA.

Tuning Coding Agents via Implicit Preference Distillation

arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth