
Sultan fatih

Sultanfatih
AI & ML interests

None yet

Recent Activity

reacted to Kseniase's post with 👍 about 1 month ago
6 Comprehensive Resources on AI Coding

AI coding is moving fast, and it's getting harder to tell what actually works. Agents, workflows, context management, and many other aspects are reshaping how software gets built. We've collected a set of resources to help you understand how AI coding is evolving today and what building strategies work best:

1. https://huggingface.co/papers/2508.11126
Provides a clear taxonomy, compares agent architectures, and exposes practical gaps in tools, benchmarks, and reliability that AI coding agents now struggle with

2. https://huggingface.co/papers/2511.04427
This survey from Carnegie Mellon University shows causal evidence that LLM agent assistants deliver short-term productivity gains but carry lasting quality costs that can slow development over time

3. https://huggingface.co/papers/2510.12399
Turns Vibe Coding from hype into a structured field, categorizing real development workflows. It shows which models, infrastructure, tool requirements, context, and collaboration setups affect real software development outcomes

4. https://huggingface.co/papers/2511.18538 (from Chinese institutes and companies such as ByteDance and Alibaba)
Compares real code LLMs, shows how training and alignment choices affect code quality and security, and connects academic benchmarks to everyday software development

5. Build Your Own Coding Agent via a Step-by-Step Workshop ⟶ https://github.com/ghuntley/how-to-build-a-coding-agent
A great guide that covers the basics of building an AI-powered coding assistant, from a chatbot to a file reader/explorer/editor and code search

6. State of AI Coding: Context, Trust, and Subagents ⟶ https://www.turingpost.com/p/aisoftwarestack
Our in-depth analysis of where AI coding is heading and the new directions we see today, such as agent swarms and the growing importance of context management, offering an emerging playbook beyond the IDE

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
reacted to martinsu's post with 👀 about 1 month ago
https://huggingface.co/blog/martinsu/potus-broke-my-pipeline

How POTUS Completely Broke My Flash 2.5-Based Guardrail

I did quite a bit of deep research on this one, since IMHO it matters. At first I used this story to amuse fellow MLOps guys, but then I went deeper and was surprised.

For those who don't want to read too much, in plain English: when you give the model a high-stakes statement that clashes with what it "knows" about the world, it gets more brittle, sometimes to the point of being unusable. An even shorter version: do not clash with the model's given worldview; it will degrade to some extent.

In practice, this means that in lower-resource languages like Latvian and Finnish (and probably others), Flash 2.5 is an unreliable guardrail model when something clashes with the model's general "worldview". I'm sure this degradation applies to other languages and models as well, to varying extents.

In one totally normal week of MLOps, my news summarization pipeline started failing intermittently. Nothing had changed. No deploys. No prompt edits. No model version bump (as far as I could tell). Yet the guardrail would suddenly turn into a grumpy judge and reject outputs for reasons that felt random, sometimes even contradicting itself between runs. It was the worst kind of failure: silent, flaky, and impossible to reproduce on demand.

Then I noticed the pattern: it started when one specific named entity appeared in the text, Donald Trump (and later in tests, Bernie Sanders too). And then down the rabbit hole I went.

Organizations

None yet