Paulus Femi Leunufna
Totally agree, safety constraints are really the core challenge here.
What makes it tricky is that "unsafe" isn't just about specific commands, but about how they're composed and the context they run in. Two syntactically valid commands can have very different risk profiles depending on scope, permissions, and recursion.
I think the interesting direction is combining:
- structural command analysis (instead of keyword filtering)
- risk classification layers before execution
- and ideally sandboxed environments for any real action
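A minimal sketch of what a structural-analysis-plus-risk-classification layer could look like, assuming a toy policy (the `classify_risk` function, its risk tiers, and the command lists are all hypothetical illustrations, not part of any dataset or agent mentioned here):

```python
# Toy sketch: classify a shell command by its parsed structure rather than
# by keyword matching. The risk tiers and command sets are assumptions.
import shlex

DESTRUCTIVE = {"rm", "dd", "mkfs", "shred"}
RECURSIVE_FLAGS = {"-r", "-rf", "-fr", "-R", "--recursive"}

def classify_risk(command: str) -> str:
    """Return 'allow', 'confirm', or 'block' based on command structure."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "block"          # unparseable input: fail closed
    if not tokens:
        return "allow"
    prog, args = tokens[0], tokens[1:]
    if prog in DESTRUCTIVE:
        # Same program, very different risk depending on flags and scope.
        if any(a in RECURSIVE_FLAGS for a in args) or "/" in args:
            return "block"      # e.g. rm -rf /
        return "confirm"        # destructive but scoped: ask the user first
    return "allow"

print(classify_risk("ls -la"))        # allow
print(classify_risk("rm notes.txt"))  # confirm
print(classify_risk("rm -rf /"))      # block
```

The "confirm" tier is the interesting one: it is exactly the "ask before executing" behavior that keyword filters can't express, because the decision depends on how the command is composed, not on which program it invokes.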
Datasets like this are great for learning the mapping, but the real gap is teaching models when not to execute or when to ask for confirmation.
That's probably where smaller, practical terminal agents will differentiate the most.
Nice dataset: this kind of NL → Bash pairing is genuinely useful for grounding LLMs in real system actions.
The interesting part will be how well it handles:
- compositional commands
- edge cases and flags
- safety constraints (destructive ops, permissions)
Quality and diversity probably matter more than size here, especially for terminal use.
Still, a solid direction for making smaller models more practically useful.
This is less about LiteLLM itself and more about how fragile the AI supply chain has become.
The .pth vector is particularly concerning: installation alone becomes implicit code execution across all Python processes, which breaks a lot of assumptions around dependency safety.
Also notable that this targets real infra (cloud creds, Kubernetes), not just local environments.
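For anyone unfamiliar with why a .pth file means implicit execution: CPython's `site` module exec()s any line in a `*.pth` file that begins with `import ` when it processes a site-packages directory at startup. A harmless self-contained demo of that mechanism (the file name and environment variable are made up for illustration; no real package is involved):

```python
# Demo: lines starting with "import " in a *.pth file are executed by the
# `site` module. Here we simulate interpreter startup with site.addsitedir()
# on a throwaway directory, so nothing persists on the real system.
import os
import site
import tempfile

d = tempfile.mkdtemp()
pth_path = os.path.join(d, "totally_benign.pth")  # hypothetical file name
with open(pth_path, "w") as f:
    # This single line runs in every Python process that scans this dir;
    # a real attack would do far worse than set an env var.
    f.write('import os; os.environ.setdefault("PTH_RAN", "1")\n')

site.addsitedir(d)  # what the interpreter effectively does at startup
print(os.environ.get("PTH_RAN"))  # -> "1": code ran just by "being installed"
```

No import of the package, no function call by the user: merely having the file in a scanned directory is enough, which is why install-time trust matters so much here.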
Feels like a reminder that:
- Dependency trust is a weak point
- Transitive packages are largely invisible
- Secrets are often too exposed
This isn't an edge case anymore; it's starting to look like a pattern.
Everything is open-sourced: datasets, adapters, and code.
https://huggingface.co/blog/OpenMed/synthvision