113 182

Kalyan KS PRO

kalyan-ks

AI & ML interests

NLP (LLMs)

Recent Activity

liked a model about 2 hours ago

LiquidAI/LFM2.5-8B-A1B

upvoted a changelog about 4 hours ago

Filter Models page by Base Models only

liked a model about 7 hours ago

knowledgator/opir-edge-v1.0

View all activity

Organizations

liked a model about 2 hours ago

LiquidAI/LFM2.5-8B-A1B

Text Generation • 8B • Updated about 4 hours ago • 56

upvoted a changelog about 4 hours ago

Hugging Face Changelog

Filter Models page by Base Models only

about 5 hours ago

• 29

liked 2 models about 7 hours ago

knowledgator/opir-edge-v1.0

Text Classification • 32.7M • Updated 1 day ago • 19 • 3

knowledgator/opir-multitask-large-v1.0

Text Classification • 0.4B • Updated 1 day ago • 15 • 4

upvoted a collection about 7 hours ago

Opir

Collection

Efficient Multi-Task Safety Classification Models • 4 items • Updated 1 day ago • 4

liked a model 1 day ago

NousResearch/Hermes-4.3-36B

Text Generation • 36B • Updated Dec 6, 2025 • 6.81k • 216

upvoted a collection 2 days ago

LEG

Collection

A Lightweight Explainable Guardrail for LLM Safety • 12 items • Updated Apr 18 • 1

posted an update 2 days ago

Post

1563

LLM Guardrail Models are Less Robust Against Text Mutation Attacks

Blog post - https://huggingface.co/blog/kalyan-ks/llm-guardrail-models-less-robust

Evaluated the robustness of three LLM guardrail models (GLiGuard, LlamaGuard3 and MiniGuard).

Evaluation is done using 16 text mutation attacks over three datasets (AEGIS 2.0, WildGuard and ExpGuard).

Achieved average Unsafe ASR score of up to 33% and average Safe ASR score of up to 25% against GLiGuard model.

Achieved average Unsafe ASR score of up to 35% and average Safe ASR score of up to 17% against LlamaGuard3-8B model.

Achieved average Unsafe ASR score of up to 45% and average Safe ASR score of up to 15% against MiniGuard v0.1 model.