Keira Chen
KeiraYC

AI & ML interests
None yet

Recent Activity
updated a collection about 1 month ago: Self-distillation

Organizations
None yet
Low-rank attention
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
  Paper • 2502.18137 • Published • 59
- XAttention: Block Sparse Attention with Antidiagonal Scoring
  Paper • 2503.16428 • Published • 15
- On the Benefits of Rank in Attention Layers
  Paper • 2407.16153 • Published
- Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
  Paper • 2504.20938 • Published
TDL project: adversarial attack
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models
  Paper • 2304.09875 • Published
- Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
  Paper • 2504.02733 • Published
- RADAR: Benchmarking Language Models on Imperfect Tabular Data
  Paper • 2506.08249 • Published • 2
- Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
  Paper • 2306.04064 • Published
Self-distillation
- UNDO: Understanding Distillation as Optimization
  Paper • 2504.02521 • Published
- One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings
  Paper • 2503.03008 • Published • 1
- Understanding Self-Distillation in the Presence of Label Noise
  Paper • 2301.13304 • Published
- How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
  Paper • 2407.03475 • Published