Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published 13 days ago • 14
SANA-Video Collection SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer • 10 items • Updated 9 days ago • 7
3D Aware Region Prompted Vision Language Model Paper • 2509.13317 • Published Sep 16, 2025 • 14
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory Paper • 2509.04439 • Published Sep 4, 2025 • 1
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems Paper • 2510.12872 • Published Oct 14, 2025 • 4
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 92
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail Paper • 2511.00088 • Published Oct 30, 2025 • 4
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference Paper • 2510.17777 • Published Oct 20, 2025 • 1
NVILA Collection NVILA: Efficient Frontier Visual Language Models • 12 items • Updated 15 days ago • 17
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model Paper • 2310.15110 • Published Oct 23, 2023 • 3
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1, 2024 • 13
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024