Collections
Collections including paper arxiv:2505.11594
- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63

- SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2411.10958 • Published • 55
- SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
  Paper • 2502.18137 • Published • 58
- SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
  Paper • 2505.11594 • Published • 75
- SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
  Paper • 2410.02367 • Published • 49