PaTH Attention: Position Encoding via Accumulating Householder Transformations (arXiv:2505.16381, published May 22, 2025)
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation (arXiv:2507.06607, published Jul 9, 2025)
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math (arXiv:2504.21233, published Apr 30, 2025)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example (arXiv:2504.20571, published Apr 29, 2025)
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs (arXiv:2503.01743, published Mar 3, 2025)
Evaluating Hallucinations in Chinese Large Language Models (arXiv:2310.03368, published Oct 5, 2023)
A Controlled Study on Long Context Extension and Generalization in LLMs (arXiv:2409.12181, published Sep 18, 2024)
Parallelizing Linear Transformers with the Delta Rule over Sequence Length (arXiv:2406.06484, published Jun 10, 2024)
Gated Slot Attention for Efficient Linear-Time Sequence Modeling (arXiv:2409.07146, published Sep 11, 2024)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522, published Jun 11, 2024)
Sparse Modular Activation for Efficient Sequence Modeling (arXiv:2306.11197, published Jun 19, 2023)