Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 378
Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation Paper • 2506.04462 • Published Jun 4, 2025 • 2
Article Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick cxdu • Oct 24, 2024 • 14
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs) Paper • 2407.14937 • Published Jul 20, 2024 • 1