-
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 19 -
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 44 -
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30 -
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 77
Collections
Discover the best community collections!
Collections including paper arxiv:2401.17268
-
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 43 -
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 51 -
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Paper • 2408.10945 • Published • 11 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53
-
Just How Flexible are Neural Networks in Practice?
Paper • 2406.11463 • Published • 7 -
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 41 -
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115 -
An Interactive Agent Foundation Model
Paper • 2402.05929 • Published • 30
-
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper • 2401.11708 • Published • 30 -
Weaver: Foundation Models for Creative Writing
Paper • 2401.17268 • Published • 45 -
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Paper • 2402.01118 • Published • 32 -
Training-Free Consistent Text-to-Image Generation
Paper • 2402.03286 • Published • 67
-
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 19 -
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 44 -
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30 -
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 77
-
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 43 -
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 51 -
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Paper • 2408.10945 • Published • 11 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53
-
Just How Flexible are Neural Networks in Practice?
Paper • 2406.11463 • Published • 7 -
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 41 -
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115 -
An Interactive Agent Foundation Model
Paper • 2402.05929 • Published • 30
-
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper • 2401.11708 • Published • 30 -
Weaver: Foundation Models for Creative Writing
Paper • 2401.17268 • Published • 45 -
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Paper • 2402.01118 • Published • 32 -
Training-Free Consistent Text-to-Image Generation
Paper • 2402.03286 • Published • 67