Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2411.13676

Large Language Model Unlearning via Embedding-Corrupted Prompts

Paper • 2406.07933 • Published Jun 12, 2024 • 9
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 41
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17, 2024 • 19
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

LM Architectures

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 47
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 40
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 148

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 32
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31, 2024 • 17
Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22, 2024 • 26
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22, 2024 • 32

new architecture

about 16 hours ago

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 52
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

small-but-mighty-llms

microsoft/Phi-3-mini-128k-instruct

Text Generation • 4B • Updated 26 days ago • 55.3k • 1.69k
Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20, 2024 • 46
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks

Paper • 2508.07662 • Published Aug 11, 2025 • 9

To read... eventually

A collection of papers that i have read or plan to read all in one place. Includes a wide range of topics.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 129
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19, 2024 • 58
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6, 2024 • 15
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 72

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
Orion-14B: Open-source Multilingual Large Language Models

Paper • 2401.12246 • Published Jan 20, 2024 • 14
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 48

Large Language Model Unlearning via Embedding-Corrupted Prompts

Paper • 2406.07933 • Published Jun 12, 2024 • 9
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 41
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17, 2024 • 19
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

small-but-mighty-llms

microsoft/Phi-3-mini-128k-instruct

Text Generation • 4B • Updated 26 days ago • 55.3k • 1.69k
Hymba: A Hybrid-head Architecture for Small Language Models

Paper • 2411.13676 • Published Nov 20, 2024 • 46
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks

Paper • 2508.07662 • Published Aug 11, 2025 • 9

LM Architectures

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 47
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 40
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 148

To read... eventually

A collection of papers that i have read or plan to read all in one place. Includes a wide range of topics.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 129
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19, 2024 • 58
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6, 2024 • 15
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 72

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 32
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31, 2024 • 17
Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22, 2024 • 26
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22, 2024 • 32

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
Orion-14B: Open-source Multilingual Large Language Models

Paper • 2401.12246 • Published Jan 20, 2024 • 14
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 48

new architecture

about 16 hours ago

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 52
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs