maximousblk's Collections
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
• 2312.15166
• Published
• 61
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper
• 2312.12456
• Published
• 45
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper
• 2312.12742
• Published
• 13
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Paper
• 2312.12682
• Published
• 9
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper
• 2312.11514
• Published
• 260
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper
• 2312.07987
• Published
• 41
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Paper
• 2312.08361
• Published
• 27
COLMAP-Free 3D Gaussian Splatting
Paper
• 2312.07504
• Published
• 13
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published
• 150
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Paper
• 2311.12775
• Published
• 29
Exponentially Faster Language Modelling
Paper
• 2311.10770
• Published
• 119
Orca 2: Teaching Small Language Models How to Reason
Paper
• 2311.11045
• Published
• 77
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper
• 2308.14352
• Published
Scaling up GANs for Text-to-Image Synthesis
Paper
• 2303.05511
• Published
• 3
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper
• 2312.16862
• Published
• 31
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Paper
• 2312.16145
• Published
• 10
Mistral 7B
Paper
• 2310.06825
• Published
• 58
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Paper
• 2312.17120
• Published
• 28
Paper
• 2312.17244
• Published
• 9
Unsupervised Universal Image Segmentation
Paper
• 2312.17243
• Published
• 20
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper
• 2401.01325
• Published
• 27
Mixtral of Experts
Paper
• 2401.04088
• Published
• 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published
• 74
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper
• 2309.11235
• Published
• 15
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published
• 250
BlackMamba: Mixture of Experts for State-Space Models
Paper
• 2402.01771
• Published
• 25
Paper
• 2402.13144
• Published
• 100
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published
• 627
Gemma: Open Models Based on Gemini Research and Technology
Paper
• 2403.08295
• Published
• 50
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Paper
• 2403.08551
• Published
• 11
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
• 2403.19887
• Published
• 112
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper
• 2404.02258
• Published
• 107
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published
• 94
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
• 2404.07973
• Published
• 32
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper
• 2404.07143
• Published
• 111
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper
• 2402.19427
• Published
• 56
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published
• 259
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Paper
• 2406.06282
• Published
• 39
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Paper
• 2406.05955
• Published
• 27
OmniGen: Unified Image Generation
Paper
• 2409.11340
• Published
• 115
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
Paper
• 2601.14251
• Published
• 24