- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 95
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50

Collections including paper arxiv:2401.17268
- Attention Is All You Need
  Paper • 1706.03762 • Published • 105
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
  Paper • 2310.08185 • Published • 8
- GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence
  Paper • 2310.05388 • Published • 4
- PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
  Paper • 2311.09180 • Published • 8
- Weaver: Foundation Models for Creative Writing
  Paper • 2401.17268 • Published • 45

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 39
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 81
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- Weaver: Foundation Models for Creative Writing
  Paper • 2401.17268 • Published • 45
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 21

- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 8
- Alignment for Honesty
  Paper • 2312.07000 • Published • 15
- Steering Llama 2 via Contrastive Activation Addition
  Paper • 2312.06681 • Published • 14

- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 16
- AgentTuning: Enabling Generalized Agent Abilities for LLMs
  Paper • 2310.12823 • Published • 36
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  Paper • 2308.10848 • Published • 1
- CLEX: Continuous Length Extrapolation for Large Language Models
  Paper • 2310.16450 • Published • 10

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 55
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 42
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 53
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31