-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 124 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
Collections
Collections including paper arxiv:2502.14768
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 29 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 43 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper • 2502.14768 • Published • 47 -
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper • 2502.12853 • Published • 29 -
Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 18 -
Distillation Scaling Laws
Paper • 2502.08606 • Published • 47
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 441 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 58 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 -
Progressive Multimodal Reasoning via Active Retrieval
Paper • 2412.14835 • Published • 73 -
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 72
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
Paper • 2402.07754 • Published -
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Paper • 2505.10446 • Published -
A Survey on Latent Reasoning
Paper • 2507.06203 • Published • 93 -
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Paper • 2505.16782 • Published • 1
-
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 254 -
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 58 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 126 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 124
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 288 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 55 -
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper • 2501.09012 • Published • 10 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 28
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 36 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47