Papers to Read
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper
• 2501.00192
• Published
• 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language
Pretraining
Paper
• 2501.00958
• Published
• 109
Xmodel-2 Technical Report
Paper
• 2412.19638
• Published
• 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published
• 107
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
• 2501.01257
• Published
• 51
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published
• 300
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper
• 2501.10120
• Published
• 54
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
• 2501.18492
• Published
• 88
WildChat-50M: A Deep Dive Into the Role of Synthetic Data in
Post-Training
Paper
• 2501.18511
• Published
• 20
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published
• 62
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
Expect the Unexpected: FailSafe Long Context QA for Finance
Paper
• 2502.06329
• Published
• 133
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
Paper
• 2502.07870
• Published
• 45
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not
content, is what matters!
Paper
• 2502.07374
• Published
• 40
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper
• 2502.08127
• Published
• 59
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large
Language Models
Paper
• 2502.07346
• Published
• 53
TransMLA: Multi-head Latent Attention Is All You Need
Paper
• 2502.07864
• Published
• 57
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of
Video Foundation Model
Paper
• 2502.10248
• Published
• 57
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance
Software Engineering?
Paper
• 2502.12115
• Published
• 46
Magma: A Foundation Model for Multimodal AI Agents
Paper
• 2502.13130
• Published
• 58
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published
• 214
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published
• 194
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic
Understanding, Localization, and Dense Features
Paper
• 2502.14786
• Published
• 158
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published
• 63
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper
• 2502.14739
• Published
• 108
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper
• 2503.04130
• Published
• 96
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published
• 86
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language
Models via Mixture-of-LoRAs
Paper
• 2503.01743
• Published
• 89
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through
Two-Stage Rule-Based RL
Paper
• 2503.07536
• Published
• 88
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural
Vision-Language Dataset for Southeast Asia
Paper
• 2503.07920
• Published
• 101
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published
• 123
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Paper
• 2503.11579
• Published
• 21
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
• 2503.10639
• Published
• 53
R1-Onevision: Advancing Generalized Multimodal Reasoning through
Cross-Modal Formalization
Paper
• 2503.10615
• Published
• 17
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published
• 36
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based
Scientific Research
Paper
• 2503.13399
• Published
• 22
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Paper
• 2503.11495
• Published
• 14
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Paper
• 2503.13444
• Published
• 17
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Paper
• 2503.14478
• Published
• 48
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs
for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published
• 32
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal
Consistent Video Generation
Paper
• 2503.06053
• Published
• 138
TULIP: Towards Unified Language-Image Pretraining
Paper
• 2503.15485
• Published
• 49
Stop Overthinking: A Survey on Efficient Reasoning for Large Language
Models
Paper
• 2503.16419
• Published
• 77
Video-T1: Test-Time Scaling for Video Generation
Paper
• 2503.18942
• Published
• 90
Video SimpleQA: Towards Factuality Evaluation in Large Video Language
Models
Paper
• 2503.18923
• Published
• 14
Reasoning to Learn from Latent Thoughts
Paper
• 2503.18866
• Published
• 13
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Paper
• 2503.19990
• Published
• 35
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
Scaling Vision Pre-Training to 4K Resolution
Paper
• 2503.19903
• Published
• 41
CoLLM: A Large Language Model for Composed Image Retrieval
Paper
• 2503.19910
• Published
• 15
Exploring Hallucination of Large Multimodal Models in Video
Understanding: Benchmark, Analysis and Mitigation
Paper
• 2503.19622
• Published
• 31
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper
• 2503.19325
• Published
• 73
MDocAgent: A Multi-Modal Multi-Agent Framework for Document
Understanding
Paper
• 2503.13964
• Published
• 20
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
Thinking
Paper
• 2503.19855
• Published
• 29
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper
• 2503.18931
• Published
• 30
Defeating Prompt Injections by Design
Paper
• 2503.18813
• Published
• 24
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published
• 59
Gemini Robotics: Bringing AI into the Physical World
Paper
• 2503.20020
• Published
• 31
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper
• 2503.21776
• Published
• 79
Large Language Model Agent: A Survey on Methodology, Applications and
Challenges
Paper
• 2503.21460
• Published
• 83
ResearchBench: Benchmarking LLMs in Scientific Discovery via
Inspiration-Based Task Decomposition
Paper
• 2503.21248
• Published
• 21
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for
Embodied Interactive Tasks
Paper
• 2503.21696
• Published
• 23
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published
• 43
Your ViT is Secretly an Image Segmentation Model
Paper
• 2503.19108
• Published
• 25
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
• 2503.24235
• Published
• 54
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist
Policy
Paper
• 2503.24388
• Published
• 29
Any2Caption: Interpreting Any Condition to Caption for Controllable Video
Generation
Paper
• 2503.24379
• Published
• 76
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published
• 62
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published
• 38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal
LLMs on Academic Resources
Paper
• 2504.00595
• Published
• 37
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published
• 26
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for
Large Language Models
Paper
• 2503.24377
• Published
• 18
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper
• 2504.00883
• Published
• 67
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published
• 59
PaperBench: Evaluating AI's Ability to Replicate AI Research
Paper
• 2504.01848
• Published
• 37
Advances and Challenges in Foundation Agents: From Brain-Inspired
Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published
• 303
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image
Generation
Paper
• 2504.02782
• Published
• 57
Rethinking RL Scaling for Vision Language Models: A Transparent,
From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published
• 32
MedSAM2: Segment Anything in 3D Medical Images and Videos
Paper
• 2504.03600
• Published
• 10
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published
• 205
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published
• 110
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published
• 80
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning
Paper
• 2504.03151
• Published
• 15
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published
• 85
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
OmniCaptioner: One Captioner to Rule Them All
Paper
• 2504.07089
• Published
• 20
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement
Fine-Tuning
Paper
• 2504.06958
• Published
• 13
Are We Done with Object-Centric Learning?
Paper
• 2504.07092
• Published
• 6
Kimi-VL Technical Report
Paper
• 2504.07491
• Published
• 137
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
VCR-Bench: A Comprehensive Evaluation Framework for Video
Chain-of-Thought Reasoning
Paper
• 2504.07956
• Published
• 46
MM-IFEngine: Towards Multimodal Instruction Following
Paper
• 2504.07957
• Published
• 35
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper
• 2504.08685
• Published
• 130
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
FUSION: Fully Integration of Vision-Language Representations for Deep
Cross-Modal Understanding
Paper
• 2504.09925
• Published
• 39
InternVL3: Exploring Advanced Training and Test-Time Recipes for
Open-Source Multimodal Models
Paper
• 2504.10479
• Published
• 306
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models
with Reinforcement Learning
Paper
• 2504.08837
• Published
• 43
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Paper
• 2504.09641
• Published
• 16
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published
• 85
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Paper
• 2504.10465
• Published
• 27
Efficient Reasoning Models: A Survey
Paper
• 2504.10903
• Published
• 21
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for
Language Model Pre-training
Paper
• 2504.13161
• Published
• 93
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference
Optimization for Large Video Models
Paper
• 2504.13122
• Published
• 20
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published
• 49
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal
Large Language Models
Paper
• 2504.15279
• Published
• 78
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Describe Anything: Detailed Localized Image and Video Captioning
Paper
• 2504.16072
• Published
• 64
Eagle 2.5: Boosting Long-Context Post-Training for Frontier
Vision-Language Models
Paper
• 2504.15271
• Published
• 67
Paper2Code: Automating Code Generation from Scientific Papers in Machine
Learning
Paper
• 2504.17192
• Published
• 123
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning
in Multimodal LLMs
Paper
• 2504.15415
• Published
• 23
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Paper
• 2504.15521
• Published
• 64
Reinforcement Learning for Reasoning in Large Language Models with One
Training Example
Paper
• 2504.20571
• Published
• 98
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement
Learning
Paper
• 2505.02835
• Published
• 28
RM-R1: Reward Modeling as Reasoning
Paper
• 2505.02387
• Published
• 81
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level
and Token-level CoT
Paper
• 2505.00703
• Published
• 44
100 Days After DeepSeek-R1: A Survey on Replication Studies and More
Directions for Reasoning Language Models
Paper
• 2505.00551
• Published
• 36
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language
Models in Math
Paper
• 2504.21233
• Published
• 49
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement
Fine-Tuning
Paper
• 2505.03318
• Published
• 92
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
• 2505.04588
• Published
• 65
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following
in LLMs
Paper
• 2505.11423
• Published
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop
System from Hypothesis to Verification
Paper
• 2505.16938
• Published
• 121
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with
Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published
• 53
Think or Not? Selective Reasoning via Reinforcement Learning for
Vision-Language Models
Paper
• 2505.16854
• Published
• 11
GRIT: Teaching MLLMs to Think with Images
Paper
• 2505.15879
• Published
• 13
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Paper
• 2503.20752
• Published
• 1
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large
Vision-Language Models
Paper
• 2504.11468
• Published
• 30
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with
Reinforcement Learning
Paper
• 2505.14677
• Published
• 15
Emerging Properties in Unified Multimodal Pretraining
Paper
• 2505.14683
• Published
• 133
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement
Learning
Paper
• 2505.14231
• Published
• 53
Visual Agentic Reinforcement Fine-Tuning
Paper
• 2505.14246
• Published
• 32
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via
Reinforcement Learning to Rank
Paper
• 2505.14460
• Published
• 33
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Paper
• 2505.11049
• Published
• 61
Visual Planning: Let's Think Only with Images
Paper
• 2505.11409
• Published
• 57
OpenThinkIMG: Learning to Think with Images via Visual Tool
Reinforcement Learning
Paper
• 2505.08617
• Published
• 42
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large
Reasoning Models
Paper
• 2505.10554
• Published
• 120
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture,
Training and Dataset
Paper
• 2505.09568
• Published
• 99
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Paper
• 2505.04410
• Published
• 44
Bring Reason to Vision: Understanding Perception and Reasoning through
Model Merging
Paper
• 2505.05464
• Published
• 11
Perception, Reason, Think, and Plan: A Survey on Large Multimodal
Reasoning Models
Paper
• 2505.04921
• Published
• 186
Fin-R1: A Large Language Model for Financial Reasoning through
Reinforcement Learning
Paper
• 2503.16252
• Published
• 30
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial
Intelligence
Paper
• 2505.23747
• Published
• 69
Table-R1: Inference-Time Scaling for Table Reasoning
Paper
• 2505.23621
• Published
• 93
The Entropy Mechanism of Reinforcement Learning for Reasoning Language
Models
Paper
• 2505.22617
• Published
• 131
FAMA: The First Large-Scale Open-Science Speech Foundation Model for
English and Italian
Paper
• 2505.22759
• Published
• 19
D-AR: Diffusion via Autoregressive Models
Paper
• 2505.23660
• Published
• 34
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV
Cache and Parallel Decoding
Paper
• 2505.22618
• Published
• 45
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Paper
• 2505.22651
• Published
• 48
Skywork Open Reasoner 1 Technical Report
Paper
• 2505.22312
• Published
• 54
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper
• 2505.22453
• Published
• 46
Advancing Multimodal Reasoning via Reinforcement Learning with Cold
Start
Paper
• 2505.22334
• Published
• 36
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Paper
• 2505.21327
• Published
• 83
Paper2Poster: Towards Multimodal Poster Automation from Scientific
Papers
Paper
• 2505.21497
• Published
• 109
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic
Scientific Workflows
Paper
• 2505.19897
• Published
• 104
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
Paper
• 2505.16459
• Published
• 45
BizFinBench: A Business-Driven Real-World Financial Benchmark for
Evaluating LLMs
Paper
• 2505.19457
• Published
• 64
QwenLong-L1: Towards Long-Context Large Reasoning Models with
Reinforcement Learning
Paper
• 2505.17667
• Published
• 88
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
• 2505.18129
• Published
• 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
Learning
Paper
• 2505.16410
• Published
• 58
Paper
• 2506.03569
• Published
• 80
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged
Reinforcement Learning
Paper
• 2506.04207
• Published
• 48
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published
• 277
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in
Multi-Agent Environments
Paper
• 2506.02387
• Published
• 58
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Paper
• 2505.24714
• Published
• 37
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications
of Agentic AI
Paper
• 2505.19443
• Published
• 15
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language
Models for Robotics
Paper
• 2506.04308
• Published
• 43
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic
Sampling
Paper
• 2506.08672
• Published
• 30
Geopolitical biases in LLMs: what are the "good" and the "bad" countries
according to contemporary language models
Paper
• 2506.06751
• Published
• 71
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical
Reasoning
Paper
• 2506.09513
• Published
• 102
Scientists' First Exam: Probing Cognitive Abilities of MLLM via
Perception, Understanding, and Reasoning
Paper
• 2506.10521
• Published
• 73
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark
for Financial LLM Evaluation
Paper
• 2506.14028
• Published
• 93
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction
and Planning
Paper
• 2506.09985
• Published
• 31
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published
• 131
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper
• 2506.06395
• Published
• 133
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math
Reasoning
Paper
• 2506.09736
• Published
• 9
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Paper
• 2506.10960
• Published
• 12
Does Math Reasoning Improve General LLM Capabilities? Understanding
Transferability of LLM Reasoning
Paper
• 2507.00432
• Published
• 79
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning
Dataset
Paper
• 2507.03483
• Published
• 24
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and
Future Frontiers
Paper
• 2506.23918
• Published
• 90
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published
• 159
Scaling RL to Long Videos
Paper
• 2507.07966
• Published
• 160
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation
from Diffusion Models
Paper
• 2507.07104
• Published
• 46
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for
Visual Reasoning
Paper
• 2507.05255
• Published
• 75
Vision Foundation Models as Effective Visual Tokenizers for
Autoregressive Image Generation
Paper
• 2507.08441
• Published
• 62
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
VisionThink: Smart and Efficient Vision Language Model via Reinforcement
Learning
Paper
• 2507.13348
• Published
• 79
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent
Planning
Paper
• 2507.16815
• Published
• 42
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via
Context-Aware Multi-Stage Policy Optimization
Paper
• 2507.14683
• Published
• 134
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper
• 2507.16746
• Published
• 34
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced
Multimodal Reasoning
Paper
• 2507.22607
• Published
• 47
A Survey of Self-Evolving Agents: On Path to Artificial Super
Intelligence
Paper
• 2507.21046
• Published
• 84
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published
• 158
SmallThinker: A Family of Efficient Large Language Models Natively
Trained for Local Deployment
Paper
• 2507.20984
• Published
• 58
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published
• 317
Captain Cinema: Towards Short Movie Generation
Paper
• 2507.18634
• Published
• 42
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science
Reasoning
Paper
• 2507.16812
• Published
• 63
Intern-S1: A Scientific Multimodal Foundation Model
Paper
• 2508.15763
• Published
• 269
A Survey on Large Language Model Benchmarks
Paper
• 2508.15361
• Published
• 20
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm
Bridging Foundation Models and Lifelong Agentic Systems
Paper
• 2508.07407
• Published
• 98
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206