Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.16929

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22, 2025 • 21
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22, 2025 • 22
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

microsoft/bitnet-b1.58-2B-4T

Text Generation • Updated Dec 17, 2025 • 15.9k • 1.31k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14, 2025 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17, 2025 • 24 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15, 2025 • 63

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1, 2025 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31, 2025 • 1

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20, 2025 • 29
Autonomy-of-Experts Models

Paper • 2501.13074 • Published Jan 22, 2025 • 44
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 47
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 126

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

Theory and Representation learning

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7, 2025 • 47
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6, 2025 • 17

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6, 2025 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

Theory, Conceptualization, Paradigms

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 47
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17, 2025 • 121
Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published Dec 31, 2025 • 44

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities

Paper • 2504.16078 • Published Apr 22, 2025 • 21
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22, 2025 • 22
OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

Theory and Representation learning

I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7, 2025 • 47
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6, 2025 • 17

microsoft/bitnet-b1.58-2B-4T

Text Generation • Updated Dec 17, 2025 • 15.9k • 1.31k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14, 2025 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17, 2025 • 24 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15, 2025 • 63

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1, 2025 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31, 2025 • 1

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6, 2025 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20, 2025 • 29
Autonomy-of-Experts Models

Paper • 2501.13074 • Published Jan 22, 2025 • 44
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 47
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 126

Theory, Conceptualization, Paradigms

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 47
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30
Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17, 2025 • 121
Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published Dec 31, 2025 • 44

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs