🔄 In a Training Loop

Joel Wang

joelhenwang

AI & ML interests

None yet

Recent Activity

upvoted a paper about 21 hours ago

PACE: A Proxy for Agentic Capability Evaluation

upvoted a paper about 21 hours ago

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

upvoted a paper about 21 hours ago

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Organizations

upvoted 5 papers about 21 hours ago

PACE: A Proxy for Agentic Capability Evaluation

Paper • 2607.02032 • Published 3 days ago • 6

When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

Paper • 2606.27669 • Published 9 days ago • 8

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Paper • 2607.01874 • Published 3 days ago • 14

Morphing into Hybrid Attention Models

Paper • 2606.30562 • Published 6 days ago • 34

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Paper • 2607.02440 • Published 3 days ago • 41

upvoted 11 papers 2 days ago

Prism Transformer: Progressive Head Schedules for Hierarchical Attention Processing

Paper • 2606.27449 • Published 10 days ago • 1

MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training

Paper • 2606.30406 • Published 6 days ago • 12

Multi-Block Diffusion Language Models

Paper • 2606.29215 • Published 5 days ago • 30

AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

Paper • 2606.31551 • Published 5 days ago • 14

The State-Prediction Separation Hypothesis

Paper • 2607.01218 • Published 4 days ago • 8

CausalMix: Data Mixture as Causal Inference for Language Model Training

Paper • 2607.01104 • Published 4 days ago • 17

liked a dataset 3 days ago

mlabonne/natural_reasoning-formatted

Viewer • Updated Feb 21, 2025 • 1.15M • 140 • 17

liked a Space 3 days ago

RL-for-LLMs Wiki

📖

A beautiful reader for the RL-for-LLMs knowledge base

liked a dataset 3 days ago

mlabonne/open-perfectblend

Preview • Updated Jan 15, 2025 • 3.41k • 145

upvoted a paper 3 days ago

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Paper • 2606.30634 • Published 6 days ago • 23
