In a Training Loop 🔄

Yash Marathe

yashmarathe

AI & ML interests

None yet

Recent Activity

liked a model 8 days ago

SakanaAI/kame

liked a dataset 12 days ago

open-thoughts/AgentTrove

liked a Space 17 days ago

jane-street/droppedaneuralnet

Organizations

upvoted an article 2 months ago

Article

Introducing Storage Buckets on the Hugging Face Hub

Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner • Mar 10 • 194

upvoted an article 3 months ago

Article

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188

upvoted a collection 3 months ago

Open Coding Agents

Collection

13 items • Updated Mar 5 • 53

upvoted an article 6 months ago

Article

Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training

smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98

upvoted a paper 6 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39

upvoted a paper 7 months ago

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48

upvoted 2 articles 8 months ago

Article

Gaia2 and ARE: Empowering the community to study agents

clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter • Sep 22, 2025 • 134

Article

Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation

exploding-gradients • Sep 16, 2025 • 20

upvoted a paper 8 months ago

Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 72

upvoted 3 collections 10 months ago

upvoted 2 collections 11 months ago

Avey 1 Research Preview

Collection

1.5B preview models trained on 100B tokens of FineWeb, and an instruct-tuned version on smoltalk. • 3 items • Updated Jun 16, 2025 • 7

V-JEPA 2

Collection

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 218

upvoted a collection 12 months ago

Falcon-H1

Collection

Falcon-H1 Family of Hybrid-Head Language Models (Transformer-SSM), including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B (pretrained & instruction-tuned). • 33 items • Updated Mar 2 • 59

upvoted 5 collections about 1 year ago

LipSync and Face Operations

Collection

23 items • Updated 25 days ago • 64

Perception LM

Collection

7 items • Updated Apr 17, 2025 • 64

Perception Encoder

Collection

16 items • Updated Mar 2 • 81

Skywork-OR1

Collection

Skywork Open Reasoner 1 • 8 items • Updated Mar 2 • 32

Kimina Prover Preview

Collection

State-of-the-Art Models for Formal Mathematical Reasoning • 5 items • Updated Apr 28, 2025 • 33
