
Giyeong Oh

BootsofLagrangian

AI & ML interests

Deep Learning, Personalization

Recent Activity

reacted to KingNish's post with 🔥 about 8 hours ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance, and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, the hybrid is practical and reliable.

Next step: scale to larger models/datasets to see whether Muon's spikes become catastrophic or clipping wins out.

Full blog: https://huggingface.co/blog/KingNish/optimizer-part1
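The hybrid recipe in the post above splits parameters by tensor rank: 2D weight matrices go to Muon, while 1D parameters (biases, norm gains) stay on AdamW. A minimal, framework-agnostic sketch of that grouping rule is below; `split_param_groups` is a hypothetical helper written for illustration, not part of any library, and parameters are assumed to arrive as (name, shape) pairs.

```python
def split_param_groups(named_shapes):
    """Partition parameters by tensor rank, following the hybrid
    Muon+AdamW recipe: 2D weight matrices go to the Muon group,
    everything else (biases, norm gains, scalars) to the AdamW group.

    named_shapes: iterable of (name, shape) pairs.
    Returns (muon_names, adamw_names).
    """
    muon_params, adamw_params = [], []
    for name, shape in named_shapes:
        if len(shape) == 2:           # weight matrices -> Muon
            muon_params.append(name)
        else:                         # vectors/scalars -> AdamW
            adamw_params.append(name)
    return muon_params, adamw_params

# Toy example: parameters of one transformer block (shapes are illustrative)
params = [
    ("attn.q_proj.weight", (4096, 4096)),
    ("attn.q_proj.bias",   (4096,)),
    ("norm.weight",        (4096,)),
    ("mlp.up_proj.weight", (11008, 4096)),
]
muon, adamw = split_param_groups(params)
```

In a real training setup the two name lists would be used to build two optimizer instances (or two parameter groups), one stepped by Muon and one by AdamW, each with its own learning rate.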
updated a model about 13 hours ago
activeDap/Qwen-1_8B_ultrafeedback_chosen
published a model about 13 hours ago
activeDap/Qwen-1_8B_ultrafeedback_chosen
View all activity

Organizations

SD-Umamusume · Lycoris-Amaryllis · mirlab · Language AL

BootsofLagrangian's datasets

None public yet