
Kale

Zyn123

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted an article 16 days ago

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Oct 7, 2024

upvoted a paper 6 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 512

upvoted a paper 7 months ago

Set Block Decoding is a Language Model Inference Accelerator

Paper • 2509.04185 • Published Sep 4, 2025 • 54

upvoted 5 articles about 1 year ago

Article

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Mar 17, 2025 • 355

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28, 2025 • 888

Article

Open-R1: Update #1

Feb 2, 2025 • 305

Article

Mastering Tensor Dimensions in Transformers

Jan 12, 2025 • 150

Article

Deriving DPO's Loss

Dec 24, 2024

liked a model about 1 year ago

Tiiny/SmallThinker-3B-Preview

Text Generation • 3B • Updated Jan 16, 2025 • 684 • 416

liked a model over 1 year ago

onnx-community/moonshine-base-ONNX

Automatic Speech Recognition • Updated Jan 18, 2025 • 6.56k • 33

upvoted 2 articles over 1 year ago

Article

Decoding Strategies in Large Language Models

Oct 29, 2024 • 110

Article

Fine-tune Llama 2 with DPO

Aug 8, 2023

liked a model over 1 year ago

EmergentMethods/gliner_medium_news-v2.1

Token Classification • 0.2B • Updated Jan 12 • 218 • 83

upvoted 6 articles over 1 year ago

Article

How to build a custom text classifier without days of human labeling

Oct 17, 2024

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024 • 278

Article

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

Nov 3, 2022 • 366

Article

Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging

Aug 19, 2024

Article

Merge Large Language Models with mergekit

Jan 9, 2024 • 153

Article

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Jul 18, 2024

liked a model over 1 year ago

csdc-atl/dialogue-rewriter

Updated Oct 16, 2023 • 3 • 16
