Article: Ulysses Sequence Parallelism: Training with Million-Token Contexts • 7 days ago • 20
Article: FlashHead: Accelerating Language Model Inference (efficient drop-in replacement for the classification head) • 4 days ago • 1
Collection: Nemotron-Pre-Training-Datasets. Large-scale pre-training datasets used in the Nemotron family of models • 12 items • Updated 5 days ago • 121
Paper: Lost in Backpropagation: The LM Head is a Gradient Bottleneck • 2603.10145 • Published 5 days ago • 7
Collection: NVIDIA Nemotron v3. Open, production-ready enterprise models • 12 items • Updated 4 days ago • 200
Collection: MixtureVitae study models and datasets. Models and datasets related to MixtureVitae, an open and fully reproducible pretraining dataset built from permissive sources • 16 items • Updated Feb 13 • 1
Space: The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝. Explore synthetic data experiments on a virtual bookshelf • 183
Article: Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens • 10 days ago • 4
Collection: 🤏 Smol-Data. Tried-and-tested mixes for strong pretraining, inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 14 days ago • 12