microtok example
Collection
https://github.com/Parveshiiii/microtok
This is a demo tokenizer trained using microtok. It accompanies my blog post, which you can read here
--- TEST RESULTS ---
2026-03-19 08:58:41,165 - microtok.TikToken.tok - INFO - Input: 'Hello world! I am writing code today.'
2026-03-19 08:58:41,167 - microtok.TikToken.tok - INFO - Tokens: ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']
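The `Ġ` prefix on most of the logged tokens looks like the byte-level BPE convention popularized by GPT-2, where `Ġ` (U+0120) stands in for a leading space. Assuming microtok follows that convention (the log suggests it, but this is not confirmed here), the token list can be turned back into the input string with a small helper. `detokenize` and `SPACE_MARKER` are illustrative names, not part of microtok.

```python
# Token list copied from the log output above.
tokens = ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']

# 'Ġ' (U+0120) is the byte-level BPE stand-in for a space.
# Assumption: microtok uses the same marker as GPT-2 style tokenizers.
SPACE_MARKER = "\u0120"


def detokenize(tokens):
    """Concatenate tokens and map each space marker back to a real space."""
    return "".join(tokens).replace(SPACE_MARKER, " ")


text = detokenize(tokens)
print(text)  # Hello world! I am writing code today.
```

This only handles the space marker. A full byte-level decoder would also map the other remapped bytes (newlines, non-ASCII bytes) back to their original values.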
from transformers import PreTrainedTokenizerFast

# Load the demo tokenizer from the Hugging Face Hub
tokenizer = PreTrainedTokenizerFast.from_pretrained("Parveshiiii/microtok")
print(f"Vocab size: {tokenizer.vocab_size}")
# Split a sample sentence into subword tokens
print(tokenizer.tokenize("Hello from Panipat!"))
Read the blog on microtok here