This is a demo tokenizer trained using microtok. It accompanies my blog, which you can read here

--- TEST RESULTS ---
2026-03-19 08:58:41,165 - microtok.TikToken.tok - INFO - Input:  'Hello world! I am writing code today.'
2026-03-19 08:58:41,167 - microtok.TikToken.tok - INFO - Tokens: ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']
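
The 'Ġ' prefix in the tokens above looks like the GPT-2 style byte-level convention, where 'Ġ' (U+0120) marks a leading space. The source does not confirm that microtok uses this mapping, so treat it as an assumption. Under that assumption, a minimal sketch that joins the logged tokens back into the input string (`detokenize` is a hypothetical helper, not part of microtok):

```python
# Tokens copied from the test log above.
tokens = ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']


def detokenize(tokens):
    """Join tokens into text, treating 'Ġ' as a leading space.

    Assumption: 'Ġ' (U+0120) encodes a space, as in GPT-2 style
    byte-level BPE. This helper is illustrative only.
    """
    return "".join(tokens).replace("Ġ", " ")


text = detokenize(tokens)
print(text)
```

If the assumption holds, the printed text should match the logged input exactly, which is a quick check that tokenization is lossless for this sentence.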

Usage

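from transformers import PreTrainedTokenizerFast

# Load the demo tokenizer from the Hugging Face Hub
tokenizer = PreTrainedTokenizerFast.from_pretrained("Parveshiiii/microtok")

# Print the vocabulary size and the tokens for a sample sentence
print(f"Vocab size: {tokenizer.vocab_size}")
print(tokenizer.tokenize("Hello from Panipat!"))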

Read the blog on microtok here
