This is a tokenizer trained using microtok. It is just a demo tokenizer for my blog, which you can read here

--- TEST RESULTS ---
2026-03-19 08:58:41,165 - microtok.TikToken.tok - INFO - Input:  'Hello world! I am writing code today.'
2026-03-19 08:58:41,167 - microtok.TikToken.tok - INFO - Tokens: ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']
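
The 'Ġ' prefix in the tokens above looks like the GPT-2 style byte-level convention, where a leading space (byte 0x20) is displayed as 'Ġ' (U+0120). Assuming microtok follows that convention, which the log suggests but this card does not confirm, here is a minimal sketch of how the token strings map back to the input. The `detokenize` helper is hypothetical and is not part of microtok or transformers.

```python
# Tokens copied from the log output above.
tokens = ['Hello', 'Ġworld', '!', 'ĠI', 'Ġam', 'Ġwriting', 'Ġcode', 'Ġtoday', '.']

# Assumption: 'Ġ' (U+0120) stands in for a single space, as in GPT-2 style
# byte-level BPE. Other remapped bytes are not handled here.
SPACE_MARKER = "\u0120"


def detokenize(tokens):
    """Join token strings and turn each space marker back into a space."""
    return "".join(tokens).replace(SPACE_MARKER, " ")


text = detokenize(tokens)
print(text)
```

In practice `tokenizer.decode(...)` on the token ids should do this for you; the sketch only shows why the marker appears in the log.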

Usage

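from transformers import PreTrainedTokenizerFast

# Load the demo tokenizer from the Hugging Face Hub
tokenizer = PreTrainedTokenizerFast.from_pretrained("Parveshiiii/microtok")

# Inspect the vocabulary size and tokenize a sample sentence
print(f"Vocab size: {tokenizer.vocab_size}")
print(tokenizer.tokenize("Hello from Panipat!"))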

Read the blog on microtok here
