N8Programs committed on
Commit ead794b · verified · 1 Parent(s): 1c03e81

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 
 NextTerm-47M is a pretrained transformer w/ 47.2M parameters, trained on 1.9 billion tokens of augmented data from the On-Line Encyclopedia of Integer Sequences (OEIS). It is designed to predict the next term in integer sequences. It displays exceptional in-context learning capabilities, and outperforms far larger generic LLMs on OEIS sequence completion tasks. It supports MLX and HuggingFace transformers.
 
-The model is pretrained on sequences of up to length 1024, but can potentially generalize to longer sequences (this has not been extensively tested). All pretraining was done on a single RunPod H100 using MLX's CUDA backend. It is pretrained in `float32`. The model was trained for 335,000 steps with a batch size of 32 sequences, using Muon w/ a learning rate of 1e-2 and a fallback optimizer of AdamW with a learning rate of 1e-4. It uses the Qwen3 architecture with 12 layers, a model dimension of 512, 8 attention heads, and a feedforward dimension of 2048. It was trained using an estimated 85 exaFLOPs.
+The model is pretrained on sequences of up to length 1024, but can potentially generalize to longer sequences (this has not been extensively tested). All pretraining was done on a single RunPod H100 using MLX's CUDA backend. It is pretrained in `float32`. The model was trained for 335,000 steps with a batch size of 32 sequences, using Muon w/ a learning rate of 1e-2 and a fallback optimizer of AdamW with a learning rate of 1e-4. It uses the Qwen3 architecture with 12 layers, a model dimension of 512, 8 attention heads, and a feedforward dimension of 2048. It was trained using an estimated 540 exaFLOPs.
 
 The model's tokenizer accepts integer sequences formatted as comma-separated values, e.g. "1,-2,3,-4,". The model outputs the next terms in the sequence in the same format. All non-digit, comma, or negative sign characters are ignored by the tokenizer. Note the model has not been trained on numbers with leading zeros, so inputs like "01,02,03," may yield unpredictable results as they are out-of-distribution. The model tokenizes digits individually, so larger integers will be represented by multiple tokens (e.g. "123" is tokenized as "1", "2", "3"). This means there is no magnitude limit on the integers the model can handle, but longer integers will consume more of the model's context window.
 
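
For reference, a minimal sketch of the architecture described in the changed paragraph (Qwen3, 12 layers, model dimension 512, 8 attention heads, feedforward dimension 2048), built with HuggingFace transformers. The vocabulary size, head dimension, key/value head count, and maximum position setting below are assumptions not stated in the diff; the checkpoint's own config.json is authoritative.

```python
# Minimal sketch (not the official config): instantiating a Qwen3-style model
# with the dimensions quoted in the README. Values marked "assumption" are
# guesses that the actual NextTerm-47M config may not match.
from transformers import Qwen3Config, Qwen3ForCausalLM

config = Qwen3Config(
    vocab_size=16,                 # assumption: digits 0-9, ",", "-", plus special tokens
    hidden_size=512,               # model dimension (README)
    num_hidden_layers=12,          # 12 layers (README)
    num_attention_heads=8,         # 8 attention heads (README)
    num_key_value_heads=8,         # assumption: no grouped-query attention mentioned
    head_dim=64,                   # assumption: hidden_size / num_attention_heads
    intermediate_size=2048,        # feedforward dimension (README)
    max_position_embeddings=1024,  # pretraining sequence length (README)
)

model = Qwen3ForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```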
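
And a hedged usage sketch matching the input format the README describes ("1,-2,3,-4,"). The repository id is a guess based on the committing user and model name, and loading details (for example, whether the custom tokenizer requires trust_remote_code) depend on the actual repo.

```python
# Hedged usage sketch: feed comma-separated terms (trailing comma included,
# as in the README's example) and read back the generated continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "N8Programs/NextTerm-47M"  # assumption: actual repo id not shown on this page

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "1,1,2,3,5,8,13,21,"  # Fibonacci prefix in the README's CSV format
inputs = tokenizer(prompt, return_tensors="pt")

# Digits are tokenized individually, so a handful of new tokens covers a few terms.
outputs = model.generate(**inputs, max_new_tokens=8, do_sample=False)
continuation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(continuation)  # the true next Fibonacci terms would be 34,55,...
```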