Nebula
1. Introduction
Nebula is a 320M-parameter generalist Small Reasoning Model trained on 200B+ tokens, designed for edge AI and on-device deployment.
For its size class, Nebula aims to deliver an unusually strong balance of memory, general reasoning, math, and retrieval-friendly behavior, with the goal of outperforming other small models of a similar parameter count on non-code, industry-style benchmarks.
2. Reasoning style
Nebula’s reasoning traces use an intentionally compact style: dense, short, often non-verbal sentences, optimized for efficiency under limited model capacity.
Traces use the following shorthand notation, implemented as special tokens:
Logical markers
| Token | Meaning | Usage |
|---|---|---|
| → | derivation / implication | For very short causal/logical flow |
| ↺ | iterative return / refinement loop | For backtracking, reconsidering priors, RAG re-querying |
| ? | uncertainty/questions to resolve | Can be appended to short expressions/words, not only interrogatives |
| !/※ | insight/breakthroughs | Emphatic mark for knowledge discovery |
| ≈ | approximation/estimates | For intermediary hypothesis / uncertain preliminary statements |
| ∴ | therefore / final step | Use sparingly to mark stable conclusions |
Uncertainty
| Token | Meaning | Usage |
|---|---|---|
| ● | high confidence | well-supported empirical/theoretical ground; “anchor points.” |
| ◐ | medium/partial confidence | incomplete data; plausible but unverified links |
| ○ | low confidence | speculation, missing context, weak inference chain |
| ⚠ | bias/premise risk | domain mismatch, cultural assumptions, language-switch artifacts |
| ?maybe? | soft speculation | marks tentative ideas, branches that might collapse later |
Verification process
| Token | Meaning | Usage |
|---|---|---|
| ☐ | unverified hypothesis | raw claim, no cross-check yet |
| ☑ | intermediate verification | one source/argument supports it |
| ✓ | confirmed/validated | multiple independent supports (●-level) |
This reasoning format is designed to remain expressive while being lightweight enough for a small model.
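As an illustration of how these markers compose in practice, here is a minimal Python sketch that scans a trace for the notation above. The trace string is a hand-written example of the style, not actual Nebula output, and the helper function is purely illustrative.

```python
# Sketch: scanning a Nebula-style trace for notation markers.
# The trace below is a hand-written illustration, not real model output.

CONFIDENCE = {"●": "high", "◐": "medium", "○": "low"}
VERIFICATION = {"☐": "unverified", "☑": "intermediate", "✓": "confirmed"}

trace = (
    "12*13? → split 12*13 = 12*10 + 12*3 ≈ quick path ☐ "
    "→ 120 + 36 = 156 ☑ "
    "↺ check: 13*12 = 130 + 26 = 156 ✓ "
    "∴ 156 ●"
)

def marker_counts(text: str, table: dict) -> dict:
    """Count how often each marker from `table` appears in `text`."""
    return {tok: text.count(tok) for tok in table if tok in text}

print(marker_counts(trace, CONFIDENCE))    # which confidence anchors appear
print(marker_counts(trace, VERIFICATION))  # how far verification progressed
```

Because the markers are single tokens, this kind of scan can also be used downstream, e.g. to gate answers on whether a trace reached a ✓-level conclusion.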
3. Fine-Tuning/RL
Nebula has been successfully fine-tuned for a variety of tasks.
Because Nebula is a reasoning-oriented model, it is expected to train well with reinforcement learning methods such as GRPO, both for verifiable tasks (with objective rewards) and for subjective tasks using an LLM-as-a-judge.
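For the verifiable-task case, a reward function can simply check the trace's final conclusion against a reference answer. The sketch below is an assumption about how such a reward could be written (the function name, the `∴ <answer>` convention at the end of a trace, and the exact-match criterion are illustrative, not part of Nebula's documented training setup):

```python
# Sketch of a verifiable reward function of the kind usable with GRPO.
# Assumes traces end with the final-step marker: "∴ <answer>".
import re

def exact_answer_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the '∴'-marked conclusion matches the reference
    answer exactly, else 0.0 (including when no marker is present)."""
    match = re.search(r"∴\s*(.+?)\s*$", completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

# Example: a trace ending in the notation's final-step marker.
trace = "12*13 → 120 + 36 = 156 ✓ ∴ 156"
print(exact_answer_reward(trace, "156"))  # 1.0
```

For subjective tasks, the same callable shape works with an LLM-as-a-judge producing the score instead of an exact-match check.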
4. Benchmarks
| Model | MMLU |
|---|---|
| Nebula | 40.0 |
| SmolLM2-360M | 35.8 |
| Gemma 3 270M (IT) | 26.5 |
| Granite-4.0-H-350M | 36.2 |