Under Experiment

GOAL: SOTA math reasoning for a sub-400M-parameter LLM

Benchmarks

Note: we use thinking-token forcing because this model occasionally outputs a response directly, without the thinking tag.
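
A minimal sketch of what thinking-token forcing can look like with the Hugging Face `transformers` API. The repo id and the exact `<think>` tag string are placeholders for illustration, not details taken from this card:

```python
# Sketch: force decoding to start inside the reasoning block by appending the
# opening thinking tag to the prompt. Repo id and tag string are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Build the chat prompt, then append the opening thinking tag so the model
# cannot answer directly without first emitting its reasoning.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
prompt += "<think>"  # assumed reasoning tag

# The templated prompt already contains special tokens, so don't add them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```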

Standard Decoding:

Recursive Self-Aggregation (see the sketch below):

  • AIME 2025: TBA
  • HMMT 2025: TBA
  • BRUMO 2025: TBA
  • CMIMC 2025: TBA
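
Recursive Self-Aggregation (RSA) is a test-time scaling loop: keep a population of sampled solutions and repeatedly ask the model to merge small subsets of them into improved candidates before taking a final vote. The sketch below is a generic, assumed formulation; the `generate` callable, prompt wording, population sizes, and answer extraction are hypothetical, not the exact settings behind the numbers above:

```python
# Generic RSA sketch (assumed formulation, illustrative prompts and parameters).
import random
from collections import Counter


def extract_answer(solution: str) -> str:
    # Naive answer extraction: take the last non-empty line of a solution.
    lines = [ln.strip() for ln in solution.splitlines() if ln.strip()]
    return lines[-1] if lines else solution.strip()


def rsa(problem: str, generate, n: int = 8, k: int = 3, rounds: int = 2) -> str:
    # Round 0: sample an initial population of independent candidate solutions.
    population = [generate(f"Solve the problem.\n\nProblem: {problem}") for _ in range(n)]

    for _ in range(rounds):
        new_population = []
        for _ in range(n):
            # Aggregate a small random subset of candidates into one refined solution.
            subset = random.sample(population, k=min(k, len(population)))
            joined = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(subset))
            new_population.append(
                generate(
                    f"Problem: {problem}\n\n{joined}\n\n"
                    "Aggregate these candidate solutions into a single improved solution."
                )
            )
        population = new_population

    # Final answer: majority vote over the surviving candidates.
    return Counter(extract_answer(c) for c in population).most_common(1)[0][0]
```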
Model details

  • Format: GGUF, 8-bit quantization
  • Model size: 0.4B params
  • Architecture: lfm2