MOMO: Mars Orbital Model

MOMO is the first multi-sensor foundation model for Mars remote sensing, accepted at CVPR 2026.

It integrates representations learned independently from three Martian orbital sensors (HiRISE, CTX, and THEMIS) spanning resolutions from 0.25 m/pixel to 100 m/pixel, using task arithmetic model merging with a novel Equal Validation Loss (EVL) checkpoint selection strategy.
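The merging step can be illustrated with a minimal sketch of generic task arithmetic. This is not the authors' implementation: it assumes all sensor-specific models share one initialization (the hypothetical `base` below), and the function names and the `scale` coefficient are illustrative only. EVL checkpoint selection is not reproduced here; see the paper and repository for that procedure. The functions accept any state-dict-like mapping whose values support `+`, `-`, and scalar `*`, such as `torch.Tensor` values.

```python
from typing import Any, Dict, Mapping, Sequence


def task_vector(base: Mapping[str, Any], tuned: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the per-parameter difference tuned - base."""
    missing = set(base) - set(tuned)
    if missing:
        raise KeyError(f"parameters missing from tuned model: {sorted(missing)}")
    return {name: tuned[name] - base[name] for name in base}


def merge_task_arithmetic(
    base: Mapping[str, Any],
    tuned_models: Sequence[Mapping[str, Any]],
    scale: float = 1.0,
) -> Dict[str, Any]:
    """Add the scaled task vector of every tuned model to the shared base."""
    merged = dict(base)  # shallow copy; values are replaced, never mutated in place
    for tuned in tuned_models:
        for name, delta in task_vector(base, tuned).items():
            merged[name] = merged[name] + scale * delta
    return merged


# Toy example with scalar "parameters" (hypothetical values, not real weights).
base = {"w": 1.0}
hirise, ctx, themis = {"w": 2.0}, {"w": 0.0}, {"w": 3.0}
merged = merge_task_arithmetic(base, [hirise, ctx, themis], scale=0.5)
print(merged)  # {'w': 2.0}
```

With real checkpoints, the same functions could be applied to the state dicts of the three sensor-specific models, provided they share parameter names and shapes and contain only floating-point tensors.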

📄 Paper: MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

💻 Code: github.com/kerner-lab/MOMO

🛢️ Pre-training Data: huggingface.co/datasets/Mirali33/MOMO-pretraining-data

🏆 Mars-Bench (Downstream tasks): mars-bench.github.io

Figure: MOMO pre-training samples. MOMO trains separate models on HiRISE, CTX, and THEMIS data and merges them into a single foundation model capable of diverse Mars orbital tasks.

Checkpoints

Each model size includes 5 checkpoints:

| File | Description |
| --- | --- |
| `ctx.pth` | Pre-trained on CTX (ConTeXt Camera) |
| `hirise.pth` | Pre-trained on HiRISE (High Resolution Imaging Science Experiment) |
| `themis.pth` | Pre-trained on THEMIS (THermal EMission Imaging System) |
| `hirise_ctx_themis.pth` | Pre-trained jointly on all three sensors |
| `momo.pth` | MOMO merged model via task arithmetic + EVL (main contribution) |

Each checkpoint is available for three ViT architectures (all with patch size 16):

| Folder | Architecture |
| --- | --- |
| `vit-s-16/` | ViT-small |
| `vit-b-16/` | ViT-base |
| `vit-l-16/` | ViT-large |

ViT-base is the primary model reported in the main paper. ViT-small and ViT-large results are reported in the supplementary material.
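Since every architecture folder holds the same five file names, the repository-relative path of any checkpoint can be built from two choices. The helper below is a small sketch of that mapping; `checkpoint_filename` and the short size keys (`"small"`, `"base"`, `"large"`) are hypothetical conveniences, not part of the released code.

```python
# Folder and file names as listed in the two tables above.
ARCH_FOLDERS = {"small": "vit-s-16", "base": "vit-b-16", "large": "vit-l-16"}
VARIANTS = ("ctx", "hirise", "themis", "hirise_ctx_themis", "momo")


def checkpoint_filename(size: str = "base", variant: str = "momo") -> str:
    """Return the repo-relative path, e.g. 'vit-b-16/momo.pth'."""
    if size not in ARCH_FOLDERS:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(ARCH_FOLDERS)}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{ARCH_FOLDERS[size]}/{variant}.pth"


# Enumerate all 3 x 5 = 15 checkpoint paths.
all_paths = [checkpoint_filename(s, v) for s in ARCH_FOLDERS for v in VARIANTS]
print(len(all_paths))  # 15
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.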

Usage

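import torch
from huggingface_hub import hf_hub_download

# Download the MOMO ViT-base checkpoint (cached locally after the first call)
path = hf_hub_download(repo_id="Mirali33/MOMO", filename="vit-b-16/momo.pth")

# Load on CPU. weights_only=False unpickles arbitrary Python objects,
# so only use it with checkpoints from a source you trust.
checkpoint = torch.load(path, map_location="cpu", weights_only=False)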

For full training and fine-tuning code, see the MOMO GitHub repository (github.com/kerner-lab/MOMO).
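This card does not document the layout of the loaded `checkpoint` object. The sketch below shows one cautious way to inspect it before building a model. It assumes the object is either a plain state dict or a dictionary that wraps one under a common key; the candidate key names are guesses based on widespread PyTorch conventions, not confirmed for these files.

```python
from collections.abc import Mapping
from typing import Any, List, Tuple

# Assumption: common wrapper keys used by many training scripts.
CANDIDATE_KEYS = ("state_dict", "model", "model_state_dict")


def extract_state_dict(checkpoint: Any) -> Mapping:
    """Return the nested state dict if one is found, else the checkpoint itself."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in CANDIDATE_KEYS:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    return checkpoint


def summarize(state_dict: Mapping, limit: int = 5) -> List[Tuple[str, tuple]]:
    """Return (name, shape) for the first `limit` entries; shape is () if absent."""
    return [
        (name, tuple(getattr(value, "shape", ())))
        for name, value in list(state_dict.items())[:limit]
    ]
```

For example, `summarize(extract_state_dict(checkpoint))` lists the first few parameter names and shapes, which helps when matching the weights to a ViT implementation.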

Training Data

MOMO is pre-trained on approximately 12 million samples (4M per sensor) from Mars orbital imagery:

HiRISE: 0.25 m/pixel high-resolution visible spectrum images
CTX: 5 m/pixel context camera images
THEMIS: 100 m/pixel thermal infrared images
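To make the resolution gap concrete, the short calculation below gives the ground footprint of one 16 x 16 ViT patch for each sensor. It assumes images are fed at the nominal resolutions listed above with no resampling, which this card does not state; the names are illustrative.

```python
PATCH_SIZE_PX = 16  # patch size shared by all released ViT variants

# Nominal ground sampling distance in metres per pixel, from the list above.
RESOLUTION_M_PER_PX = {"HiRISE": 0.25, "CTX": 5.0, "THEMIS": 100.0}


def patch_footprint_m(sensor: str, patch_px: int = PATCH_SIZE_PX) -> float:
    """Side length in metres of the ground area covered by one square patch."""
    return RESOLUTION_M_PER_PX[sensor] * patch_px


for sensor in RESOLUTION_M_PER_PX:
    print(f"{sensor}: {patch_footprint_m(sensor):g} m per patch side")
# HiRISE: 4 m per patch side
# CTX: 80 m per patch side
# THEMIS: 1600 m per patch side
```

Under that assumption, a single patch token spans a 400 times larger side length in THEMIS than in HiRISE, which is the scale range the merged model has to cover.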

Evaluation

MOMO is evaluated on 9 downstream tasks from Mars-Bench (4 classification, 5 segmentation). It outperforms ImageNet pre-training, Earth observation foundation models (DINOv3, SatMAE, CROMA, Prithvi, TerraFM), sensor-specific pre-training, and fully supervised baselines.

Citation

@InProceedings{Purohit_2026_CVPR,
    author    = {Purohit, Mirali and Gajera, Bimal and Mehta, Irish and Tokas, Bhanu and Adler, Jacob and Lu, Steven and Dickenshied, Scott and Diniega, Serina and Bue, Brian and Rebbapragada, Umaa and Kerner, Hannah},
    title     = {MOMO: Mars Orbital MOdel Foundation Model for Mars Orbital Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {27772-27782}
}
