MOMO: Mars Orbital Model

MOMO is the first multi-sensor foundation model for Mars remote sensing, accepted at CVPR 2026.

It integrates representations learned independently from three Martian orbital sensors (HiRISE, CTX, and THEMIS), whose resolutions range from 0.25 m/pixel to 100 m/pixel, using task arithmetic model merging with a novel Equal Validation Loss (EVL) checkpoint selection strategy.
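This card names EVL but does not define it. The sketch below is a hypothetical reading of "Equal Validation Loss". Given each sensor model's validation loss at every saved checkpoint, it picks, for each sensor, the checkpoint whose loss is closest to one shared level, so the models being merged sit at comparable points in training. The function name and the choice of target (the largest of the per-sensor minimum losses) are assumptions. The paper gives the actual criterion.

```python
from typing import Dict, List


def select_equal_loss_checkpoints(val_losses: Dict[str, List[float]]) -> Dict[str, int]:
    """Hypothetical EVL-style checkpoint selection (not the paper's exact rule).

    val_losses maps a sensor name to its validation loss at each saved
    checkpoint (list index = checkpoint number). Returns, per sensor, the
    index of the checkpoint whose loss is closest to a shared target level.
    """
    # Assumed target: the highest per-sensor minimum loss, i.e. the lowest
    # loss level that every sensor's training run actually reaches.
    target = max(min(losses) for losses in val_losses.values())
    selection: Dict[str, int] = {}
    for sensor, losses in val_losses.items():
        # Choose the checkpoint whose validation loss is nearest the target.
        selection[sensor] = min(range(len(losses)), key=lambda i: abs(losses[i] - target))
    return selection


# Toy loss curves (made-up numbers, for illustration only).
losses = {
    "hirise": [0.9, 0.6, 0.4, 0.35],
    "ctx": [0.8, 0.5, 0.45],
    "themis": [0.7, 0.55, 0.5, 0.48],
}
print(select_equal_loss_checkpoints(losses))
```

With these toy curves the shared target is 0.48 (THEMIS's best loss), so HiRISE and CTX are taken from earlier checkpoints than their own minima.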

📄 Paper: MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

💻 Code: github.com/kerner-lab/MOMO

🛢️ Pre-training Data: huggingface.co/datasets/Mirali33/MOMO-pretraining-data

🏆 Mars-Bench (Downstream tasks): mars-bench.github.io

[Figure: MOMO pre-training samples]
MOMO trains separate models on HiRISE, CTX, and THEMIS data and merges them into a single foundation model capable of diverse Mars orbital tasks.
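The merging step can be illustrated with standard task arithmetic as it is generally defined. Each fine-tuned model contributes a task vector (its weights minus a shared base), and the scaled sum of those vectors is added back to the base. The minimal sketch below works over parameter dictionaries. It assumes all sensor models start from the same base weights. The card specifies neither the base checkpoint nor the `scale` factor, so both are hypothetical here.

```python
from typing import Dict, Mapping, Sequence, TypeVar

# Any value supporting +, - and scalar * works (torch.Tensor, numpy array, float).
T = TypeVar("T")


def task_arithmetic_merge(
    base: Mapping[str, T],
    finetuned: Sequence[Mapping[str, T]],
    scale: float = 1.0,
) -> Dict[str, T]:
    """Return base + scale * sum_i (theta_i - base), parameter by parameter."""
    merged: Dict[str, T] = {}
    for name, base_param in base.items():
        # Sum the task vectors (theta_i - theta_base) of all models for this parameter;
        # start from a zero of the same type/shape as the parameter.
        task_sum = sum((model[name] - base_param for model in finetuned), start=0 * base_param)
        merged[name] = base_param + scale * task_sum
    return merged


# Toy example with scalar "parameters" standing in for tensors.
base = {"w": 1.0, "b": 0.0}
sensor_models = [{"w": 2.0, "b": 1.0}, {"w": 4.0, "b": 1.0}, {"w": 0.0, "b": 1.0}]
print(task_arithmetic_merge(base, sensor_models, scale=0.5))
```

The same function applies unchanged to PyTorch state dicts of floating-point tensors. Any integer buffers would need to be copied from the base rather than merged.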

Checkpoints

Each model size includes 5 checkpoints:

File                    Description
ctx.pth                 Pre-trained on CTX (ConTeXt Camera)
hirise.pth              Pre-trained on HiRISE (High Resolution Imaging Science Experiment)
themis.pth              Pre-trained on THEMIS (THermal EMission Imaging System)
hirise_ctx_themis.pth   Pre-trained jointly on all three sensors
momo.pth                MOMO merged model via task arithmetic + EVL (main contribution)

Each checkpoint is available for three ViT architectures (all with patch size 16):

Folder      Architecture
vit-s-16/   ViT-small
vit-b-16/   ViT-base
vit-l-16/   ViT-large
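Every folder holds the same five file names, so the full set of 15 repository paths can be enumerated directly. This sketch assumes the repository layout matches the lists above. `download_all` is a hypothetical helper built on `huggingface_hub.hf_hub_download`, and it needs network access.

```python
from itertools import product
from typing import List

# Folder and file names as listed in the checkpoint and architecture lists above.
ARCHITECTURES = ["vit-s-16", "vit-b-16", "vit-l-16"]
CHECKPOINTS = ["ctx", "hirise", "themis", "hirise_ctx_themis", "momo"]


def checkpoint_files() -> List[str]:
    """All repository paths of the form '<folder>/<checkpoint>.pth'."""
    return [f"{arch}/{name}.pth" for arch, name in product(ARCHITECTURES, CHECKPOINTS)]


def download_all(repo_id: str = "Mirali33/MOMO") -> List[str]:
    """Hypothetical helper: download every checkpoint and return local file paths."""
    # Imported lazily so checkpoint_files() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [hf_hub_download(repo_id=repo_id, filename=f) for f in checkpoint_files()]


print(checkpoint_files()[:5])
```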

ViT-base is the primary model reported in the main paper. ViT-small and ViT-large results are reported in the supplementary material.


Usage

import torch
from huggingface_hub import hf_hub_download

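# Download the MOMO ViT-base checkpoint (cached locally after the first call)
path = hf_hub_download(repo_id="Mirali33/MOMO", filename="vit-b-16/momo.pth")

# weights_only=False unpickles arbitrary objects; only use it with files you trust
checkpoint = torch.load(path, map_location="cpu", weights_only=False)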

For full training and fine-tuning code, see the MOMO GitHub repository.
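This card does not document the internal layout of `momo.pth`. The sketch below assumes the weights may be wrapped under a key such as `model` or `state_dict` and may carry prefixes such as `module.`. The wrapper keys, the prefixes, and the timm model name are all assumptions. Check them against `checkpoint.keys()` and the GitHub repository before relying on them.

```python
from typing import Any, Dict, Mapping

# Assumed wrapper keys and parameter-name prefixes; adjust after inspecting the file.
WRAPPER_KEYS = ("model", "state_dict", "model_ema", "teacher")
PREFIXES = ("module.", "backbone.", "encoder.")


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Unwrap a nested checkpoint dict and strip common parameter-name prefixes."""
    state: Mapping[str, Any] = checkpoint
    for key in WRAPPER_KEYS:
        if key in state and isinstance(state[key], dict):
            state = state[key]  # use the first wrapper key that holds a dict
            break
    cleaned: Dict[str, Any] = {}
    for name, value in state.items():
        # Strip each assumed prefix in turn (e.g. 'module.encoder.x' -> 'x').
        for prefix in PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
        cleaned[name] = value
    return cleaned


def load_into_timm(path: str, model_name: str = "vit_base_patch16_224"):
    """Hypothetical loader: put MOMO weights into a timm ViT and report mismatches."""
    import timm  # imported lazily; requires timm and torch
    import torch

    checkpoint = torch.load(path, map_location="cpu", weights_only=False)
    model = timm.create_model(model_name, pretrained=False, num_classes=0)
    report = model.load_state_dict(extract_state_dict(checkpoint), strict=False)
    print("missing keys:", report.missing_keys)
    print("unexpected keys:", report.unexpected_keys)
    return model
```

Inspect the missing and unexpected keys before trusting the result. If MOMO uses different parameter names or a different input size than `vit_base_patch16_224`, the weights will not map cleanly. A shape mismatch, for example in the positional embedding, raises an error even with `strict=False`.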


Training Data

MOMO is pre-trained on approximately 12 million samples (4M per sensor) from Mars orbital imagery:

  • HiRISE: 0.25 m/pixel high-resolution visible spectrum images
  • CTX: 5 m/pixel context camera images
  • THEMIS: 100 m/pixel thermal infrared images
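Resolution determines how much ground a fixed-size input covers. The footprint side length is simply pixels times metres per pixel. The example below uses a hypothetical 224 × 224 pixel tile, since this card does not state the tile size used. It shows how the 400× resolution gap between HiRISE and THEMIS translates into very different spatial scales:

```python
# Ground footprint of a square tile: side length = tile_pixels * metres_per_pixel.
# TILE_PIXELS = 224 is a hypothetical tile size; the card does not state one.
TILE_PIXELS = 224
RESOLUTION_M_PER_PX = {"HiRISE": 0.25, "CTX": 5.0, "THEMIS": 100.0}


def tile_footprint_m(resolution_m_per_px: float, tile_pixels: int = TILE_PIXELS) -> float:
    """Side length in metres of the ground area covered by one tile."""
    return tile_pixels * resolution_m_per_px


footprints = {sensor: tile_footprint_m(res) for sensor, res in RESOLUTION_M_PER_PX.items()}
print(footprints)  # HiRISE: 56 m, CTX: 1,120 m, THEMIS: 22,400 m per tile side
```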

Evaluation

MOMO is evaluated on 9 downstream tasks from Mars-Bench (4 classification, 5 segmentation), where it outperforms ImageNet pre-training, Earth observation foundation models (DINOv3, SatMAE, CROMA, Prithvi, TerraFM), sensor-specific pre-training, and fully supervised baselines.


Citation

@InProceedings{Purohit_2026_CVPR,
    author    = {Purohit, Mirali and Gajera, Bimal and Mehta, Irish and Tokas, Bhanu and Adler, Jacob and Lu, Steven and Dickenshied, Scott and Diniega, Serina and Bue, Brian and Rebbapragada, Umaa and Kerner, Hannah},
    title     = {MOMO: Mars Orbital MOdel Foundation Model for Mars Orbital Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {27772-27782}
}