Pyramidal Spectrum

Frequency-based Hierarchically Vector Quantized VAE for Videos

Official Implementation — WACV 2026

This repository provides the official implementation of the paper:

Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos
Accepted at WACV 2026

We introduce a new autoencoder trained on 4K-resolution video data, featuring a hierarchical frequency-based vector quantization method.
The model leverages a pyramidal spectral representation to produce high-fidelity video reconstructions with an efficient latent structure.

📦 Installation

This implementation requires Diffusers installed from the `MMVQVae` branch of a custom fork:

pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae
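To confirm that the package is visible to your Python environment after installation, a quick check along these lines may help. It is only a sketch: it uses the standard library alone, assumes the fork keeps the distribution name `diffusers`, and `installed_version` is a hypothetical helper, not part of Diffusers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the custom branch installs under the usual name "diffusers".
found = installed_version("diffusers")
if found is None:
    print("diffusers is not installed in this environment")
else:
    print(f"diffusers {found} is installed")
```

Note that the version string alone does not tell you which branch was installed; it only confirms that some build of the package is present.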

🚀 Features

Novel hierarchical frequency-domain quantization
Trained on 4K-resolution video datasets
Multi-level pyramidal spectral decomposition
Highly efficient latent video representation
High-quality reconstructions suitable for generative pipelines
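
This README does not describe the model's architecture in detail, so the following is only a toy illustration of the general idea behind pyramidal decomposition combined with per-level vector quantization. It is not the paper's method: it assumes a 1D Laplacian-style pyramid as a stand-in for the spectral decomposition, and fixed hand-written codebooks in place of learned ones. All function names are hypothetical and are not part of this repository's API.

```python
def downsample(signal):
    """Low-pass and decimate by 2: average each adjacent pair."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Nearest-neighbour upsampling: repeat every sample twice."""
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Split a signal into `levels` detail bands plus one coarse band.

    Returns [detail_0, ..., detail_{levels-1}, coarse], finest band first.
    The signal length must be divisible by 2 ** levels.
    """
    bands, current = [], list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The detail band holds what the coarser level cannot represent.
        bands.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert build_pyramid: upsample the coarse band and add details back."""
    current = bands[-1]
    for detail in reversed(bands[:-1]):
        current = [p + d for p, d in zip(upsample(current), detail)]
    return current


def quantize(band, codebook, dim=2):
    """Replace each `dim`-sized chunk by the index of its nearest codeword."""
    indices = []
    for i in range(0, len(band), dim):
        chunk = band[i:i + dim]
        dists = [sum((a - b) ** 2 for a, b in zip(chunk, code)) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Look up codewords and flatten them back into a band."""
    return [v for idx in indices for v in codebook[idx]]


def grid_codebook(values):
    """Hypothetical fixed codebook: every 2D pair drawn from `values`."""
    return [(a, b) for a in values for b in values]


# Toy data and codebooks, chosen purely for illustration.
signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 8.0, 4.0]
detail_codebook = grid_codebook([-2.0, -1.0, 0.0, 1.0, 2.0])
coarse_codebook = grid_codebook([0.0, 2.0, 4.0, 6.0, 8.0])

bands = build_pyramid(signal, levels=2)
codebooks = [detail_codebook, detail_codebook, coarse_codebook]
codes = [quantize(b, cb) for b, cb in zip(bands, codebooks)]
decoded = [dequantize(idx, cb) for idx, cb in zip(codes, codebooks)]
approx = reconstruct(decoded)

print("token indices per level:", codes)
print("reconstruction:", approx)
```

Without quantization the pyramid is lossless, so `reconstruct(build_pyramid(signal, 2))` returns the original signal; all reconstruction error in this sketch comes from snapping each band to its codebook. A learned model would train the codebooks and operate on video tensors rather than a short list of numbers.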

@inproceedings{pyramidal_spectrum_wacv2026,
  title     = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
  author    = {Tushar Prakash and Onkar Susladkar and Inderjit Dhillon and Sparsh Mittal},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}
