How to use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("onkarsus13/MMVQVae", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

Pyramidal Spectrum

Frequency-based Hierarchically Vector Quantized VAE for Videos

Official Implementation (WACV 2026)

This repository provides the official implementation of the paper:

Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos
Accepted at WACV 2026

We introduce a new autoencoder trained on 4K-resolution video data, featuring a hierarchical frequency-based vector quantization method.
The model leverages a pyramidal spectral representation to produce high-fidelity video reconstructions with an efficient latent structure.


📦 Installation

This implementation requires installing Diffusers from the custom branch:

pip install git+https://github.com/Onkarsus13/diffusers@MMVQVae
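After installing, it can help to confirm that a Diffusers build is actually visible to the interpreter you plan to use. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of this repository. It reports the installed package version but cannot tell whether the build came from the custom branch.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    Hypothetical helper for illustration; not part of this repository.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when diffusers is not installed.
print("diffusers:", installed_version("diffusers"))
```

If this prints `None`, the install most likely went into a different environment than the one running the script.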

🚀 Features

  • Novel hierarchical frequency-domain quantization
  • Trained on 4K-resolution video datasets
  • Multi-level pyramidal spectral decomposition
  • Highly efficient latent video representation
  • High-quality reconstructions suitable for generative pipelines
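To give a rough intuition for how a multi-level frequency pyramid and per-band quantization fit together, here is a toy sketch in pure Python. It is not the model's architecture or the paper's method: it assumes a 1D signal, a Haar-style low/high split in place of the actual spectral decomposition, and a fixed scalar codebook in place of learned vector codebooks. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def haar_split(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length signal into a low band (pairwise means)
    and a high band (pairwise half-differences)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_merge(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Invert haar_split: rebuild the finer signal from both bands."""
    out: List[float] = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def build_pyramid(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Return the coarsest low band and the high bands, ordered fine to coarse."""
    highs: List[List[float]] = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_split(low)
        highs.append(high)
    return low, highs


def quantize(values: Sequence[float], codebook: Sequence[float]) -> List[int]:
    """Map each value to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - v)) for v in values]


def encode(signal: Sequence[float], levels: int, codebook: Sequence[float]) -> Tuple[List[int], List[List[int]]]:
    """Decompose the signal, then quantize every band to codebook indices."""
    low, highs = build_pyramid(signal, levels)
    return quantize(low, codebook), [quantize(h, codebook) for h in highs]


def decode(low_idx: Sequence[int], high_idx: Sequence[Sequence[int]], codebook: Sequence[float]) -> List[float]:
    """Look up codewords and merge bands from coarse back to fine."""
    low = [codebook[i] for i in low_idx]
    for idx in reversed(high_idx):
        low = haar_merge(low, [codebook[i] for i in idx])
    return low


# Toy usage: a codebook with step 0.5 represents this signal's bands exactly,
# so the round trip is lossless here. A coarser codebook would be lossy.
codebook = [k * 0.5 for k in range(-16, 17)]
signal = [4.0, 2.0, 6.0, 8.0, 1.0, 1.0, 3.0, 5.0]
low_idx, high_idx = encode(signal, levels=2, codebook=codebook)
reconstruction = decode(low_idx, high_idx, codebook)
assert reconstruction == signal
```

The point of the sketch is the structure: each pyramid level contributes its own set of discrete indices, so coarse content and fine detail are stored separately and recombined at decode time.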

@inproceedings{pyramidal_spectrum_wacv2026,
  title     = {Pyramidal Spectrum: Frequency-based Hierarchically Vector Quantized VAE for Videos},
  author    = {Tushar Prakash and Onkar Susladkar and Inderjit Dhillon and Sparsh Mittal},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2026}
}