Overview

DINOv3-PokeCon-Head is a lightweight projection head trained on top of the frozen DINOv3 ViT-H/16+ backbone (facebook/dinov3-vith16plus-pretrain-lvd1689m). It is trained with a supervised contrastive loss on the PokeFA dataset to produce Pokémon identity embeddings with two properties:

  • Same Pokémon → high similarity. Images of the same Pokémon should be close in the embedding space even when the artwork changes (different art styles, poses, backgrounds, crops, lighting, etc.).
  • Different Pokémon → low similarity. Images of different Pokémon should be dissimilar, even when two images look superficially similar (e.g., same pose/background/style).
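
The two goals above are what a supervised contrastive objective optimizes. The following is a minimal sketch of one common formulation of that loss, written with the standard library only. It is not the training code for this head: the function names, the temperature of 0.1, and the toy 2-D embeddings are assumptions made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    temperature=0.1 is an illustrative default, not a value from the model card.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy embeddings: the first two and the last two vectors point the same way.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = ["umbreon", "umbreon", "alolan_ninetales", "alolan_ninetales"]
scrambled = ["umbreon", "alolan_ninetales", "umbreon", "alolan_ninetales"]
print(supcon_loss(embeddings, clustered), supcon_loss(embeddings, scrambled))
```

When the labels agree with the geometry (`clustered`), the loss is small. When same-label images sit far apart (`scrambled`), it is large, which is the signal that pulls same-identity images together during training.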

Forms are treated as different identities. For example, Alolan Ninetales and Kanto Ninetales (and other major form variants) are considered different Pokémon for training and evaluation purposes because they can differ substantially in visual appearance.
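
One way to encode this rule is to key identity labels on the (species, form) pair rather than on the species alone. The sketch below assumes a hypothetical list of `(species, form)` tuples. The actual label format of the PokeFA dataset is not described here, so the function name and the input layout are illustrative only.

```python
def build_identity_labels(samples):
    """Assign one integer id per distinct (species, form) pair.

    `samples` is a hypothetical list of (species, form) tuples.
    """
    ids = {}
    labels = []
    for species, form in samples:
        key = (species.lower(), form.lower())
        if key not in ids:
            ids[key] = len(ids)  # next unused id, in order of first appearance
        labels.append(ids[key])
    return labels, ids

samples = [
    ("Ninetales", "Kanto"),
    ("Ninetales", "Alolan"),
    ("Umbreon", "Base"),
    ("Ninetales", "Alolan"),
]
labels, ids = build_identity_labels(samples)
print(labels)
```

Under this scheme the two Ninetales forms receive different ids, so a contrastive loss treats them as negatives for each other.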

An inference.py script is included so you can compute embeddings and cosine similarity using either:

  • the base DINOv3 embeddings, or
  • DINOv3 + this projection head.
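
For reference, cosine similarity between two embedding vectors can be computed as in the sketch below. It does not reproduce the interface of inference.py, which is not shown here. The function name and the plain-list inputs are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Vectors pointing the same way score 1, orthogonal vectors score 0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

The score depends only on direction, not on vector length, so it is comparable between the base embeddings and the projected ones even if their norms differ.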

Illustration

The projection head increases same-Pokémon similarity even across large style shifts while decreasing similarity for different Pokémon, including hard negatives where composition and style are nearly identical.

Same Pokémon (Umbreon), different style/background/pose

Images: Umbreon 1 (umbreon1.jpg), Umbreon 2 (umbreon2.jpg)
  • Base + projection head cosine similarity: 0.898438

Different Pokémon, same style/white background/pose (Umbreon vs Alolan Ninetales)

Images: Umbreon (umbreon1.jpg), Alolan Ninetales (alolan_ninetales.jpg)
  • Base (DINOv3 pooled) cosine similarity: 0.703125
  • Base + projection head cosine similarity: 0.136719

Stress test: very similar-looking “different Pokémon” (Kanto Ninetales vs Alolan Ninetales)

In this setup, Kanto Ninetales and Alolan Ninetales are different identities (different regional forms). When pose, style, and background match, the two can look very similar in composition and differ mainly in typing and coloration:

  • Alolan Ninetales: Ice & Fairy, icy-blue/white fur, cloud-like tails with blue/white tips (winter-fox look)
  • Kanto Ninetales: Fire, golden fur with red accents, more “classic” flowing tails

Images: Kanto Ninetales (ninetales.jpeg), Alolan Ninetales (alolan_ninetales.jpg)
  • Base (DINOv3 pooled) cosine similarity: 0.73828
  • Base + projection head cosine similarity: 0.328125
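
To make the comparisons above concrete, the short sketch below computes two quantities from the reported scores: how much the head lowers similarity for each different-Pokémon pair, and the gap between the same-Pokémon pair and the hardest negative. The scores are copied from this page. The dictionary layout and function names are illustrative, and `None` marks the base score that is not reported for the Umbreon pair.

```python
# Cosine similarities copied from the examples above.
pairs = [
    {"name": "Umbreon 1 vs Umbreon 2", "same": True,
     "base": None, "head": 0.898438},
    {"name": "Umbreon vs Alolan Ninetales", "same": False,
     "base": 0.703125, "head": 0.136719},
    {"name": "Kanto Ninetales vs Alolan Ninetales", "same": False,
     "base": 0.73828, "head": 0.328125},
]

def separation_margin(pairs, key="head"):
    """Lowest same-identity score minus highest different-identity score.

    Needs at least one reported score on each side for the chosen key.
    """
    positives = [p[key] for p in pairs if p["same"] and p[key] is not None]
    negatives = [p[key] for p in pairs if not p["same"] and p[key] is not None]
    return min(positives) - max(negatives)

def head_reduction(pair):
    """How much the projection head lowers similarity for one pair."""
    return pair["base"] - pair["head"]

for p in pairs:
    if not p["same"]:
        print(p["name"], round(head_reduction(p), 6))
print("margin with head:", round(separation_margin(pairs), 6))
```

With the head, the hardest negative is the Kanto/Alolan pair at 0.328125, which leaves a gap of roughly 0.57 to the Umbreon positive pair. Three pairs are only an illustration, not an evaluation.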
Image sources

License

This projection head is a derivative work built on top of the original DINOv3 model (facebook/dinov3-vith16plus-pretrain-lvd1689m). Accordingly, this repo uses the same DINOv3 license as the upstream model repository.

Pokémon IP disclaimer: Pokémon and related names/imagery are trademarks and copyrighted works of The Pokémon Company, Nintendo, Game Freak, and Creatures Inc. This project is non-commercial and is not affiliated with or endorsed by those entities.
