Overview
DINOv3-PokeCon-Head is a lightweight projection head trained on top of the frozen DINOv3 ViT-H/16+ backbone (facebook/dinov3-vith16plus-pretrain-lvd1689m) with a supervised contrastive loss on the PokeFA dataset. It produces Pokémon identity embeddings with two properties:
- Same Pokémon → high similarity. Images of the same Pokémon should be close in the embedding space even when the artwork changes (different art styles, poses, backgrounds, crops, lighting, etc.).
- Different Pokémon → low similarity. Images of different Pokémon should have dissimilar embeddings, even when the two images look superficially similar (e.g., same pose, background, or style).
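The two properties above are what a supervised contrastive (SupCon) objective optimizes. The following is a minimal, dependency-free sketch assuming the standard "L_out" formulation from Khosla et al. (2020) with L2-normalized embeddings. The temperature value and the toy batch are illustrative assumptions, not the settings used to train this head.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (the "L_out" form of Khosla et al., 2020).

    embeddings: one float vector per image in the batch
    labels: one identity label per image; equal labels are positives
    temperature: illustrative value, not the one used to train this head
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors with no positive in the batch are skipped
        # Scaled cosine similarity from anchor i to every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability assigned to the anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two "Umbreon" vectors and two "Alolan Ninetales" vectors.
batch = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
matched = supcon_loss(batch, ["umbreon", "umbreon", "ninetales_alola", "ninetales_alola"])
shuffled = supcon_loss(batch, ["umbreon", "ninetales_alola", "umbreon", "ninetales_alola"])
print(f"labels match geometry: {matched:.4f}, labels shuffled: {shuffled:.4f}")
```

The loss is low when same-label vectors already point in similar directions and high when they do not. Minimizing it pulls same-identity embeddings together and pushes every other sample in the batch away.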
Forms are treated as different identities. For example, Alolan Ninetales and Kanto Ninetales (and other major form variants) count as different Pokémon for training and evaluation because they can look substantially different.
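The PokeFA label format is not described here, so the following is only a hypothetical way to encode this rule. Each image is keyed by a species-plus-form string, and only images that share both species and form become positives. The helper names `identity_label` and `positive_pairs` are illustrative, not part of this repository.

```python
def identity_label(species, form=None):
    """Hypothetical identity key: a (species, form) pair is one class."""
    species = species.strip().lower()
    return f"{species}/{form.strip().lower()}" if form else species


def positive_pairs(labels):
    """Index pairs (i, j) with i < j that share an identity label."""
    return [(i, j)
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
            if labels[i] == labels[j]]


labels = [
    identity_label("Ninetales"),            # Kanto form
    identity_label("Ninetales", "Alolan"),
    identity_label("Ninetales", "Alolan"),
]
print(labels)
print(positive_pairs(labels))
```

Kanto Ninetales (index 0) forms no positive pair with either Alolan image, so a contrastive loss would treat it as a negative for both.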
An inference.py script is included so you can compute embeddings and cosine similarities using either:
- the base DINOv3 embeddings, or
- DINOv3 + this projection head.
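For real usage, refer to the repository's inference.py. Its function names and arguments are not reproduced here. The code below is a dependency-free sketch of the comparison it performs: cosine similarity between two pooled embeddings, computed either directly or after a projection head. The head architecture (a two-layer MLP) and all weights are hypothetical toy values. They are chosen so that the head discards a feature shared by both inputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def linear(weight, bias, x):
    """Compute W @ x + b with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def toy_projection_head(x, w1, b1, w2, b2):
    """Hypothetical two-layer MLP head: Linear -> ReLU -> Linear."""
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]
    return linear(w2, b2, hidden)


def compare(emb_a, emb_b, head=None):
    """Cosine similarity of two pooled embeddings, optionally after a head."""
    if head is not None:
        emb_a, emb_b = head(emb_a), head(emb_b)
    return cosine_similarity(emb_a, emb_b)


# Toy 3-d "pooled embeddings": dims 0-1 carry identity, dim 2 a shared nuisance
# feature (e.g. a matching white background). Real DINOv3 embeddings are much larger.
W1, B1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]
W2, B2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]


def head(v):
    return toy_projection_head(v, W1, B1, W2, B2)


a, b = [1.0, 0.0, 5.0], [0.0, 1.0, 5.0]
base_sim = compare(a, b)        # dominated by the shared nuisance dimension
head_sim = compare(a, b, head)  # the toy head drops that dimension
print(f"base: {base_sim:.4f}, base + head: {head_sim:.4f}")
```

The toy head shows the intended effect, not the trained head's behavior. A feature shared by two different identities inflates the base similarity, and a head that learns to ignore that feature lowers the score.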
Illustration
The projection head increases same-Pokémon similarity even across large style shifts while decreasing similarity for different Pokémon, including hard negatives where composition and style are nearly identical.
Scenario 1: Same Pokémon (Umbreon), different style/background/pose
- Base + projection head cosine similarity: 0.898438
Scenario 2: Different Pokémon, same style/white background/pose (Umbreon vs Alolan Ninetales)
- Base (DINOv3 pooled) cosine similarity: 0.703125
- Base + projection head cosine similarity: 0.136719
Scenario 3: Stress test with very similar-looking different Pokémon (Kanto Ninetales vs Alolan Ninetales)
In this setup, Kanto Ninetales and Alolan Ninetales are different identities (different regional forms). When pose, style, and background match, the two images can look very similar in composition even though the forms differ in type and coloring:
- Alolan Ninetales: Ice & Fairy, icy-blue/white fur, cloud-like tails with blue/white tips (winter-fox look)
- Kanto Ninetales: Fire, golden fur with red accents, more “classic” flowing tails
- Base (DINOv3 pooled) cosine similarity: 0.73828
- Base + projection head cosine similarity: 0.328125
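A little arithmetic summarizes the three scenarios. The sketch below only reuses the cosine similarities reported above and does not rerun the model. The "margin" it computes (the positive pair's score minus the hardest negative's score in projection-head space) is one convenient way to read these three pairs, not an evaluation metric defined by this project.

```python
# Cosine similarities copied from the scenarios above.
# Scenario 1 reports only the head score, so it has no base entry.
scores = {
    "Umbreon vs Umbreon (same)": {"head": 0.898438},
    "Umbreon vs Alolan Ninetales": {"base": 0.703125, "head": 0.136719},
    "Kanto vs Alolan Ninetales": {"base": 0.73828, "head": 0.328125},
}
negatives = [k for k in scores if "(same)" not in k]

# How much the head lowers each hard negative's similarity.
drops = {k: scores[k]["base"] - scores[k]["head"] for k in negatives}

# Gap between the positive pair and the hardest negative, in head space.
hardest = max(scores[k]["head"] for k in negatives)
head_margin = scores["Umbreon vs Umbreon (same)"]["head"] - hardest

for k, d in drops.items():
    print(f"{k}: similarity drop {d:.6f}")
print(f"head-space margin over hardest negative: {head_margin:.6f}")
```

With the head, the Umbreon pair scores about 0.57 above the hardest negative (Kanto vs Alolan Ninetales). The head lowers the two hard negatives' similarities by roughly 0.57 and 0.41. These three pairs are illustrations, not a benchmark.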
Image sources
- umbreon1.jpg: source_url
- umbreon2.jpg: source_url
- alolan_ninetales.jpg: source_url
- ninetales.jpeg: source_url
License
This projection head is a derivative work built on top of the original DINOv3 model (facebook/dinov3-vith16plus-pretrain-lvd1689m). Accordingly, this repository uses the same DINOv3 license as the upstream model repository.
Pokémon IP disclaimer: Pokémon and related names/imagery are trademarks and copyrighted works of The Pokémon Company, Nintendo, Game Freak, and Creatures Inc. This project is non-commercial and is not affiliated with or endorsed by those entities.