Similarity search for recommendation — any papers or official tutorials?

EroStefano · November 7, 2025, 3:05pm

A similarity search returns a score which can be interpreted as a recommendation. I did some tests and it works. I was wondering if there is any paper or official tutorial describing this kind of similarity-based search for recommendation.

John6666 · November 7, 2025, 5:37pm

I think there is a fair amount of existing material on this topic.

Yes. This “similarity-search-as-recommendation” pattern is well studied and documented. In practice it’s the candidate generation stage of a recommender: embed users or items, do nearest-neighbor search, turn similarity into a ranking score, then optionally re-rank. Canonical papers and official tutorials are below, plus a minimal Hugging Face recipe and pitfalls.

What to read first

Content-based and item-similarity recommenders. Classic, still relevant. Pazzani & Billsus explain content-based scoring; Sarwar et al. formalize item-item similarity and top-N recommendation. (Springer Link)
Modern large-scale retrieval. YouTube’s two-stage architecture (dual-encoder retrieval + ranking) is the template most shops follow. (ACM Digital Library)
ANN search foundations. HNSW graphs and the FAISS library underlie most vector DBs. Read the HNSW paper and the FAISS library paper. For empirical comparisons, see ANN-Benchmarks. (arXiv)
Hugging Face “official” docs for similarity search.
- Datasets + FAISS: add, save, and query FAISS indexes (add_faiss_index, get_nearest_examples). (Hugging Face)
- LLM Course chapter “Semantic search with FAISS” (step-by-step). (Hugging Face)
- Sentence-Transformers docs for retrieval and cross-encoder re-ranking. (sbert.net)
Two-tower recommenders (official tutorials). TensorFlow Recommenders retrieval walkthroughs and Google’s reference architecture. These show how the similarity score is the recommendation score. (TensorFlow)

Why similarity search works for recommendation

Same objective, different framing. Recommenders need a ranking over items for a user or context. Dense encoders map users and items into a space where “closer” means “more relevant.” Top-k nearest neighbors are your recommendations; the similarity is the recommendation score. This is exactly the retrieval component in industry systems like YouTube. (ACM Digital Library)
Scalability. ANN indexes (FAISS, HNSW) make k-NN feasible at catalog scale with controllable recall/speed trade-offs. Benchmarks and the FAISS paper document these trade-offs. (arXiv)
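To make the "top-k nearest neighbors are your recommendations" idea concrete, here is a minimal sketch of exact (brute-force) retrieval in plain Python. The 2-D vectors and item ids are made up for illustration; real embeddings would come from an encoder, and an ANN index would replace the full scan at catalog scale.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query, items, k=2):
    """Return the k (item_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in items.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 2-D "embeddings" (assumed values, not from any real model).
items = {"A": [1.0, 0.0], "B": [0.8, 0.6], "C": [0.0, 1.0]}
recs = top_k([1.0, 0.1], items, k=2)
print(recs)  # ranked list of (item_id, similarity); the similarity is the score
```

The scan is O(N) per query, which is the cost the ANN indexes above are designed to avoid.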

Minimal Hugging Face recipe (content-based recs)

# pip install datasets sentence-transformers faiss-cpu
# Docs: HF Datasets+FAISS https://huggingface.co/docs/datasets/faiss_es
#       HF LLM Course semantic search https://huggingface.co/learn/llm-course/en/chapter5/6
#       Sentence-Transformers models https://sbert.net/docs/sentence_transformer/pretrained_models.html

import faiss
from datasets import Dataset
from sentence_transformers import SentenceTransformer

items = [{"id": 1, "title": "Matrix"}, {"id": 2, "title": "Inception"}, {"id": 3, "title": "Interstellar"}]
ds = Dataset.from_list(items)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # fast, general-purpose
emb = model.encode(list(ds["title"]), normalize_embeddings=True)  # cosine == dot when normalized
ds = ds.add_column("emb", emb.astype("float32").tolist())

d = emb.shape[1]
index = faiss.IndexFlatIP(d)                     # exact IP index; use HNSW or IVF for scale
# Pass the empty index as custom_index; Datasets adds the vectors itself,
# so do not call index.add() first or every item would be indexed twice.
ds = ds.add_faiss_index(column="emb", custom_index=index)

def recommend(query_text, k=5):
    q = model.encode([query_text], normalize_embeddings=True).astype("float32")
    scores, samples = ds.get_nearest_examples("emb", q[0], k=k)
    # samples is a dict of columns (not a list of rows), so index each column by position
    # scores are dot products in [-1,1] with normalized vectors; map to [0,1] if desired
    recs = [{"id": samples["id"][i], "title": samples["title"][i], "score": float((sc + 1) / 2)}
            for i, sc in enumerate(scores)]
    return recs

print(recommend("brain-bending sci-fi", k=3))

Notes: normalize_embeddings=True makes cosine and inner product equivalent. FAISS recommends IP + normalization for cosine. For billion-scale, switch to IndexIVF*, IndexHNSWFlat, or PQ variants as per FAISS wiki. (GitHub)
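A small sanity check of the normalization note, in plain Python with made-up vectors: raw inner product can favor a long vector that points the wrong way, while inner product on unit vectors reproduces cosine similarity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def unit(v):
    """Return v scaled to length 1."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

query = [1.0, 1.0]
aligned = [2.0, 2.0]     # same direction as the query, small norm
long_off = [10.0, 0.0]   # different direction, large norm

# Raw inner product ranks the off-direction vector first because of its length.
raw_aligned = dot(query, aligned)
raw_off = dot(query, long_off)

# After normalization, inner product equals cosine and the aligned vector wins.
ip_aligned = dot(unit(query), unit(aligned))
ip_off = dot(unit(query), unit(long_off))

print(raw_aligned, raw_off)
print(ip_aligned, ip_off)
```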

If you want “better than raw similarity”

Re-rank top-k with a cross-encoder. Score each (query, item) pair with a lightweight reranker such as BAAI/bge-reranker-v2-m3. This often lifts NDCG/MRR significantly. (sbert.net)
Diversify with MMR. Maximal Marginal Relevance balances relevance and novelty to avoid near-duplicates in recommendations. Available in OpenSearch and Qdrant; original paper below. (docs.opensearch.org)
Hybrid retrieval. Combine BM25 (sparse) and embeddings (dense) for robustness, then fuse or re-rank. Sentence-Transformers docs cover hybrid strategies and evaluation. (sbert.net)
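For the MMR step, a minimal greedy sketch in plain Python follows. The similarity values and the `pair_sims` lookup are hypothetical; in practice both would come from your embeddings, and engines that ship MMR may implement it differently.

```python
def mmr(query_sims, item_sim, k, lam=0.5):
    """Greedy Maximal Marginal Relevance.

    query_sims: {item_id: similarity to the query}
    item_sim:   function (id_a, id_b) -> similarity between two items
    lam:        1.0 = pure relevance, lower values favor diversity
    """
    selected = []
    candidates = set(query_sims)
    while candidates and len(selected) < k:
        def mmr_score(c):
            # Penalize a candidate by its closest already-selected item.
            redundancy = max((item_sim(c, s) for s in selected), default=0.0)
            return lam * query_sims[c] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr_score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (assumed): "a" and "a_dup" are near-duplicates.
query_sims = {"a": 0.90, "a_dup": 0.88, "b": 0.60}
pair_sims = {
    frozenset(("a", "a_dup")): 0.95,
    frozenset(("a", "b")): 0.10,
    frozenset(("a_dup", "b")): 0.10,
}

def item_sim(x, y):
    return pair_sims[frozenset((x, y))]

diverse = mmr(query_sims, item_sim, k=2, lam=0.5)
relevance_only = mmr(query_sims, item_sim, k=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=1.0` the result is the plain similarity ranking, so the near-duplicate stays; lowering `lam` trades some relevance for variety.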

“Official” tutorials that show similarity → recommendation

Hugging Face: build FAISS indexes inside Datasets; course chapter on semantic search; image-search blog. All are first-party HF guides. (Hugging Face)
TensorFlow Recommenders: retrieval tutorials and sequential retrieval. These are explicit “recommendation via similarity” examples using two-tower embeddings. (TensorFlow)
Keras + ScaNN: end-to-end semantic retrieval with fast ANN. Good blueprint for production inference. (Keras)
Google Cloud (Vertex): reference architecture for two-tower retrieval at scale. (Google Cloud Documentation)

Choosing models and metrics

Embedding models (general-purpose): all-MiniLM-L6-v2 (fast), all-mpnet-base-v2 (higher quality). Sentence-Transformers docs summarize trade-offs. For multilingual or stronger retrieval, see E5 and BGE families. (sbert.net)
Similarity metric: with normalized vectors, cosine ≡ dot product; use FAISS IP indexes and normalize. This is recommended by FAISS and widely discussed in issues. (GitHub)
Evaluation: Use IR metrics on held-out queries: Recall@k, nDCG@k, MRR. Try BEIR or NanoBEIR to compare models. (sbert.net)
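If you want to compute the evaluation metrics yourself before reaching for a benchmark harness, here is a minimal sketch with binary relevance, for a single query. The item ids are made up; MRR is the mean of the reciprocal rank over all held-out queries.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item (0 if none is retrieved)."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["x", "a", "y", "b"]   # system output, best first
relevant = {"a", "b"}           # held-out ground truth

r3 = recall_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)
n3 = ndcg_at_k(ranked, relevant, k=3)
print(r3, rr, n3)
```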

Indexing at scale

Pick an ANN index based on latency, recall, memory: FLAT (exact), IVF-PQ (memory-efficient), HNSW (fast high-recall). FAISS wiki and paper detail trade-offs; ANN-Benchmarks shows empirical curves. (GitHub)
Filtering: If you must filter by metadata (e.g., category, locale), prefer engines with efficient filtered ANN. OpenSearch documents MMR and filtered k-NN; FAISS supports ID-based filtering through IDSelector (for example, bitmap selectors), but filtering can reduce speed/recall. (docs.opensearch.org)
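One reason filtering is tricky can be shown with a toy example: if you search first and filter afterwards, the top-k may contain no items that pass the filter. The sketch below uses exact search in plain Python as a stand-in for an ANN index; the vectors and categories are made up.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, vectors, k, allowed=None):
    """Exact top-k ids by dot product; `allowed` restricts candidates before ranking."""
    ids = [i for i in vectors if allowed is None or i in allowed]
    return sorted(ids, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.8, 0.2], 4: [0.1, 0.9]}
category = {1: "film", 2: "film", 3: "book", 4: "book"}
query = [1.0, 0.0]
books = {i for i, c in category.items() if c == "book"}

# Post-filter: take the global top-2, then drop anything that is not a book.
post_filtered = [i for i in search(query, vectors, k=2) if i in books]

# Pre-filter: restrict to books first, then take the top-2 among them.
pre_filtered = search(query, vectors, k=2, allowed=books)

print(post_filtered, pre_filtered)
```

Post-filtering needs over-fetching (a larger k) to compensate, which costs latency; filter-aware engines avoid that by applying the predicate during the search.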

Converting similarity to a “recommendation score”

Rank first. Users care about order more than absolute scores. Use similarity directly for ranking.
If you must display a score, map dot-product in [-1,1] to [0,1] by a monotone transform, or fit a calibration curve on offline labels (click/purchase) with isotonic or Platt scaling. Use the calibrated value for UI only; keep IR metrics for model selection. (tzin.bgu.ac.il)
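Both options can be sketched in a few lines of plain Python. The click labels are hypothetical, and the fit below is a simplified Platt-style logistic calibration trained by gradient descent (no target smoothing), not a reference implementation.

```python
import math

def to_unit_interval(sim):
    """Affine, monotone map from [-1, 1] to [0, 1]; ranking is unchanged."""
    return (sim + 1.0) / 2.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.5, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical offline data: similarity score and whether the item was clicked.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

a, b = fit_platt(scores, labels)
p_high = sigmoid(a * 0.9 + b)
p_low = sigmoid(a * 0.1 + b)
print(to_unit_interval(0.5), p_high, p_low)
```

Because both transforms are monotone in the similarity, the displayed value never changes the ordering of the recommendations.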

Typical production pipeline (summary)

Encode items; optionally also encode users (two-tower).
Index in FAISS or a vector DB.
Retrieve top-k by similarity (your initial recommendations).
Re-rank with cross-encoder and diversify with MMR.
Measure with Recall@k / nDCG@k; iterate with BEIR-style benchmarks. (ACM Digital Library)
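The retrieve-then-re-rank core of this pipeline can be sketched as two small functions. Everything here is a toy: the vectors are made up, and `toy_cross_scores` is a hypothetical lookup standing in for a real cross-encoder.

```python
def retrieve(query_vec, item_vecs, k):
    """Stage 1: cheap candidate generation by dot-product similarity."""
    scored = [
        (item_id, sum(q * v for q, v in zip(query_vec, vec)))
        for item_id, vec in item_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def rerank(candidates, rerank_score, n):
    """Stage 2: rescore only the retrieved candidates with a costlier scorer."""
    rescored = [(item_id, rerank_score(item_id)) for item_id, _ in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)[:n]

item_vecs = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9], "d": [0.7, 0.3]}

# Hypothetical stand-in for cross-encoder scores of (query, item) pairs.
toy_cross_scores = {"a": 0.2, "b": 0.9, "c": 0.95, "d": 0.5}

candidates = retrieve([1.0, 0.0], item_vecs, k=3)
final = rerank(candidates, toy_cross_scores.get, n=2)
print(candidates)
print(final)
```

Note that item "c" has the best re-rank score but never reaches stage 2, because stage 1 did not retrieve it. That is why Recall@k of the retrieval stage caps the quality of the whole pipeline.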

Pitfalls and fixes

Unnormalized vectors → wrong scores. Always normalize for cosine/IP equivalence. FAISS and model maintainers note this repeatedly. (GitHub)
Interpreting similarity as probability. Don’t. Calibrate only if the UI needs a probability-like value; otherwise treat it as a ranking signal. (tzin.bgu.ac.il)
Over-homogeneous results. Add MMR diversity or business rules. (Computer Science School at CCU.)
Vendor lock-in vs portability. HF Datasets+FAISS keeps you portable; you can still move the same embeddings to OpenSearch, Qdrant, Weaviate, or Milvus when you need filtering and ops. (Docs below.) (docs.opensearch.org)

Curated resources

Core papers

Item-item CF for top-N recs. (ETH Zurich)
YouTube DNN recommendations (two-stage). (ACM Digital Library)
HNSW ANN; FAISS library paper; ANN-Benchmarks. (arXiv)
Diversity re-ranking (MMR). (Computer Science School at CCU.)

Hugging Face docs and forums

Datasets + FAISS: search index guide and API. (Hugging Face)
LLM Course: semantic search with FAISS. (Hugging Face)
Forums: add_faiss_index usage, GPU notes. (Hugging Face Forums)
Models: MiniLM, E5, BGE reranker. (Hugging Face)

Recommender tutorials

TensorFlow Recommenders retrieval and sequential retrieval. (TensorFlow)
Vertex AI two-tower reference architecture. (Google Cloud Documentation)
Keras + ScaNN retrieval example. (Keras)

ANN operations

FAISS index selection cheat-sheet; metrics and distances wiki. (GitHub)
ANN-Benchmarks leaderboards and repo. (ann-benchmarks.com)

EroStefano · November 8, 2025, 8:13am

You are so precise and helpful I always think you come from the future to help me avoid a terrible mistake. Thx!
