A similarity search returns a score which can be interpreted as a recommendation. I did some tests and it works. I was wondering if there is any paper or official tutorial describing this kind of similarity-based search for recommendation.
I think there is a fair amount of existing material on this topic.
Yes. This “similarity-search-as-recommendation” pattern is well studied and documented. In practice it’s the candidate generation stage of a recommender: embed users or items, do nearest-neighbor search, turn similarity into a ranking score, then optionally re-rank. Canonical papers and official tutorials are below, plus a minimal Hugging Face recipe and pitfalls.
What to read first
- Content-based and item-similarity recommenders. Classic, still relevant. Pazzani & Billsus explain content-based scoring; Sarwar et al. formalize item-item similarity and top-N recommendation. (SpringerLink)
- Modern large-scale retrieval. YouTube’s two-stage architecture (dual-encoder retrieval + ranking) is the template most shops follow. (ACM Digital Library)
- ANN search foundations. HNSW graphs and the FAISS library underlie most vector DBs. Read the HNSW paper and the FAISS library paper. For empirical comparisons, see ANN-Benchmarks. (arXiv)
- Hugging Face “official” docs for similarity search.
  - Datasets + FAISS: add, save, and query FAISS indexes (add_faiss_index, get_nearest_examples). (Hugging Face)
  - LLM Course chapter “Semantic search with FAISS” (step-by-step). (Hugging Face)
  - Sentence-Transformers docs for retrieval and cross-encoder re-ranking. (sbert.net)
- Two-tower recommenders (official tutorials). TensorFlow Recommenders retrieval walkthroughs and Google’s reference architecture. These show how the similarity score is the recommendation score. (TensorFlow)
Why similarity search works for recommendation
- Same objective, different framing. Recommenders need a ranking over items for a user or context. Dense encoders map users and items into a space where “closer” means “more relevant.” Top-k nearest neighbors are your recommendations; the similarity is the recommendation score. This is exactly the retrieval component in industry systems like YouTube. (ACM Digital Library)
- Scalability. ANN indexes (FAISS, HNSW) make k-NN feasible at catalog scale with controllable recall/speed trade-offs. Benchmarks and the FAISS paper document these trade-offs. (arXiv)
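To make the "top-k nearest neighbors are your recommendations" idea concrete, here is a minimal brute-force sketch in plain Python. The 2-D vectors and item names are made up for illustration; a real system would use encoder outputs and an ANN index instead of a full scan.

```python
import math

def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query, items, k=2):
    """Return the k (item_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in items.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 2-D "embeddings"; real encoders produce hundreds of dimensions.
items = {"sci-fi": [0.9, 0.1], "romance": [0.1, 0.9], "thriller": [0.7, 0.3]}
user = [1.0, 0.2]

recs = top_k(user, items, k=2)
print(recs)  # ranked list: the similarity doubles as the recommendation score
```

An ANN index such as HNSW returns approximately the same list without scanning every item.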
Minimal Hugging Face recipe (content-based recs)
# pip install datasets sentence-transformers faiss-cpu
# Docs: HF Datasets+FAISS https://huggingface.co/docs/datasets/faiss_es
# HF LLM Course semantic search https://huggingface.co/learn/llm-course/en/chapter5/6
# Sentence-Transformers models https://sbert.net/docs/sentence_transformer/pretrained_models.html
from datasets import Dataset
from sentence_transformers import SentenceTransformer
import faiss

items = [{"id": 1, "title": "Matrix"}, {"id": 2, "title": "Inception"}, {"id": 3, "title": "Interstellar"}]
ds = Dataset.from_list(items)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # fast, general-purpose
emb = model.encode(list(ds["title"]), normalize_embeddings=True)  # cosine == dot when normalized
ds = ds.add_column("emb", [e.astype("float32").tolist() for e in emb])

d = emb.shape[1]
index = faiss.IndexFlatIP(d)  # exact inner-product index; use HNSW or IVF for scale
# add_faiss_index adds the column's vectors to the index itself,
# so do not call index.add() beforehand (that would insert every vector twice).
ds.add_faiss_index(column="emb", custom_index=index)

def recommend(query_text, k=5):
    q = model.encode([query_text], normalize_embeddings=True).astype("float32")
    # get_nearest_examples returns (scores, examples); examples is a dict of columns
    scores, samples = ds.get_nearest_examples("emb", q[0], k=k)
    # scores are dot products in [-1,1] with normalized vectors; map to [0,1] if desired
    return [
        {"id": i, "title": t, "score": float((sc + 1) / 2)}
        for i, t, sc in zip(samples["id"], samples["title"], scores)
    ]

print(recommend("brain-bending sci-fi", k=3))
Notes: normalize_embeddings=True makes cosine and inner product equivalent. FAISS recommends IP + normalization for cosine. For billion-scale, switch to IndexIVF*, IndexHNSWFlat, or PQ variants as per FAISS wiki. (GitHub)
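The effect of normalization on ranking can be checked without any library. The sketch below uses two made-up vectors (the names are illustrative only) and shows that raw inner product can favor a long but off-topic vector, while the normalized version ranks by direction.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def rank(query, items, normalize):
    """Order item names by inner product, optionally after L2 normalization."""
    prep = l2_normalize if normalize else (lambda v: v)
    q = prep(query)
    scores = {name: dot(q, prep(vec)) for name, vec in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical vectors: one has a large norm, the other points almost at the query.
items = {"long_off_topic": [6.0, 8.0], "short_on_topic": [1.0, 0.1]}
query = [1.0, 0.0]

raw_order = rank(query, items, normalize=False)  # driven by vector length
cos_order = rank(query, items, normalize=True)   # driven by direction only
print(raw_order)
print(cos_order)
```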
If you want “better than raw similarity”
- Re-rank top-k with a cross-encoder. Score each (query, item) pair with a lightweight reranker such as BAAI/bge-reranker-v2-m3. This often lifts NDCG/MRR significantly. (sbert.net)
- Diversify with MMR. Maximal Marginal Relevance balances relevance and novelty to avoid near-duplicates in recommendations. Available in OpenSearch and Qdrant; original paper below. (docs.opensearch.org)
- Hybrid retrieval. Combine BM25 (sparse) and embeddings (dense) for robustness, then fuse or re-rank. Sentence-Transformers docs cover hybrid strategies and evaluation. (sbert.net)
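As a rough illustration of the MMR step, here is a small greedy selection sketch. It assumes you already have query-to-item similarities and a way to look up item-to-item similarity; the item names, the similarity values, and the `lam` trade-off parameter name are all hypothetical.

```python
def mmr(query_sims, pair_sim, k, lam=0.5):
    """Greedy Maximal Marginal Relevance.

    query_sims: dict item -> similarity to the query
    pair_sim:   callable (a, b) -> similarity between two items
    lam:        1.0 means pure relevance, lower values add more diversity
    """
    selected = []
    candidates = set(query_sims)

    def score(c):
        # Penalize a candidate by its similarity to the closest already-picked item.
        redundancy = max((pair_sim(c, s) for s in selected), default=0.0)
        return lam * query_sims[c] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical similarities: "A2" is a near-duplicate of "A".
query_sims = {"A": 0.90, "A2": 0.88, "B": 0.70}
pairs = {frozenset(("A", "A2")): 0.95, frozenset(("A", "B")): 0.20, frozenset(("A2", "B")): 0.20}
pair_sim = lambda a, b: pairs[frozenset((a, b))]

diverse = mmr(query_sims, pair_sim, k=2, lam=0.5)
relevance_only = mmr(query_sims, pair_sim, k=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=0.5` the near-duplicate is skipped in favor of the less similar but novel item; with `lam=1.0` the result falls back to plain similarity order.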
“Official” tutorials that show similarity → recommendation
- Hugging Face: build FAISS indexes inside Datasets; course chapter on semantic search; image-search blog. All are first-party HF guides. (Hugging Face)
- TensorFlow Recommenders: retrieval tutorials and sequential retrieval. These are explicit “recommendation via similarity” examples using two-tower embeddings. (TensorFlow)
- Keras + ScaNN: end-to-end semantic retrieval with fast ANN. Good blueprint for production inference. (Keras)
- Google Cloud (Vertex): reference architecture for two-tower retrieval at scale. (Google Cloud Documentation)
Choosing models and metrics
- Embedding models (general-purpose): all-MiniLM-L6-v2 (fast), all-mpnet-base-v2 (higher quality). Sentence-Transformers docs summarize trade-offs. For multilingual or stronger retrieval, see E5 and BGE families. (sbert.net)
- Similarity metric: with normalized vectors, cosine ≡ dot product; use FAISS IP indexes and normalize. This is recommended by FAISS and widely discussed in issues. (GitHub)
- Evaluation: Use IR metrics on held-out queries: Recall@k, nDCG@k, MRR. Try BEIR or NanoBEIR to compare models. (sbert.net)
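For reference, the three metrics named above can be written in a few lines. This is a sketch with binary relevance and a single made-up query; MRR is the mean of the reciprocal rank over all held-out queries.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the best possible DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical single query: the system returned b, a, d, c; items a and c are relevant.
ranked = ["b", "a", "d", "c"]
relevant = {"a", "c"}

r3 = recall_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)
n3 = ndcg_at_k(ranked, relevant, k=3)
print(r3, rr, round(n3, 4))
```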
Indexing at scale
- Pick an ANN index based on latency, recall, memory: FLAT (exact), IVF-PQ (memory-efficient), HNSW (fast high-recall). FAISS wiki and paper detail trade-offs; ANN-Benchmarks shows empirical curves. (GitHub)
- Filtering: If you must filter by metadata (e.g., category, locale), prefer engines with efficient filtered ANN. OpenSearch documents MMR and filtered k-NN; FAISS has bitset/IDSelector masks but filtering can reduce speed/recall. (docs.opensearch.org)
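One way to see why filtering interacts with recall: filtering after a top-k search can leave fewer than k results, while filtering before the search does not. The sketch below uses brute-force search over made-up items; the `allowed` parameter is a hypothetical stand-in for an engine's metadata filter, not a FAISS or OpenSearch API.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, items, k, allowed=None):
    """Brute-force top-k; `allowed` optionally restricts the pool before scoring."""
    pool = items if allowed is None else [it for it in items if allowed(it)]
    return sorted(pool, key=lambda it: dot(query, it["vec"]), reverse=True)[:k]

# Hypothetical catalog with a category field.
items = [
    {"id": 1, "category": "film", "vec": [1.0, 0.0]},
    {"id": 2, "category": "film", "vec": [0.9, 0.1]},
    {"id": 3, "category": "book", "vec": [0.8, 0.2]},
    {"id": 4, "category": "book", "vec": [0.1, 0.9]},
]
query = [1.0, 0.0]

def is_book(item):
    return item["category"] == "book"

# Post-filter: search first, then drop non-matching hits (may return too few).
post_ids = [it["id"] for it in search(query, items, k=2) if is_book(it)]
# Pre-filter: restrict the pool, then search (always fills k if enough items match).
pre_ids = [it["id"] for it in search(query, items, k=2, allowed=is_book)]
print(post_ids, pre_ids)
```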
Converting similarity to a “recommendation score”
- Rank first. Users care about order more than absolute scores. Use similarity directly for ranking.
- If you must display a score, map dot-product in [-1,1] to [0,1] by a monotone transform, or fit a calibration curve on offline labels (click/purchase) with isotonic or Platt scaling. Use the calibrated value for UI only; keep IR metrics for model selection. (tzin.bgu.ac.il)
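Both options can be sketched in plain Python. The first function is the simple monotone map; the second is a Platt-style fit, here implemented as a one-feature logistic regression trained by gradient descent. The click data, learning rate, and step count are invented for illustration, and a real calibration would use held-out labels (scikit-learn offers ready-made isotonic and sigmoid calibrators).

```python
import math

def to_unit_interval(score):
    """Monotone map from a cosine/dot score in [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0

def fit_platt(scores, labels, lr=0.5, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

# Hypothetical offline data: similarity of a shown item and whether it was clicked.
sims = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
clicks = [0, 0, 0, 1, 0, 1, 1, 1]

calibrate = fit_platt(sims, clicks)
print(to_unit_interval(0.6), calibrate(0.2), calibrate(0.8))
```

Both transforms are monotone, so they change the displayed number but not the ranking.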
Typical production pipeline (summary)
- Encode items; optionally also encode users (two-tower).
- Index in FAISS or a vector DB.
- Retrieve top-k by similarity (your initial recommendations).
- Re-rank with cross-encoder and diversify with MMR.
- Measure with Recall@k / nDCG@k; iterate with BEIR-style benchmarks. (ACM Digital Library)
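The stage order above can be expressed as a small function that takes each stage as a callable. Everything in this sketch is a toy stand-in (word overlap instead of ANN search, title length instead of a cross-encoder, a duplicate check instead of MMR); it only shows how the stages chain together.

```python
def recommend(query, retrieve, rerank=None, diversify=None, k=3, pool=10):
    """Run the stages in order; rerank and diversify are optional callables."""
    candidates = retrieve(query, pool)       # wide, cheap candidate generation
    if rerank is not None:
        candidates = rerank(query, candidates)
    if diversify is not None:
        candidates = diversify(candidates)
    return candidates[:k]

# Hypothetical catalog and toy stages so the wiring can be run end to end.
catalog = ["space opera", "space documentary", "romantic comedy", "space opera remake"]

def retrieve(query, pool):
    # Stand-in for ANN search: keep items that share a word with the query.
    words = set(query.split())
    return [title for title in catalog if words & set(title.split())][:pool]

def rerank(query, candidates):
    # Stand-in for a cross-encoder: prefer shorter titles.
    return sorted(candidates, key=len)

def diversify(candidates):
    # Stand-in for MMR: drop titles whose first two words were already seen.
    seen, kept = set(), []
    for title in candidates:
        key = tuple(title.split()[:2])
        if key not in seen:
            seen.add(key)
            kept.append(title)
    return kept

result = recommend("space adventure", retrieve, rerank, diversify, k=2)
print(result)
```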
Pitfalls and fixes
- Unnormalized vectors → wrong scores. Always normalize for cosine/IP equivalence. FAISS and model maintainers note this repeatedly. (GitHub)
- Interpreting similarity as probability. Don’t. Calibrate only if the UI needs a probability-like value; otherwise treat it as a ranking signal. (tzin.bgu.ac.il)
- Over-homogeneous results. Add MMR diversity or business rules. (Computer Science School at CCU.)
- Vendor lock-in vs portability. HF Datasets+FAISS keeps you portable; you can still move the same embeddings to OpenSearch, Qdrant, Weaviate, or Milvus when you need filtering and ops. (Docs below.) (docs.opensearch.org)
Curated resources
Core papers
- Item-item CF for top-N recs. (ETH Zurich)
- YouTube DNN recommendations (two-stage). (ACM Digital Library)
- HNSW ANN; FAISS library paper; ANN-Benchmarks. (arXiv)
- Diversity re-ranking (MMR). (Computer Science School at CCU.)
Hugging Face docs and forums
- Datasets + FAISS: search index guide and API. (Hugging Face)
- LLM Course: semantic search with FAISS. (Hugging Face)
- Forums: add_faiss_index usage, GPU notes. (Hugging Face Forums)
- Models: MiniLM, E5, BGE reranker. (Hugging Face)
Recommender tutorials
- TensorFlow Recommenders retrieval and sequential retrieval. (TensorFlow)
- Vertex AI two-tower reference architecture. (Google Cloud Documentation)
- Keras + ScaNN retrieval example. (Keras)
ANN operations
- FAISS index selection cheat-sheet; metrics and distances wiki. (GitHub)
- ANN-Benchmarks leaderboards and repo. (ann-benchmarks.com)
You are so precise and helpful I always think you come from the future to help me avoid a terrible mistake. Thx!