Similarity search for recommendation — any papers or official tutorials?

A similarity search returns a score which can be interpreted as a recommendation. I did some tests and it works. I was wondering if there is any paper or official tutorial describing this kind of similarity-based search for recommendation.


I think it’s a topic for which a fair amount of existing material is already available.


Yes. This “similarity-search-as-recommendation” pattern is well studied and documented. In practice it’s the candidate generation stage of a recommender: embed users or items, do nearest-neighbor search, turn similarity into a ranking score, then optionally re-rank. Canonical papers and official tutorials are below, plus a minimal Hugging Face recipe and pitfalls.

What to read first

  • Content-based and item-similarity recommenders. Classic, still relevant. Pazzani & Billsus explain content-based scoring; Sarwar et al. formalize item-item similarity and top-N recommendation. (Springer Link)

  • Modern large-scale retrieval. YouTube’s two-stage architecture (dual-encoder retrieval + ranking) is the template most shops follow. (ACM Digital Library)

  • ANN search foundations. HNSW graphs and the FAISS library underlie most vector DBs. Read the HNSW paper and the FAISS library paper. For empirical comparisons, see ANN-Benchmarks. (arXiv)

  • Hugging Face “official” docs for similarity search.

    • :hugs: Datasets + FAISS: add, save, and query FAISS indexes (add_faiss_index, get_nearest_examples). (Hugging Face)
    • LLM Course chapter “Semantic search with FAISS” (step-by-step). (Hugging Face)
    • Sentence-Transformers docs for retrieval and cross-encoder re-ranking. (sbert.net)
  • Two-tower recommenders (official tutorials). TensorFlow Recommenders retrieval walkthroughs and Google’s reference architecture. These show how the similarity score is the recommendation score. (TensorFlow)

Why similarity search works for recommendation

  • Same objective, different framing. Recommenders need a ranking over items for a user or context. Dense encoders map users and items into a space where “closer” means “more relevant.” Top-k nearest neighbors are your recommendations; the similarity is the recommendation score. This is exactly the retrieval component in industry systems like YouTube. (ACM Digital Library)
  • Scalability. ANN indexes (FAISS, HNSW) make k-NN feasible at catalog scale with controllable recall/speed trade-offs. Benchmarks and the FAISS paper document these trade-offs. (arXiv)

Minimal Hugging Face recipe (content-based recs)

# pip install datasets sentence-transformers faiss-cpu
# Docs: HF Datasets+FAISS https://huggingface.co/docs/datasets/faiss_es
#       HF LLM Course semantic search https://huggingface.co/learn/llm-course/en/chapter5/6
#       Sentence-Transformers models https://sbert.net/docs/sentence_transformer/pretrained_models.html

from datasets import Dataset
from sentence_transformers import SentenceTransformer
import faiss

items = [{"id": 1, "title": "Matrix"}, {"id": 2, "title": "Inception"}, {"id": 3, "title": "Interstellar"}]
ds = Dataset.from_list(items)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # fast, general-purpose
emb = model.encode(ds["title"], normalize_embeddings=True)  # cosine == dot when normalized
ds = ds.add_column("emb", [e.astype("float32") for e in emb])

d = emb.shape[1]
index = faiss.IndexFlatIP(d)                     # inner-product index; use HNSW or IVF for scale
ds = ds.add_faiss_index(column="emb", custom_index=index)  # fills the index from the (normalized) column vectors

def recommend(query_text, k=5):
    q = model.encode([query_text], normalize_embeddings=True).astype("float32")
    scores, samples = ds.get_nearest_examples("emb", q[0], k=k)
    # samples is a dict of columns; scores are dot products in [-1, 1] with normalized vectors
    recs = [{"id": i, "title": t, "score": float((sc + 1) / 2)}  # map to [0, 1] if desired
            for i, t, sc in zip(samples["id"], samples["title"], scores)]
    return recs

print(recommend("brain-bending sci-fi", k=3))

Notes: normalize_embeddings=True makes cosine and inner product equivalent. FAISS recommends IP + normalization for cosine. For billion-scale, switch to IndexIVF*, IndexHNSWFlat, or PQ variants as per FAISS wiki. (GitHub)
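
For scale, here is a minimal sketch of the two index families mentioned above, on toy data with assumed sizes (d=384 matches all-MiniLM-L6-v2; the nlist, M, and nprobe values are placeholders to tune, not recommendations):

import faiss, numpy as np

d = 384                                                        # all-MiniLM-L6-v2 dimension
xb = np.random.rand(100_000, d).astype("float32")              # stand-in for item embeddings
faiss.normalize_L2(xb)                                         # in-place; inner product == cosine afterwards

# HNSW: graph index, fast and high-recall, but memory-hungry
hnsw = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)  # 32 = links per node (M)
hnsw.add(xb)

# IVF-PQ: inverted lists + product quantization, memory-efficient; needs training
ivfpq = faiss.index_factory(d, "IVF1024,PQ64", faiss.METRIC_INNER_PRODUCT)
ivfpq.train(xb)                                                # train on a representative sample
ivfpq.add(xb)
ivfpq.nprobe = 16                                              # more probes = higher recall, slower queries

Either index can replace IndexFlatIP in the recipe above; the FAISS index-selection wiki gives sizing guidance.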

If you want “better than raw similarity”

  1. Re-rank top-k with a cross-encoder. Score each (query, item) pair with a lightweight reranker such as BAAI/bge-reranker-v2-m3. This often lifts nDCG/MRR significantly; a minimal sketch follows this list. (sbert.net)
  2. Diversify with MMR. Maximal Marginal Relevance balances relevance and novelty to avoid near-duplicates in recommendations; a small implementation is also sketched after this list. Available in OpenSearch and Qdrant; the original paper is Carbonell & Goldstein (1998). (docs.opensearch.org)
  3. Hybrid retrieval. Combine BM25 (sparse) and embeddings (dense) for robustness, then fuse or re-rank. Sentence-Transformers docs cover hybrid strategies and evaluation. (sbert.net)
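
As referenced in items 1 and 2, here is a sketch of both steps, building on the recommend() function from the recipe above. The reranker model name is the one mentioned in item 1; mmr() is a plain NumPy implementation of Maximal Marginal Relevance, not a library call:

from sentence_transformers import CrossEncoder
import numpy as np

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(query_text, candidates, top_n=10):
    # candidates: output of recommend(), i.e. dicts with a "title" field
    pairs = [(query_text, c["title"]) for c in candidates]
    ce_scores = reranker.predict(pairs)                      # higher = more relevant
    order = sorted(range(len(candidates)), key=lambda i: ce_scores[i], reverse=True)
    return [{**candidates[i], "ce_score": float(ce_scores[i])} for i in order[:top_n]]

def mmr(query_vec, cand_vecs, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance over L2-normalized vectors.
    Returns indices into cand_vecs, trading relevance against redundancy."""
    relevance = cand_vecs @ query_vec                        # similarity to the query
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        if not selected:
            best = max(remaining, key=lambda i: relevance[i])
        else:
            chosen = cand_vecs[selected]                     # already-selected vectors
            best = max(remaining, key=lambda i: lam * relevance[i]
                       - (1 - lam) * float(np.max(chosen @ cand_vecs[i])))
        selected.append(best)
        remaining.remove(best)
    return selected

In practice you retrieve a generous top-k (say 100), re-rank with the cross-encoder, then apply MMR to the re-ranked list before showing the final handful of items.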

“Official” tutorials that show similarity → recommendation

  • Hugging Face: build FAISS indexes inside :hugs: Datasets; course chapter on semantic search; image-search blog. All are first-party HF guides. (Hugging Face)
  • TensorFlow Recommenders: retrieval tutorials and sequential retrieval. These are explicit “recommendation via similarity” examples using two-tower embeddings. (TensorFlow)
  • Keras + ScaNN: end-to-end semantic retrieval with fast ANN. Good blueprint for production inference. (Keras)
  • Google Cloud (Vertex): reference architecture for two-tower retrieval at scale. (Google Cloud Documentation)

Choosing models and metrics

  • Embedding models (general-purpose): all-MiniLM-L6-v2 (fast), all-mpnet-base-v2 (higher quality). Sentence-Transformers docs summarize trade-offs. For multilingual or stronger retrieval, see E5 and BGE families. (sbert.net)
  • Similarity metric: with normalized vectors, cosine ≡ dot product; use FAISS IP indexes and normalize. This is recommended by FAISS and widely discussed in issues. (GitHub)
  • Evaluation: Use IR metrics on held-out queries: Recall@k, nDCG@k, MRR. Try BEIR or NanoBEIR to compare models. (sbert.net)
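
To make the evaluation concrete, a plain-NumPy sketch of the three metrics on a single query (average them over a held-out query set):

import numpy as np

def recall_at_k(ranked_ids, relevant_ids, k=10):
    # fraction of the relevant items that made it into the top-k
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / max(len(relevant_ids), 1)

def mrr(ranked_ids, relevant_ids):
    # reciprocal rank of the first relevant item; 0 if none retrieved
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # binary-relevance nDCG@k
    dcg = sum(1.0 / np.log2(r + 1) for r, item in enumerate(ranked_ids[:k], start=1) if item in relevant_ids)
    ideal = sum(1.0 / np.log2(r + 1) for r in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal if ideal else 0.0

print(recall_at_k([3, 1, 7], {1, 9}, k=3), mrr([3, 1, 7], {1, 9}), ndcg_at_k([3, 1, 7], {1, 9}, k=3))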

Indexing at scale

  • Pick an ANN index based on latency, recall, memory: FLAT (exact), IVF-PQ (memory-efficient), HNSW (fast high-recall). FAISS wiki and paper detail trade-offs; ANN-Benchmarks shows empirical curves. (GitHub)
  • Filtering: If you must filter by metadata (e.g., category, locale), prefer engines with efficient filtered ANN. OpenSearch documents MMR and filtered k-NN; FAISS has bitset/IDSelector masks but filtering can reduce speed/recall. (docs.opensearch.org)
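
A post-filtering sketch, building on the recipe above and assuming the dataset also carries a hypothetical "category" column on a realistic catalog: over-fetch from the ANN index, then apply the metadata predicate in Python (dedicated engines do this more efficiently and with better recall guarantees):

def recommend_filtered(query_text, allowed_categories, k=5, overfetch=10):
    q = model.encode([query_text], normalize_embeddings=True).astype("float32")
    scores, samples = ds.get_nearest_examples("emb", q[0], k=k * overfetch)
    recs = []
    for i, t, cat, sc in zip(samples["id"], samples["title"], samples["category"], scores):
        if cat in allowed_categories:          # metadata predicate applied after ANN search
            recs.append({"id": i, "title": t, "score": float(sc)})
        if len(recs) == k:
            break
    return recs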

Converting similarity to a “recommendation score”

  • Rank first. Users care about order more than absolute scores. Use similarity directly for ranking.
  • If you must display a score, map dot-product in [-1,1] to [0,1] by a monotone transform, or fit a calibration curve on offline labels (click/purchase) with isotonic or Platt scaling. Use the calibrated value for UI only; keep IR metrics for model selection. (tzin.bgu.ac.il)
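
If you do need a probability-like display value, a minimal isotonic-calibration sketch with scikit-learn (toy scores and labels here; fit on real offline click/purchase data):

from sklearn.isotonic import IsotonicRegression

raw_scores = [0.91, 0.85, 0.80, 0.74, 0.66, 0.52, 0.41]    # cosine similarities from retrieval
labels     = [1,    1,    0,    1,    0,    0,    0]       # offline outcomes (click/purchase)

calibrator = IsotonicRegression(out_of_bounds="clip")      # monotone mapping, preserves the ranking
calibrator.fit(raw_scores, labels)

print(calibrator.predict([0.78, 0.50]))                    # display-only values in [0, 1]

Keep selecting models with Recall@k / nDCG@k; the calibrated number is for the UI only.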

Typical production pipeline (summary)

  1. Encode items; optionally also encode users (two-tower).
  2. Index in FAISS or a vector DB.
  3. Retrieve top-k by similarity (your initial recommendations).
  4. Re-rank with cross-encoder and diversify with MMR.
  5. Measure with Recall@k / nDCG@k; iterate with BEIR-style benchmarks. (ACM Digital Library)

Pitfalls and fixes

  • Unnormalized vectors → wrong scores. Always normalize for cosine/IP equivalence. FAISS and model maintainers note this repeatedly. (GitHub)
  • Interpreting similarity as probability. Don’t. Calibrate only if the UI needs a probability-like value; otherwise treat it as a ranking signal. (tzin.bgu.ac.il)
  • Over-homogeneous results. Add MMR diversity or business rules. (Computer Science School at CCU.)
  • Vendor lock-in vs portability. HF Datasets+FAISS keeps you portable; you can still move the same embeddings to OpenSearch, Qdrant, Weaviate, or Milvus when you need filtering and ops. (docs.opensearch.org)

Curated resources

ANN operations

  • FAISS index selection cheat-sheet; metrics and distances wiki. (GitHub)
  • ANN-Benchmarks leaderboards and repo. (ann-benchmarks.com)

You are so precise and helpful I always think you come from the future to help me avoid a terrible mistake. Thx!
