FAISS: Vector Similarity Search at Scale

How does Netflix know you might like Squid Game? How does Amazon know you need printer paper after buying a printer? A large part of the answer is vector similarity search, and FAISS is one of the most widely used libraries for running it at scale.

Why Vector Search?

Traditional search relies on keywords and metadata. But understanding semantic similarity requires representing items as vectors:

Movie A (Action):  [0.9, 0.1, 0.8, 0.2]  <- Action-heavy
Movie B (Action):  [0.8, 0.2, 0.7, 0.3]  <- Similar to A
Movie C (Drama):   [0.1, 0.9, 0.2, 0.8]  <- Different from A
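To make "similar" concrete, here is a minimal sketch that measures the Euclidean (L2) distance between the three example vectors above. It uses plain NumPy rather than FAISS, and the variable names are only for illustration.

```python
import numpy as np

# The three example movie vectors from the text
movie_a = np.array([0.9, 0.1, 0.8, 0.2], dtype="float32")
movie_b = np.array([0.8, 0.2, 0.7, 0.3], dtype="float32")
movie_c = np.array([0.1, 0.9, 0.2, 0.8], dtype="float32")

def l2_distance(u, v):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(u - v))

d_ab = l2_distance(movie_a, movie_b)  # small: both are action-heavy
d_ac = l2_distance(movie_a, movie_c)  # large: action vs. drama

print(f"A-B distance: {d_ab:.3f}")
print(f"A-C distance: {d_ac:.3f}")
```

Movie B ends up far closer to Movie A than Movie C does, which is exactly the signal a nearest-neighbor search exploits.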

Vector search finds items where the vector distance is smallest. This powers:

Recommendation systems (Netflix, Amazon, Spotify)
Semantic search (find by meaning, not keywords)
RAG (Retrieval Augmented Generation)
Image and audio similarity search

What Is FAISS?

FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta for efficient similarity search and clustering of dense vectors.

Features:

Handles billions of vectors
GPU-accelerated
Multiple index types for different speed/accuracy trade-offs
Python and C++ interfaces

How FAISS Works

Installation

pip install faiss-cpu          # CPU version
pip install faiss-gpu          # GPU version (CUDA required); the FAISS project recommends conda for GPU builds

Basic Usage: Brute Force Search

For small datasets (< 100K vectors), brute force L2 distance works fine:

import numpy as np
import faiss
import time

# Generate test data: 10,000 vectors of 128 dimensions
np.random.seed(42)
d = 128  # vector dimension
n = 10000  # number of vectors

xb = np.random.random((n, d)).astype("float32")

# Create index (brute force L2 distance)
index = faiss.IndexFlatL2(d)
print(f"Index trained: {index.is_trained}")

# Add vectors
index.add(xb)
print(f"Index size: {index.ntotal}")

# Search: find 5 nearest neighbors for 5 query vectors
xq = np.random.random((5, d)).astype("float32")

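start = time.perf_counter()  # monotonic, high-resolution timer
k = 5
distances, indices = index.search(xq, k)
elapsed = time.perf_counter() - start

print("\n=== Brute Force Search ===")
print(f"Time: {elapsed*1000:.2f} ms")
# IndexFlatL2 reports squared L2 distances, sorted from nearest to farthest
for i in range(len(xq)):
    print(f"Query {i}: nearest = {indices[i]}, squared distance = {distances[i]}")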
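It can help to see what a flat index does under the hood. The sketch below is a NumPy-only approximation of a brute-force L2 search, not FAISS's actual implementation: it compares every query against every stored vector and keeps the k smallest squared distances. The function name `brute_force_l2` and the small dataset are assumptions made for illustration.

```python
import numpy as np

def brute_force_l2(xb, xq, k):
    """Return (squared L2 distances, indices) of the k nearest rows of xb
    for each row of xq, sorted from nearest to farthest."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, evaluated for all pairs at once
    sq_dists = (
        (xq ** 2).sum(axis=1, keepdims=True)
        - 2.0 * xq @ xb.T
        + (xb ** 2).sum(axis=1)
    )
    idx = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, idx, axis=1), idx

rng = np.random.default_rng(0)
xb = rng.random((1000, 16), dtype=np.float32)  # database vectors
xq = xb[:3].copy()                             # query with three stored vectors

dists, idx = brute_force_l2(xb, xq, k=5)
print(idx[:, 0])  # each query's nearest neighbor is itself
```

The cost grows linearly with the number of stored vectors, which is why the flat index is exact but becomes slow on large datasets.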

IVF: Faster Search for Large Datasets

For datasets over 1 million vectors, use IVF (Inverted File) indexing:

# Create IVF index with 100 centroids
nlist = 100
quantizer = faiss.IndexFlatL2(d)
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

# IVF needs training
index_ivf.train(xb)
index_ivf.add(xb)

# nprobe controls how many clusters to explore
index_ivf.nprobe = 10

start = time.perf_counter()
distances_ivf, indices_ivf = index_ivf.search(xq, k)
elapsed_ivf = time.perf_counter() - start

print("\n=== IVF Search Results ===")
print(f"Time: {elapsed_ivf*1000:.2f} ms")

# Compare accuracy with brute force: a query counts as correct only if
# IVF returns exactly the same set of k neighbors
correct = sum(
    1 for i in range(len(xq))
    if set(indices[i]) == set(indices_ivf[i])
)
print(f"Top-{k} exact match rate: {correct/len(xq):.0%}")
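The exact-match rate above is a strict measure: a query scores zero even if only one of its five neighbors differs. A softer and more common measure is recall@k, the fraction of true neighbors that the approximate index recovered. The sketch below is one possible way to compute it; `recall_at_k` and the two small ID arrays are hypothetical stand-ins for the `indices` and `indices_ivf` arrays returned by the searches.

```python
import numpy as np

def recall_at_k(exact_ids, approx_ids):
    """Fraction of exact nearest-neighbor IDs that also appear in the
    approximate results, averaged over all queries."""
    exact_ids = np.asarray(exact_ids)
    approx_ids = np.asarray(approx_ids)
    hits = sum(
        len(set(exact_row) & set(approx_row))
        for exact_row, approx_row in zip(exact_ids.tolist(), approx_ids.tolist())
    )
    return hits / exact_ids.size

# Hypothetical results for two queries with k = 5
exact = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
approx = [[1, 2, 3, 4, 99], [6, 7, 8, 9, 10]]  # one neighbor missed in query 0

recall = recall_at_k(exact, approx)
print(f"recall@5: {recall:.0%}")
```

With these inputs the strict set comparison would report 50%, while recall@5 reflects that nine of the ten true neighbors were found. Raising `nprobe` typically increases recall at the cost of search time.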

Vector-Based Recommendation Demo

# Simulate movie recommendations
n_movies = 5000
n_dims = 64

movie_features = np.random.random((n_movies, n_dims)).astype("float32")
movie_titles = [f"Movie {i}" for i in range(n_movies)]

index_movies = faiss.IndexFlatIP(n_dims)  # Inner product; equals cosine similarity on L2-normalized vectors
faiss.normalize_L2(movie_features)        # Normalizes in place, so do this before adding
index_movies.add(movie_features)

# Simulate user preference vector
user_pref = np.random.random((1, n_dims)).astype("float32")
faiss.normalize_L2(user_pref)

# Recommend 10 movies
D, I = index_movies.search(user_pref, 10)
print(f"\n=== Recommendations ===")
for idx, score in zip(I[0], D[0]):
    print(f"  {movie_titles[idx]} (similarity: {score:.4f})")
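The demo relies on a detail that is easy to miss: inner product equals cosine similarity only when both vectors have unit length. The NumPy-only sketch below checks this on two random vectors; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(64).astype("float32")
b = rng.random(64).astype("float32")

# Cosine similarity computed from its definition
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Inner product after scaling both vectors to unit length
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner_unit = float(a_unit @ b_unit)

# Inner product of the raw vectors, for contrast
inner_raw = float(a @ b)

print(f"cosine:              {cosine:.4f}")
print(f"inner (normalized):  {inner_unit:.4f}")
print(f"inner (raw):         {inner_raw:.4f}")
```

If the normalization step is skipped, IndexFlatIP still works, but it ranks by raw inner product, which favors vectors with large magnitudes.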

Index Type Selection

| Index Type | Speed | Memory | Accuracy | Best For |
|:----------:|:----:|:------:|:--------:|----------|
| IndexFlatL2 | Slow | High | 100% | < 100K vectors |
| IndexIVFFlat | 10-100x faster | High | ~95% | < 10M vectors |
| IndexIVFPQ | Very fast | Low (compressed) | ~85-90% | > 10M vectors |
| IndexHNSWFlat | Very fast | High (vectors + graph links) | ~99% | High accuracy + large data |

Real-World Applications

1. Recommendation Systems

A streaming service such as Netflix can represent every movie and every user as a vector embedding. When a user watches a movie, a FAISS index can retrieve the most similar movies in vector space.

2. Semantic Search

Convert documents to vectors using an embedding model, then search by meaning instead of keywords. This is the foundation of RAG (Retrieval Augmented Generation).

3. Image Search

Convert images to feature vectors using neural networks. FAISS finds visually similar images in milliseconds.

4. Anomaly Detection

Encode normal behavior as vectors. Points far from all clusters are potential anomalies.
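One simple way to turn this idea into a score is to measure how far a new point lies from its nearest "normal" neighbors. The NumPy-only sketch below uses the distance to the k-th nearest normal vector as a stand-in for "far from all clusters"; the synthetic data, the helper `knn_distance`, and the 99th-percentile threshold are all assumptions for illustration. With FAISS, the same distances would come from the index's search results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal behavior": 500 points around the origin in 8 dimensions
normal = rng.normal(0.0, 1.0, size=(500, 8)).astype("float32")

# Two queries: one drawn like the normal data, one far away from it
queries = np.vstack([
    rng.normal(0.0, 1.0, size=(1, 8)),
    np.full((1, 8), 10.0),
]).astype("float32")

def knn_distance(xb, xq, k):
    """Distance from each query to its k-th nearest neighbor in xb."""
    dists = np.linalg.norm(xq[:, None, :] - xb[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, k - 1]

# Baseline: the same score for the normal points themselves. k=6 here
# because each point matches itself at distance 0.
baseline = knn_distance(normal, normal, k=6)
threshold = np.percentile(baseline, 99)

scores = knn_distance(normal, queries, k=5)
is_anomaly = scores > threshold
print(scores, threshold, is_anomaly)
```

The threshold is a design choice: a lower percentile flags more points, a higher one flags fewer.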

The Vibe Coding Approach

Describe what you need:

"Build a vector search API using FAISS for 1 million 256-dimension vectors. Use IVF1000 + PQ32 compressed index to keep memory under 1GB. Wrap with FastAPI as POST /search endpoint. Include add and search operations with latency monitoring."

An AI coding assistant can then generate the FAISS integration code, including a suitable index configuration.

Summary

FAISS is the industry standard for large-scale vector similarity search, powering recommendation systems, semantic search, and RAG applications.

Key takeaways:

Vector search finds items by semantic similarity, not keywords
Brute force (IndexFlatL2) works for up to 100K vectors
IVF indexes speed up search 10-100x with minimal accuracy loss
PQ compression reduces memory by 90%+ with slight accuracy trade-off
HNSW offers a strong balance of speed and accuracy, at the cost of extra memory for its graph
FAISS handles billions of vectors with GPU acceleration

What's Next: Stream Processing Pipeline

The next chapter integrates Bloom Filter, HyperLogLog, Count-Min Sketch, and FAISS into a complete real-time stream processing system, from raw data to final analytics.

Performance Comparison

| Dataset Size | IndexFlatL2 | IndexIVFFlat (nprobe=10) | IndexIVFPQ |
|:------------:|:-----------:|:------------------------:|:----------:|
| 100K | 2 ms | 0.5 ms | 0.3 ms |
| 1M | 20 ms | 2 ms | 1 ms |
| 10M | 200 ms | 15 ms | 5 ms |
| 100M | 2 s | 100 ms | 30 ms |
| 1B | 20 s | 1 s | 200 ms |

Memory Usage Comparison
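A back-of-the-envelope estimate is usually enough to compare memory footprints. The sketch below is a rough approximation under stated assumptions: float32 vectors (4 bytes per dimension), 8-byte vector IDs in IVF indexes, and about 2 x M four-byte links per vector for HNSW, ignoring upper graph layers and allocator overhead. The function names are hypothetical, and the example numbers reuse the 1 million x 256-dimension scenario from the prompt earlier in this article.

```python
def flat_bytes(n, d):
    """Uncompressed float32 vectors: 4 bytes per dimension."""
    return n * d * 4

def ivfpq_bytes(n, d, nlist, m, nbits=8):
    """IVF + PQ: compressed codes, vector IDs, coarse centroids, codebooks."""
    codes = n * m * nbits // 8        # m codes of nbits each per vector
    ids = n * 8                       # one 64-bit ID per vector
    centroids = nlist * d * 4         # coarse quantizer centroids
    codebooks = (2 ** nbits) * d * 4  # m codebooks of 2^nbits x (d / m) floats
    return codes + ids + centroids + codebooks

def hnsw_flat_bytes(n, d, m_links):
    """HNSW over raw vectors: the vectors plus roughly 2 * M links each."""
    return n * (d * 4 + 2 * m_links * 4)

n, d = 1_000_000, 256
for name, size in [
    ("IndexFlatL2", flat_bytes(n, d)),
    ("IndexIVFPQ (IVF1000, PQ32)", ivfpq_bytes(n, d, nlist=1000, m=32)),
    ("IndexHNSWFlat (M=32)", hnsw_flat_bytes(n, d, m_links=32)),
]:
    print(f"{name:<28} {size / 1e6:>10.1f} MB")
```

Under these assumptions the compressed index fits in tens of megabytes, while the flat and HNSW indexes both need around a gigabyte or more.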

When to Use Each Index

| Scenario | Recommended Index | Why |
|----------|:----------------:|-----|
| Prototyping, < 100K vectors | IndexFlatL2 | 100% accurate, simple |
| Production, < 10M vectors | IndexIVFFlat | Good accuracy-speed balance |
| Large scale, memory-limited | IndexIVFPQ | 50x memory reduction |
| Large scale, high accuracy | IndexHNSWFlat | Best speed-accuracy trade-off |
| Need GPU acceleration | GpuIndexFlatL2 | 10-100x speedup on GPU |
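The CPU rows of the table can be encoded as a small decision helper. The function below is a hypothetical sketch, not part of FAISS: it maps a scenario to an index factory string, and the specific `nlist` and PQ sizes in those strings are placeholder assumptions that would need tuning for real data.

```python
def suggest_index(n_vectors, memory_limited=False, need_high_recall=False):
    """Map the table's scenarios to a FAISS index factory string.

    The thresholds follow the table; the nlist and PQ sizes are
    illustrative placeholders only.
    """
    if n_vectors < 100_000:
        return "Flat"            # exact search, IndexFlatL2
    if memory_limited:
        return "IVF4096,PQ32"    # IndexIVFPQ; dimension must be divisible by 32
    if need_high_recall:
        return "HNSW32"          # IndexHNSWFlat with 32 links per node
    if n_vectors < 10_000_000:
        return "IVF1024,Flat"    # IndexIVFFlat
    return "IVF4096,PQ32"        # default to compression at very large scale

print(suggest_index(50_000))
print(suggest_index(5_000_000))
print(suggest_index(50_000_000, memory_limited=True))
print(suggest_index(50_000_000, need_high_recall=True))
```

The returned string can be passed to `faiss.index_factory(d, spec)` to build the corresponding index, which keeps index selection in one place and easy to change.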

The Vibe Coding Approach for Vector Search

"Build a semantic search engine using FAISS and sentence-transformers. Convert documents to 384-dim embeddings, index with IVF, and build a FastAPI search endpoint."

Summary

FAISS is the most widely used vector search library, powering recommendation systems, semantic search, and RAG across the industry.
