FAISS: Vector Similarity Search at Scale
How does Netflix know you might like Squid Game? How does Amazon know you need printer paper after buying a printer? A large part of the answer is vector similarity search, and FAISS is one of the most widely used engines for running it at scale.
Why Vector Search?
Traditional search relies on keywords and metadata. But understanding semantic similarity requires representing items as vectors:
Movie A (Action): [0.9, 0.1, 0.8, 0.2] <- Action-heavy
Movie B (Action): [0.8, 0.2, 0.7, 0.3] <- Similar to A
Movie C (Drama): [0.1, 0.9, 0.2, 0.8] <- Different from A
Vector search finds the items whose vectors lie closest to a query vector, that is, where the distance between the two is smallest. This powers:
- Recommendation systems (Netflix, Amazon, Spotify)
- Semantic search (find by meaning, not keywords)
- RAG (Retrieval Augmented Generation)
- Image and audio similarity search
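As a minimal sketch of "smallest distance", the snippet below computes the Euclidean (L2) distance between the three example movie vectors using plain NumPy. The `l2_distance` helper is a name introduced here for illustration and is not part of FAISS.

```python
import numpy as np

# The three example movie vectors from the table above
movie_a = np.array([0.9, 0.1, 0.8, 0.2], dtype="float32")
movie_b = np.array([0.8, 0.2, 0.7, 0.3], dtype="float32")
movie_c = np.array([0.1, 0.9, 0.2, 0.8], dtype="float32")

def l2_distance(u, v):
    """Euclidean distance between two vectors (illustrative helper)."""
    return float(np.linalg.norm(u - v))

dist_ab = l2_distance(movie_a, movie_b)
dist_ac = l2_distance(movie_a, movie_c)
print(f"A-B distance: {dist_ab:.3f}")
print(f"A-C distance: {dist_ac:.3f}")
```

Movie B sits about 0.2 away from Movie A, while Movie C is about 1.41 away, so a nearest-neighbor search starting from A returns B first.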
What Is FAISS?
FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta for efficient similarity search and clustering of dense vectors.
Features:
- Handles billions of vectors
- GPU-accelerated
- Multiple index types for different speed/accuracy trade-offs
- Python and C++ interfaces
How FAISS Works
Installation
pip install faiss-cpu # CPU version
pip install faiss-gpu # GPU version (CUDA required)
Basic Usage: Brute Force Search
For small datasets (< 100K vectors), brute force L2 distance works fine:
import numpy as np
import faiss
import time
# Generate test data: 10,000 vectors of 128 dimensions
np.random.seed(42)
d = 128 # vector dimension
n = 10000 # number of vectors
xb = np.random.random((n, d)).astype("float32")
# Create index (brute force L2 distance)
index = faiss.IndexFlatL2(d)
print(f"Index trained: {index.is_trained}")
# Add vectors
index.add(xb)
print(f"Index size: {index.ntotal}")
# Search: find 5 nearest neighbors for 5 query vectors
xq = np.random.random((5, d)).astype("float32")
start = time.time()
k = 5
distances, indices = index.search(xq, k)
elapsed = time.time() - start
# Note: IndexFlatL2 returns squared L2 distances
print("\n=== Brute Force Search ===")
print(f"Time: {elapsed*1000:.2f} ms")
for i in range(len(xq)):
    print(f"Query {i}: nearest = {indices[i]}, distance = {distances[i]}")
IVF: Faster Search for Large Datasets
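For datasets over 1 million vectors, use IVF (Inverted File) indexing. IVF partitions the vectors into nlist clusters around learned centroids and, at query time, scans only the nprobe clusters closest to the query: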
# Create IVF index with 100 centroids
nlist = 100
quantizer = faiss.IndexFlatL2(d)
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
# IVF needs training
index_ivf.train(xb)
index_ivf.add(xb)
# nprobe controls how many clusters are searched per query (default: 1);
# higher values improve recall but slow the search
index_ivf.nprobe = 10
start = time.time()
distances_ivf, indices_ivf = index_ivf.search(xq, k)
elapsed_ivf = time.time() - start
print(f"\n=== IVF Search Results ===")
print(f"Time: {elapsed_ivf*1000:.2f} ms")
# Compare with brute force: share of queries whose top-5 set matches exactly
correct = sum(
    1 for i in range(len(xq))
    if set(indices[i]) == set(indices_ivf[i])
)
print(f"Top-5 exact match rate: {correct/len(xq):.0%}")
Vector-Based Recommendation Demo
# Simulate movie recommendations
n_movies = 5000
n_dims = 64
movie_features = np.random.random((n_movies, n_dims)).astype("float32")
movie_titles = [f"Movie {i}" for i in range(n_movies)]
# Inner product equals cosine similarity once the vectors are L2-normalized
index_movies = faiss.IndexFlatIP(n_dims)
faiss.normalize_L2(movie_features)  # normalizes in place
index_movies.add(movie_features)
# Simulate user preference vector
user_pref = np.random.random((1, n_dims)).astype("float32")
faiss.normalize_L2(user_pref)
# Recommend 10 movies
D, I = index_movies.search(user_pref, 10)
print(f"\n=== Recommendations ===")
for idx, score in zip(I[0], D[0]):
    print(f"  {movie_titles[idx]} (similarity: {score:.4f})")
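The demo relies on the fact that the inner product of two unit-length vectors is their cosine similarity. A short NumPy check of that identity, using two random vectors chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(64).astype("float32")
b = rng.random(64).astype("float32")

# Cosine similarity computed directly from its definition
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The same vectors scaled to unit length, then a plain inner product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner = float(a_unit @ b_unit)

print(f"cosine = {cosine:.6f}, inner product of unit vectors = {inner:.6f}")
```

This is why both the indexed vectors and the query vector must be normalized: skipping either one turns the score back into a raw inner product that also reflects vector length.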
Index Type Selection
| Index Type | Speed | Memory | Accuracy | Best For |
|:----------:|:----:|:------:|:--------:|----------|
| IndexFlatL2 | Slow | High | 100% | < 100K vectors |
| IndexIVFFlat | 10-100x faster | High | ~95% | < 10M vectors |
| IndexIVFPQ | Very fast | Low (compressed) | ~85-90% | > 10M vectors |
| IndexHNSWFlat | Very fast | High (vectors + graph) | ~99% | High accuracy + large data |
Real-World Applications
1. Recommendation Systems
A service such as Netflix can represent every movie and every user as an embedding vector. When a user watches a movie, a FAISS index can return the most similar movies in vector space.
2. Semantic Search
Convert documents to vectors with an embedding model, then search by meaning instead of keywords. This retrieval step is the foundation of RAG (Retrieval Augmented Generation).
3. Image Search
Convert images to feature vectors using neural networks. FAISS finds visually similar images in milliseconds.
4. Anomaly Detection
Encode normal behavior as vectors. Points far from all clusters are potential anomalies.
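One simple way to turn "far from everything" into a number is the mean distance to the k nearest known-normal vectors. The sketch below uses that k-nearest-neighbor score instead of explicit clusters, and brute-force NumPy instead of a FAISS index, to stay self-contained; the data, the `knn_anomaly_score` helper, and `k=5` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Reference set standing in for "normal behavior"
normal = rng.normal(0.0, 1.0, size=(500, 8)).astype("float32")
queries = np.vstack([
    rng.normal(0.0, 1.0, size=(3, 8)),   # three ordinary points
    np.full((1, 8), 6.0),                # one point far from the data
]).astype("float32")

def knn_anomaly_score(points, reference, k=5):
    """Mean distance from each point to its k nearest reference vectors."""
    # Pairwise Euclidean distances, shape (len(points), len(reference))
    diffs = points[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

scores = knn_anomaly_score(queries, normal)
for i, score in enumerate(scores):
    print(f"query {i}: anomaly score = {score:.2f}")
```

The last query gets by far the highest score. On large reference sets, the distances returned by a FAISS `search` call could replace the brute-force distance matrix.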
The Vibe Coding Approach
Describe what you need:
"Build a vector search API using FAISS for 1 million 256-dimension vectors. Use IVF1000 + PQ32 compressed index to keep memory under 1GB. Wrap with FastAPI as POST /search endpoint. Include add and search operations with latency monitoring."
An AI coding assistant can then generate the FAISS integration code, including an appropriate index choice.
Summary
FAISS is one of the most widely used libraries for large-scale vector similarity search, powering recommendation systems, semantic search, and RAG applications.
Key takeaways:
- Vector search finds items by semantic similarity, not keywords
- Brute force (IndexFlatL2) works for up to 100K vectors
- IVF indexes speed up search 10-100x with minimal accuracy loss
- PQ compression reduces memory by 90%+ at some cost in accuracy
- HNSW offers high recall at high speed, at the cost of extra memory for the graph
- FAISS handles billions of vectors with GPU acceleration
What's Next: Stream Processing Pipeline
The next chapter integrates Bloom Filter, HyperLogLog, Count-Min Sketch, and FAISS into a complete real-time stream processing system, from raw data to final analytics.
Performance Comparison
| Dataset Size | IndexFlatL2 | IndexIVFFlat (nprobe=10) | IndexIVFPQ |
|:------------:|:-----------:|:------------------------:|:----------:|
| 100K | 2 ms | 0.5 ms | 0.3 ms |
| 1M | 20 ms | 2 ms | 1 ms |
| 10M | 200 ms | 15 ms | 5 ms |
| 100M | 2 s | 100 ms | 30 ms |
| 1B | 20 s | 1 s | 200 ms |
Memory Usage Comparison
| Index Type | What is stored (1M vectors, 128-dim) | Memory |
|:----------:|:-------------------:|:------:|
| IndexFlatL2 | Raw vectors | 512 MB |
| IndexIVFFlat (nlist=1000) | Vectors + IDs + centroids | ~520 MB |
| IndexIVFPQ (m=16) | Compressed codes + IDs | ~25 MB |
| IndexHNSWFlat | Vectors + graph edges | ~600 MB |
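Memory figures like these can be sanity-checked with simple arithmetic. The sketch below assumes float32 vectors, one 8-byte ID per vector in the IVF lists, and 8-bit PQ codes; these are assumptions about a typical configuration, and allocator or bookkeeping overhead is ignored.

```python
n, d = 1_000_000, 128
FLOAT_BYTES = 4        # one float32 component
ID_BYTES = 8           # assumed 64-bit ID stored per vector in IVF lists
MB = 1e6

# Flat index: raw vectors only
flat_mb = n * d * FLOAT_BYTES / MB

# IVF-Flat: raw vectors + IDs + nlist centroids
nlist = 1000
centroid_bytes = nlist * d * FLOAT_BYTES
ivf_flat_mb = (n * (d * FLOAT_BYTES + ID_BYTES) + centroid_bytes) / MB

# IVF-PQ: m one-byte codes per vector + IDs + centroids + PQ codebooks
m = 16
codebook_bytes = 256 * d * FLOAT_BYTES   # 256 centroids per sub-quantizer
ivf_pq_mb = (n * (m + ID_BYTES) + centroid_bytes + codebook_bytes) / MB

print(f"IndexFlatL2:  {flat_mb:.1f} MB")
print(f"IndexIVFFlat: {ivf_flat_mb:.1f} MB")
print(f"IndexIVFPQ:   {ivf_pq_mb:.1f} MB")
```

The compressed index comes out at roughly one twentieth of the flat index under these assumptions; halving `m` would roughly halve the code storage again, at a further cost in accuracy.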
When to Use Each Index
| Scenario | Recommended Index | Why |
|----------|:----------------:|-----|
| Prototyping, < 100K vectors | IndexFlatL2 | 100% accurate, simple |
| Production, < 10M vectors | IndexIVFFlat | Good accuracy-speed balance |
| Large scale, memory-limited | IndexIVFPQ | Roughly 10-50x less memory, depending on m |
| Large scale, high accuracy | IndexHNSWFlat | High recall at high speed, if memory allows |
| Need GPU acceleration | GpuIndexFlatL2 | 10-100x speedup on GPU |
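The table can be encoded as a small helper that returns a FAISS factory string. This is an illustrative sketch only: `suggest_index_spec` is a hypothetical function, the thresholds are taken from the table, and the `4 * sqrt(n)` cluster count is an assumed rule of thumb, not a fixed FAISS requirement.

```python
import math

def suggest_index_spec(n_vectors, memory_limited=False, high_recall=False):
    """Return an index_factory-style string for a workload (illustrative)."""
    if n_vectors < 100_000:
        return "Flat"                       # exact search is fast enough
    if high_recall and not memory_limited:
        return "HNSW32"                     # graph index, no compression
    # Assumed rule of thumb: number of clusters on the order of sqrt(n)
    nlist = int(4 * math.sqrt(n_vectors))
    if memory_limited or n_vectors >= 10_000_000:
        return f"IVF{nlist},PQ16"           # compressed codes
    return f"IVF{nlist},Flat"               # uncompressed vectors in IVF lists

for n, kwargs in [
    (50_000, {}),
    (1_000_000, {}),
    (100_000_000, {"memory_limited": True}),
    (100_000_000, {"high_recall": True}),
]:
    print(f"{n:>11,} {kwargs}: {suggest_index_spec(n, **kwargs)}")
```

Each returned string can be passed to `faiss.index_factory(d, spec)`. Treat the output as a starting point to benchmark against real data, since recall and latency depend heavily on the embedding distribution.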
The Vibe Coding Approach for Vector Search
"Build a semantic search engine using FAISS and sentence-transformers. Convert documents to 384-dim embeddings, index with IVF, and build a FastAPI search endpoint."
Summary
FAISS is the most widely used vector search library, powering recommendation systems, semantic search, and RAG across the industry.