Solving AI Hallucinations: A Practical Guide to Hybrid Search
In previous chapters, we learned how to split documents into chunks, convert them into embeddings, and store them in a vector database for "semantic search."
At first, this technology feels magical!
When you search for "Apple's latest financial report," it retrieves documents about "Apple 2023 Q4 revenue" because the embedding model understands that "latest financial report" and "2023 Q4 revenue" are semantically close, even though they share almost no words.
But when you deploy this system to real users, disaster strikes.
1. The Fatal Flaw of Pure Vector Search
Pure semantic search struggles with one thing: exact identifiers such as proper nouns and product codes.
Imagine your database contains two documents:
- Document A: Product model XG-999-Pro is a high-performance gaming laptop with 32GB RAM...
- Document B: Product model XG-998-Lite is an ultraportable laptop for office use...
When a user searches for "XG-999-Pro specifications," which document will vector search retrieve?
The answer: It might think both are similar! Or even rank Document B higher!
Why? Because in the AI's "mind" (embedding space), XG-999-Pro and XG-998-Lite are just "some alphanumeric combinations it hasn't seen before," so their embeddings land almost on top of each other. The AI doesn't realize that one changed digit and a different suffix (999-Pro versus 998-Lite) identify a completely different product.
Here, the old but precise keyword search (BM25) becomes the most effective tool: if a document doesn't contain the query term, its score for that term is 0!
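To make the contrast concrete, here is a minimal sketch of BM25 scoring over the two example documents. The tokenizer (lowercase, split on whitespace, hyphenated codes kept whole) and the parameters `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration; real search engines tokenize and tune differently.

```typescript
// Minimal BM25 sketch. Tokenizer and parameters are illustrative assumptions.
function tokenize(text: string): string[] {
  // Lowercase, split on whitespace, strip leading/trailing punctuation,
  // but keep inner hyphens so "XG-999-Pro" stays a single token.
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((t) => t.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, ""))
    .filter((t) => t.length > 0);
}

function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / tokenized.length;
  const queryTerms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue; // a term that never appears contributes nothing
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (tokenized.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const docs = [
  "Product model XG-999-Pro is a high-performance gaming laptop with 32GB RAM",
  "Product model XG-998-Lite is an ultraportable laptop for office use",
];
const scores = bm25Scores("XG-999-Pro specifications", docs);
// scores[0] is positive (Document A contains the exact token "xg-999-pro");
// scores[1] is exactly 0 (Document B shares no term with the query).
```

Under these assumptions the wrong product cannot outrank the right one, because it never matches the code at all.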
2. What is Hybrid Search?
Since semantic search understands "meaning" but not "characters," and keyword search understands "characters" but not "meaning," why not combine them?
Hybrid Search is simple:
- When a user submits a query, run a "semantic search" first, yielding a Top 10 list (with scores, e.g., 0.8).
- Simultaneously, run a traditional "keyword search (BM25)," producing another Top 10 list (with scores, e.g., 0.9).
- Feed both lists to an algorithm called RRF (Reciprocal Rank Fusion).
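- RRF scores each document by its rank in each list rather than by its raw score (the two score scales are not comparable), sums the results, and outputs the "strongest Top 5 list."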
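The fusion step can be sketched in a few lines. In the usual formulation of RRF, each list contributes `1 / (k + rank)` for every document it contains, and the contributions are summed. The constant `k = 60` is the commonly used default and should be treated as tunable; the document IDs below are hypothetical.

```typescript
// Reciprocal Rank Fusion sketch: each list contributes 1 / (k + rank) per document.
function reciprocalRankFusion(rankedLists: string[][], k = 60, topN = 5): string[] {
  const fused = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      fused.set(docId, (fused.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(fused.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topN)
    .map(([docId]) => docId);
}

// Hypothetical document IDs, best match first in each list.
const semanticTop = ["doc-b", "doc-a", "doc-c"];
const keywordTop = ["doc-a", "doc-d"];
const fusedTop = reciprocalRankFusion([semanticTop, keywordTop]);
// fusedTop is ["doc-a", "doc-b", "doc-d", "doc-c"]
```

Here `doc-a` wins because it appears near the top of both lists, even though it is first in only one of them.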
This is a common approach in production RAG systems, and vector stores such as Pinecone and Supabase Vector support it.
3. Implementing Hybrid Search in Supabase
If you're using Supabase as your vector database, you're in luck: Hybrid Search is straightforward because PostgreSQL natively supports full-text search.
Step 1: Create a Keyword Search Index
In your Supabase SQL editor, add a full-text search column to your documents table:
-- Assuming your table is named 'documents'
-- Add an fts (Full-Text Search) column
alter table documents add column fts tsvector generated always as (to_tsvector('english', content)) stored;
-- Create an index for faster queries
create index on documents using gin (fts);
Step 2: Write a Hybrid Search Stored Procedure
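Create an RPC function to execute both searches and merge the scores. To keep the SQL short, this version fuses the results with a simple weighted sum of the raw scores rather than RRF: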
create or replace function match_documents_hybrid(
  query_embedding vector(1536),      -- Query embedding (1536 dimensions, e.g., from OpenAI)
  query_text text,                   -- Raw keywords
  match_count int,                   -- Number of results
  full_text_weight float default 1,  -- Keyword weight
  semantic_weight float default 1    -- Semantic weight
)
returns table (
  id uuid,
  content text,
  similarity float
)
language plpgsql
as $$
begin
  return query
  with semantic_search as (
    -- 1. Semantic Search (Cosine Similarity)
    select documents.id, documents.content, 1 - (documents.embedding <=> query_embedding) as semantic_score
    from documents
    order by documents.embedding <=> query_embedding
    limit match_count * 2
  ),
  keyword_search as (
    -- 2. Keyword Search (Full Text Search)
    select documents.id, documents.content, ts_rank(documents.fts, websearch_to_tsquery('english', query_text)) as keyword_score
    from documents
    where documents.fts @@ websearch_to_tsquery('english', query_text)
    order by keyword_score desc
    limit match_count * 2
  )
  -- 3. Score Fusion (Simple Weighted Sum)
  select
    coalesce(semantic_search.id, keyword_search.id) as id,
    coalesce(semantic_search.content, keyword_search.content) as content,
    -- Combine scores (no normalization here: cosine similarity and ts_rank
    -- are on different scales, so tune the weights accordingly)
    (coalesce(semantic_search.semantic_score, 0.0) * semantic_weight +
     coalesce(keyword_search.keyword_score, 0.0) * full_text_weight) as similarity
  from semantic_search
  full outer join keyword_search on semantic_search.id = keyword_search.id
  order by similarity desc
  limit match_count;
end;
$$;
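The function above adds raw scores, which the SQL comment admits is a simplification: cosine similarity and `ts_rank` live on different scales, so one side can dominate regardless of the weights. One possible remedy, sketched here in application code, is to min-max normalize each result list before applying the weights. The `Scored` type, the function names, and the sample scores are all hypothetical.

```typescript
// Weighted score fusion with min-max normalization (illustrative sketch).
type Scored = { id: string; score: number };

// Rescale scores to [0, 1]; a list whose scores are all equal maps to 1.
function minMaxNormalize(results: Scored[]): Scored[] {
  if (results.length === 0) return [];
  const values = results.map((r) => r.score);
  const min = Math.min(...values);
  const max = Math.max(...values);
  return results.map((r) => ({
    id: r.id,
    score: max === min ? 1 : (r.score - min) / (max - min),
  }));
}

function fuseWeighted(
  semantic: Scored[],
  keyword: Scored[],
  semanticWeight = 1,
  fullTextWeight = 1,
): Scored[] {
  const combined = new Map<string, number>();
  const add = (results: Scored[], weight: number): void => {
    for (const r of minMaxNormalize(results)) {
      // A document missing from one list simply gets nothing from that list,
      // mirroring coalesce(..., 0.0) over the full outer join.
      combined.set(r.id, (combined.get(r.id) ?? 0) + r.score * weight);
    }
  };
  add(semantic, semanticWeight);
  add(keyword, fullTextWeight);
  return Array.from(combined, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical raw scores: cosine similarity on the left, ts_rank on the right.
const fusedScores = fuseWeighted(
  [{ id: "doc-b", score: 0.82 }, { id: "doc-a", score: 0.80 }, { id: "doc-c", score: 0.60 }],
  [{ id: "doc-a", score: 0.09 }, { id: "doc-d", score: 0.03 }],
);
// doc-a ranks first: it is near the top semantically and is the best keyword match.
```

Min-max scaling has its own drawback: the lowest-ranked item in each list is pushed to 0, so rank-based fusion such as RRF is often the more robust choice.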
Step 3: Call the Function from LangChain or Next.js
Now, when your backend receives a query, simply call this RPC:
import { createClient } from '@supabase/supabase-js';
// Recent LangChain releases ship OpenAI integrations in '@langchain/openai'
// (older releases used 'langchain/embeddings/openai').
import { OpenAIEmbeddings } from '@langchain/openai';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);
const embeddings = new OpenAIEmbeddings();

async function performHybridSearch(question: string) {
  // 1. Convert the question to a vector
  const queryEmbedding = await embeddings.embedQuery(question);

  // 2. Call the Hybrid Search RPC
  const { data, error } = await supabase.rpc('match_documents_hybrid', {
    query_embedding: queryEmbedding, // For semantic search
    query_text: question,            // For keyword search
    match_count: 5,                  // Top 5 results
    full_text_weight: 1.2,           // Adjust weights (e.g., prioritize keywords)
    semantic_weight: 1.0
  });

  if (error) {
    console.error("Search failed:", error);
    return [];
  }
  return data;
}
4. Advanced Weapon: Cohere Rerank
If writing SQL to merge scores feels cumbersome or inaccurate, the industry offers a more powerful tool: Rerank.
The workflow:
- Use vector search to retrieve the top 20 most relevant documents.
- Send these 20 documents, along with the user's original question, to a specialized AI model (e.g., Cohere Rerank).
- The AI reads all 20 documents and reranks them based on "which document truly answers the question."
- Take the top 3 reranked documents and feed them to GPT for the final answer.
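Stripped of any specific vendor, the workflow above is a two-stage pipeline: broad recall first, precise reranking second. The sketch below shows only that shape. The `Candidate` and `RelevanceScorer` types and the word-overlap scorer are hypothetical stand-ins; in practice the scorer would be a call to a reranking model.

```typescript
// Two-stage retrieval sketch: broad recall first, precise reranking second.
type Candidate = { id: string; text: string };
type RelevanceScorer = (question: string, doc: Candidate) => Promise<number>;

async function rerank(
  question: string,
  candidates: Candidate[],
  scoreRelevance: RelevanceScorer,
  topN = 3,
): Promise<Candidate[]> {
  // Score every candidate against the question, then keep the best topN.
  const scored = await Promise.all(
    candidates.map(async (doc) => ({ doc, score: await scoreRelevance(question, doc) })),
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.doc);
}

// Stand-in scorer (hypothetical): counts question words present in the document.
// A real reranker model would replace this function.
const overlapScorer: RelevanceScorer = async (question, doc) => {
  const words = question.toLowerCase().split(/\s+/);
  const text = doc.text.toLowerCase();
  return words.filter((w) => text.includes(w)).length;
};
```

Because the scorer is passed in as a parameter, the same pipeline works whether the candidates come from pure vector search or from the hybrid RPC.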
Example with LangChain and Cohere Rerank:
import { CohereRerank } from "@langchain/cohere";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";

// 1. Set up your base vector retriever (e.g., Pinecone or Supabase).
//    `vectorStore` is assumed to be an already-initialized LangChain vector store.
const baseRetriever = vectorStore.asRetriever(20); // Fetch 20 docs

// 2. Configure Cohere's reranker
const compressor = new CohereRerank({
  apiKey: process.env.COHERE_API_KEY,
  model: "rerank-multilingual-v2.0", // Multilingual model (supports Chinese!)
  topN: 3 // Keep only the top 3 after reranking
});

// 3. Combine them into a retrieve-then-rerank retriever
const rerankRetriever = new ContextualCompressionRetriever({
  baseCompressor: compressor,
  baseRetriever: baseRetriever,
});

// 4. Execute the search!
const docs = await rerankRetriever.invoke("What are the cooling specs of XG-999-Pro?");
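After integrating Hybrid Search or Rerank, expect retrieval accuracy to improve markedly on queries that contain exact identifiers. A jump from roughly 70% to over 95% is achievable, though the exact gain depends on your data and queries, so measure it on your own test set.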
Clients will no longer complain about "the AI failing on obvious product codes." This is the technology that separates "toys" from "commercial-grade products."