🔎 Tackling AI Hallucinations: Hybrid Search in Practice

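In the previous chapters, we learned how to split documents into chunks (chunking), convert them into vectors (embeddings), and load them into a vector database for semantic search (also called vector search).

When you first try it, the technique feels like magic. Search for 「蘋果公司最新財報」 ("Apple Inc.'s latest earnings report" in Chinese) and it finds a document about "Apple 2023 Q4 revenue", because the embedding model knows that 蘋果公司 and "Apple" are semantically close.

But once you ship this system to real customers, disaster strikes.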

1. The Fatal Weakness of Pure Vector Search

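Pure semantic search struggles most with one kind of input: proper nouns and product model numbers.

Suppose your database holds two documents:

Document A: Model XG-999-Pro is a high-performance gaming laptop with 32GB of memory...
Document B: Model XG-998-Lite is an office laptop that is thin, light, and easy to carry...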

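When a customer searches for "XG-999-Pro specs", which document do you think vector search will return? Quite possibly it rates both as similar, and it may even rank Document B higher!

Why? In the model's head (the embedding space), XG-999-Pro and XG-998-Lite are both unfamiliar alphanumeric strings, so their vectors tend to land close together. The model has no reliable way to know that one different digit and a different suffix (Pro vs. Lite) identify completely different products.

This is where old but precise keyword search (for example BM25) turns out to be the most effective tool: a document that contains none of the query terms scores 0!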
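
To make the exact-match property concrete, here is a minimal TypeScript sketch. It is a toy term-overlap scorer, not BM25 (there is no term-frequency or document-length weighting), and the helper names `tokenize` and `termOverlapScore` are made up for illustration.

```typescript
// Toy exact-term scorer: counts how many query terms appear verbatim in a document.
// This is NOT BM25; it only demonstrates that a missing term contributes nothing.
function tokenize(text: string): string[] {
  // Lowercase, then split on anything that is not a letter, digit, or hyphen,
  // so a model number such as "xg-999-pro" stays a single token.
  return text.toLowerCase().split(/[^a-z0-9-]+/).filter((t) => t.length > 0);
}

function termOverlapScore(query: string, doc: string): number {
  const docTerms = new Set(tokenize(doc));
  return tokenize(query).filter((term) => docTerms.has(term)).length;
}

const docA = "Model XG-999-Pro is a high-performance gaming laptop with 32GB of memory";
const docB = "Model XG-998-Lite is a thin and light office laptop";

const scoreA = termOverlapScore("XG-999-Pro specs", docA);
const scoreB = termOverlapScore("XG-999-Pro specs", docB);
console.log({ scoreA, scoreB }); // { scoreA: 1, scoreB: 0 }
```

Document B shares no token with the query, so it drops out entirely, which is exactly the behavior an embedding-only search cannot guarantee.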

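2. What Is Hybrid Search?

Semantic search understands meaning but not exact characters, while keyword search understands exact characters but not meaning. So why not combine them?

The idea behind hybrid search is simple: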

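When the user enters a question, we first run a semantic search and get a Top 10 list (each result carries a score, for example 0.8).
At the same time, we run a traditional keyword search (BM25) and get a second Top 10 list (with its own score, for example 0.9, on a different scale).
We hand both lists to an algorithm called RRF (Reciprocal Rank Fusion).
RRF ignores the raw scores, which are not comparable across the two searches, and uses only each document's rank: every list contributes 1 / (k + rank) for a document, and the contributions are summed (optionally with a per-list weight). The output is a single, stronger Top 5 list.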
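
The fusion step can be sketched in a few lines of TypeScript. This is a minimal illustration rather than any particular library's implementation: the function name `reciprocalRankFusion`, the document IDs, and the rankings are invented, and `k = 60` is a commonly used default that you should treat as tunable.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists (best result first).
// Each list adds 1 / (k + rank) to a document's fused score; raw scores are never used.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60 // smoothing constant (assumed default; tune for your data)
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical rankings: semantic search prefers docB, keyword search prefers docA.
const semanticTop = ["docB", "docA", "docC"];
const keywordTop = ["docA", "docD"];

const fusedRanking = reciprocalRankFusion([semanticTop, keywordTop]);
console.log(fusedRanking.map((r) => r.id)); // [ 'docA', 'docB', 'docD', 'docC' ]
```

docA wins because it appears in both lists (1/62 + 1/61), even though neither search on its own is guaranteed to put it first.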

This is now a common approach in production RAG systems, and vector stores such as Pinecone and Supabase Vector both document hybrid search setups.

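3. Implementing Hybrid Search in Supabase

If you happen to use Supabase as your vector database, implementing hybrid search is straightforward, because PostgreSQL supports full-text search natively.

Step 1: Create the keyword search index

In the Supabase SQL editor, add a text search column to your documents table: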

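-- Assumes your table is named documents
-- Add an fts (full-text search) column
-- Note: 'english' applies English stemming and stop words; choose the configuration that matches your content
alter table documents add column fts tsvector generated always as (to_tsvector('english', content)) stored;

-- Create an index to speed up queries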
create index on documents using gin (fts);

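Step 2: Write the hybrid search stored procedure

We need an RPC (Remote Procedure Call) function that runs both searches inside Supabase and combines their scores. To keep the demo short, the function below uses a simple weighted sum of the raw scores rather than RRF: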

create or replace function match_documents_hybrid(
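  query_embedding vector(1536), -- query embedding (1536 dimensions, e.g. from OpenAI)
  query_text text,              -- raw query text for keyword search
  match_count int,              -- number of rows to return
  full_text_weight float default 1,  -- keyword weight
  semantic_weight float default 1    -- semantic weight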
)
returns table (
  id uuid,
  content text,
  similarity float
)
language plpgsql
as $$
begin
  return query
  with semantic_search as (
    -- 1. Semantic search (<=> is cosine distance, so 1 - distance is cosine similarity)
    select documents.id, documents.content, 1 - (documents.embedding <=> query_embedding) as semantic_score
    from documents
    order by documents.embedding <=> query_embedding
    limit match_count * 2
  ),
  keyword_search as (
    -- 2. Keyword search (full-text search)
    select documents.id, documents.content, ts_rank(documents.fts, websearch_to_tsquery('english', query_text)) as keyword_score
    from documents
    where documents.fts @@ websearch_to_tsquery('english', query_text)
    order by keyword_score desc
    limit match_count * 2
  )
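  -- 3. Score fusion (simple weighted sum)
  select
    coalesce(semantic_search.id, keyword_search.id) as id,
    coalesce(semantic_search.content, keyword_search.content) as content,
    -- Add the weighted scores (the two scores are on different scales and should be normalized first; simplified here for the demo)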
    (coalesce(semantic_search.semantic_score, 0.0) * semantic_weight + 
     coalesce(keyword_search.keyword_score, 0.0) * full_text_weight) as similarity
  from semantic_search
  full outer join keyword_search on semantic_search.id = keyword_search.id
  order by similarity desc
  limit match_count;
end;
$$;
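
The SQL above adds raw scores, and its own comment notes that normalization was skipped. Cosine similarity and `ts_rank` are on different scales, so one side can dominate the sum. One common option, sketched here in TypeScript on the application side, is min-max normalization before the weighted sum. The names `Scored`, `minMaxNormalize`, and `fuseWeighted` and the sample scores are invented for illustration; this is one possible approach, not the method the SQL function implements.

```typescript
type Scored = { id: string; score: number };

// Min-max normalization: rescales the scores of one result list to [0, 1].
// If every score is identical, all results map to 1 (an arbitrary choice for this sketch).
function minMaxNormalize(results: Scored[]): Scored[] {
  if (results.length === 0) return [];
  const values = results.map((r) => r.score);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return results.map((r) => ({
    id: r.id,
    score: range === 0 ? 1 : (r.score - min) / range,
  }));
}

// Weighted sum of normalized scores; a document missing from one list contributes 0 for that list.
function fuseWeighted(
  semantic: Scored[],
  keyword: Scored[],
  semanticWeight = 1,
  fullTextWeight = 1
): Scored[] {
  const fused = new Map<string, number>();
  for (const r of minMaxNormalize(semantic)) {
    fused.set(r.id, (fused.get(r.id) ?? 0) + r.score * semanticWeight);
  }
  for (const r of minMaxNormalize(keyword)) {
    fused.set(r.id, (fused.get(r.id) ?? 0) + r.score * fullTextWeight);
  }
  return Array.from(fused.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Made-up scores for illustration only.
const semanticHits: Scored[] = [
  { id: "docB", score: 0.75 },
  { id: "docA", score: 0.5 },
  { id: "docC", score: 0.25 },
];
const keywordHits: Scored[] = [
  { id: "docA", score: 0.5 },
  { id: "docD", score: 0.25 },
];

const ranked = fuseWeighted(semanticHits, keywordHits);
console.log(ranked[0]); // { id: 'docA', score: 1.5 }
```

Note that min-max scaling sends the lowest result in each list to 0, so it is sensitive to how many candidates you fetch. Rank-based fusion such as RRF avoids that dependence on score scales altogether.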

Step 3: Call it from LangChain or Next.js

Now, when your backend receives a search from the user, it only needs to call this RPC:

import { createClient } from '@supabase/supabase-js';
import { OpenAIEmbeddings } from '@langchain/openai'; // older LangChain releases exported this from 'langchain/embeddings/openai'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);
const embeddings = new OpenAIEmbeddings();

async function performHybridSearch(question: string) {
  // 1. Convert the question into an embedding
  const queryEmbedding = await embeddings.embedQuery(question);

  // 2. Call the hybrid search RPC we just wrote
  const { data, error } = await supabase.rpc('match_documents_hybrid', {
    query_embedding: queryEmbedding, // fed to semantic search
    query_text: question,            // fed to keyword search
    match_count: 5,                  // keep only the final top 5
    full_text_weight: 1.2,           // raise this if exact keyword matches matter more to you
    semantic_weight: 1.0
  });

  if (error) {
    console.error("Search failed:", error);
    return [];
  }

  return data;
}

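4. An Advanced Weapon: Cohere Rerank

If writing SQL to merge scores yourself feels like too much trouble, or the fused scores are not accurate enough, there is another widely used technique: reranking.

The flow goes like this: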

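First, use ordinary vector search to fetch the 20 most similar documents.
Send those 20 documents, together with the user's original question, to a model trained specifically to score relevance (for example, a Cohere Rerank model).
The reranker reads each document alongside the question and reorders them by how well each one actually answers it.
Take the top 3 from that ordering and pass them to GPT to generate the final answer.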

An example of wiring up Cohere Rerank with LangChain:

import { CohereRerank } from "@langchain/cohere";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";

// 1. Set up your base vector retriever (vectorStore is assumed to be an existing store, e.g. Pinecone or Supabase)
const baseRetriever = vectorStore.asRetriever(20); // fetch 20 documents

// 2. Configure the Cohere compressor/reranker
const compressor = new CohereRerank({
  apiKey: process.env.COHERE_API_KEY,
  model: "rerank-multilingual-v2.0", // multilingual model, so Chinese is supported
  topN: 3 // keep only the top 3 after reranking
});

// 3. Assemble them into the final retriever
const hybridRetriever = new ContextualCompressionRetriever({
  baseCompressor: compressor,
  baseRetriever: baseRetriever,
});

// 4. Run the search!
const docs = await hybridRetriever.getRelevantDocuments("What are the cooling specs of the XG-999-Pro?");

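Once you bring hybrid search or reranking into your system, answer accuracy can rise sharply (think of moving from roughly 70% to the mid-90s, though your numbers will depend on your data, so measure them on your own queries). Customers will stop asking why the AI gets even an obvious model number wrong. This is a key technique separating a toy from a commercial-grade product.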