[B06][57] Search Integration: Full-Text Search / Elasticsearch / MeiliSearch

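Why DB LIKE Is Not Enough

-- The problem with LIKE
SELECT * FROM products WHERE name LIKE '%wireless%headphone%';

Problems:

Performance: a pattern with a leading % cannot use an ordinary B-tree index, so the query typically degrades to a full table scan.
Language: LIKE only matches literal substrings and has no notion of words. Chinese text needs a separate word-segmentation step: LIKE '%無線%' matches 「無線耳機」 only because those exact characters appear consecutively, so a query whose terms appear in a different order, or are separated by other text, finds nothing.
Relevance: there is no ranking. name LIKE '%apple%' returns rows in whatever order the DB decides, not with the most relevant first.
Typo tolerance: if the user types wireles headphon (misspelled), LIKE finds nothing.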

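A full-text search engine solves these problems with an **inverted index**: it first splits text into tokens and builds a token → documents mapping, so a search looks up the tokens directly instead of scanning the whole table.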
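To make the token → documents mapping concrete, here is a minimal in-memory sketch. It is only an illustration: the tokenizer (lowercase, split on non-alphanumerics) and the names `tokenize`, `buildInvertedIndex`, and `search` are assumptions for this example, and real engines add stemming, ranking, and typo handling on top.

```typescript
type DocId = number;

// Naive tokenizer (assumption): lowercase and split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Build the inverted index: each token maps to the set of documents containing it.
function buildInvertedIndex(docs: Map<DocId, string>): Map<string, Set<DocId>> {
  const index = new Map<string, Set<DocId>>();
  docs.forEach((text, id) => {
    for (const token of tokenize(text)) {
      let postings = index.get(token);
      if (postings === undefined) {
        postings = new Set<DocId>();
        index.set(token, postings);
      }
      postings.add(id);
    }
  });
  return index;
}

// AND search: a document matches only if it contains every query token.
function search(index: Map<string, Set<DocId>>, query: string): DocId[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  const first = index.get(tokens[0]);
  let result: DocId[] = first ? Array.from(first) : [];
  for (const token of tokens.slice(1)) {
    const postings = index.get(token);
    result = result.filter((id) => postings !== undefined && postings.has(id));
  }
  return result.sort((a, b) => a - b);
}

const sampleDocs = new Map<DocId, string>([
  [1, "Wireless Headphone Pro"],
  [2, "Wired headphone"],
  [3, "Wireless mouse"],
]);
const sampleIndex = buildInvertedIndex(sampleDocs);

console.log(search(sampleIndex, "wireless headphone")); // [ 1 ]
console.log(search(sampleIndex, "headphone"));          // [ 1, 2 ]
```

Each lookup touches only the posting lists of the query tokens, which is why the cost does not grow with the number of rows the way a `%...%` scan does.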

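Choosing a Tool

Tool	Best for	Characteristics
Elasticsearch	Large datasets, complex queries, log analytics	Most powerful; complex to configure; high resource requirements
MeiliSearch	Small to mid-sized projects, developer-friendly, instant search	Simple setup, works out of the box, written in Rust with good performance
Typesense	Similar use cases to MeiliSearch	Strongly typed schema; hosted option available
PostgreSQL FTS	No extra service wanted, moderate data volume	Built in, nothing extra to maintain, but limited features
Algolia	Teams that don't want to run search themselves	SaaS; expensive; very good documentation

Rule of thumb: start with PostgreSQL FTS and switch to MeiliSearch when it no longer holds up; consider Elasticsearch only once the data reaches millions of records or you need complex aggregations.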

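PostgreSQL Full-Text Search (the simplest starting point)

-- Add a column that stores the precomputed search vector
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Backfill search_vector (Chinese requires an extension such as pg_jieba or zhparser)
UPDATE products
SET search_vector =
  setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(description, '')), 'B');

-- Create a GIN index (the usual index type for full-text search)
CREATE INDEX idx_products_search ON products USING GIN(search_vector);

-- Trigger: refresh search_vector automatically when a row is inserted or updated.
-- Note: the built-in tsvector_update_trigger does not apply setweight(), so rows
-- written through it lose the A/B weights set above; use a custom trigger
-- function if the weights matter.
CREATE TRIGGER update_search_vector
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION
  tsvector_update_trigger(search_vector, 'pg_catalog.english', name, description);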

// Search
async function searchProducts(query: string) {
  return db.query(`
    SELECT *,
      ts_rank(search_vector, plainto_tsquery('english', $1)) AS rank
    FROM products
    WHERE search_vector @@ plainto_tsquery('english', $1)
    ORDER BY rank DESC
    LIMIT 20
  `, [query]);
}

Limitations: weak Chinese support (an extra extension is required), no typo tolerance, and weak aggregation capabilities.

MeiliSearch Integration

import { MeiliSearch } from 'meilisearch';

const client = new MeiliSearch({
  // Fall back to the default local address when the env var is unset
  host: process.env.MEILISEARCH_HOST ?? 'http://localhost:7700',
  apiKey: process.env.MEILISEARCH_API_KEY,
});

// Configure the index
const productIndex = client.index('products');

// Settings updates are processed asynchronously as a task on the server
await productIndex.updateSettings({
  searchableAttributes: ['name', 'description', 'brand'],  // fields that are searched
  filterableAttributes: ['category', 'status', 'price'],   // fields that can be used in filters
  sortableAttributes: ['price', 'createdAt', 'popularity'],
  rankingRules: [                         // relevance ranking rules, in priority order
    'words',
    'typo',
    'proximity',
    'attribute',
    'sort',
    'exactness',
  ],
  typoTolerance: {
    enabled: true,
    // Minimum word length before one or two typos are tolerated
    minWordSizeForTypos: { oneTypo: 5, twoTypos: 8 },
  },
});
 
// Search
async function searchProducts(query: string, filters: SearchFilters) {
  const limit = filters.limit ?? 20;
  const page = filters.page ?? 1;  // default to the first page so the offset is never NaN
  const result = await productIndex.search(query, {
    filter: buildFilter(filters),  // e.g. 'category = "electronics" AND price < 5000'
    sort: filters.sort ? [filters.sort] : undefined,
    limit,
    offset: (page - 1) * limit,
    attributesToHighlight: ['name', 'description'],  // highlight matched text (returned under each hit's _formatted)
    highlightPreTag: '<mark>',
    highlightPostTag: '</mark>',
  });

  return {
    hits: result.hits,
    total: result.estimatedTotalHits,
    processingTime: result.processingTimeMs,
  };
}
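The search function above calls `buildFilter` without defining it. One possible shape is sketched below. The `SearchFilters` fields (`category`, `status`, `minPrice`, `maxPrice`) and the `quote` helper are assumptions made for this example, not the author's actual types; the output follows the filter-expression style shown in the comment above (`category = "electronics" AND price < 5000`).

```typescript
// Hypothetical filter shape; adjust the fields to your own schema.
interface SearchFilters {
  category?: string;
  status?: string;
  minPrice?: number;
  maxPrice?: number;
  sort?: string;   // e.g. "price:asc"
  page?: number;
  limit?: number;
}

// Wrap a string value in double quotes, escaping backslashes and embedded quotes,
// so user input cannot break out of the filter expression.
function quote(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Build an AND-joined filter expression, or undefined when no filter applies.
function buildFilter(filters: SearchFilters): string | undefined {
  const clauses: string[] = [];
  if (filters.category !== undefined) clauses.push(`category = ${quote(filters.category)}`);
  if (filters.status !== undefined) clauses.push(`status = ${quote(filters.status)}`);
  if (filters.minPrice !== undefined && isFinite(filters.minPrice)) {
    clauses.push(`price >= ${filters.minPrice}`);
  }
  if (filters.maxPrice !== undefined && isFinite(filters.maxPrice)) {
    clauses.push(`price < ${filters.maxPrice}`);
  }
  return clauses.length > 0 ? clauses.join(" AND ") : undefined;
}

console.log(buildFilter({ category: "electronics", maxPrice: 5000 }));
// category = "electronics" AND price < 5000
```

Only attributes listed in `filterableAttributes` can appear in the expression, so the helper should stay in step with the index settings.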

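The Key Problem: Data Synchronization

A search index is not the DB; it is a copy of it. How to keep the index in sync with the DB is the core design question.

Option 1: Write-Through (update the index on every write)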

// Service layer: update the index after every DB operation
class ProductService {
  async create(dto: CreateProductDto) {
    const product = await productRepo.create(dto);
 
    // Write to the index as part of the same request
    await productIndex.addDocuments([toIndexDocument(product)]);
 
    return product;
  }
 
  async update(id: string, dto: UpdateProductDto) {
    const product = await productRepo.update(id, dto);
    await productIndex.updateDocuments([toIndexDocument(product)]);
    return product;
  }
 
  async delete(id: string) {
    await productRepo.softDelete(id);
    await productIndex.deleteDocument(id);
  }
}

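Pros: simple, low latency. Cons: the two writes are not protected by a transaction, so if the index update fails after the DB write has committed, the index and the DB drift apart.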

Option 2: Event-Driven (asynchronous update, recommended)

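// After the DB write succeeds, enqueue a job; a queue worker updates the index asynchronously
class ProductService {
  async create(dto: CreateProductDto) {
    const product = await productRepo.create(dto);
    // Don't wait for the index update: enqueue the job and return immediately.
    // Retry settings belong to the job (or the Queue's defaultJobOptions), not to the Worker.
    await searchSyncQueue.add(
      'sync-product',
      { id: product.id, action: 'upsert' },
      { attempts: 5, backoff: { type: 'exponential', delay: 1000 } },
    );
    return product;
  }
}

// The worker handles index sync; a thrown error makes the job retry per its options.
// Assumption: a BullMQ-style API, where the first argument is the queue name.
// 'search-sync' is a placeholder and must match the name searchSyncQueue was created with.
const searchSyncWorker = new Worker('search-sync', async (job) => {
  const { id, action } = job.data;

  if (action === 'delete') {
    await productIndex.deleteDocument(id);
    return;
  }

  // Read the latest state from the DB instead of trusting the job payload
  const product = await productRepo.findById(id);
  if (!product || product.deletedAt) {
    await productIndex.deleteDocument(id);
    return;
  }

  await productIndex.addDocuments([toIndexDocument(product)]);
}, {
  connection: redis,
});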

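Pros: the DB operation and the index update are decoupled, an index failure does not affect the main flow, and failed updates are retried. Cons: search results lag behind the DB by a few seconds (usually acceptable).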
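To get a feel for what `attempts: 5` with an exponential backoff of `delay: 1000` means in practice, the sketch below computes the wait before each retry. It assumes the common formula of base delay × 2^(retry − 1), which is how BullMQ documents its built-in exponential strategy; `exponentialBackoffDelays` is a hypothetical helper for illustration, not part of any library.

```typescript
// Returns the delay (ms) before each retry. With `attempts` total tries,
// there are `attempts - 1` retries after the first failure.
function exponentialBackoffDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(Math.round(Math.pow(2, retry - 1) * baseDelayMs));
  }
  return delays;
}

const delays = exponentialBackoffDelays(5, 1000);
console.log(delays);                             // [ 1000, 2000, 4000, 8000 ]
console.log(delays.reduce((a, b) => a + b, 0));  // 15000
```

Under these assumptions, a job that keeps failing is abandoned roughly 15 seconds after its first attempt (plus processing time), so an index outage longer than that still needs a fallback such as a periodic full re-sync.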

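Option 3: CDC (Change Data Capture)

Listen to the DB's binlog / WAL and trigger an index update whenever data changes. This suits large-scale systems (for example, Debezium + Kafka). For a typical application it is over-engineering.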
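As a rough illustration of the consumer side, the sketch below maps a Debezium-style change event to an index action. It assumes the standard Debezium envelope fields `op` ("c" create, "u" update, "d" delete, "r" snapshot read), `before`, and `after`; the `ProductRow` columns (including `deleted_at`) and the `toIndexAction` helper are hypothetical names for this example.

```typescript
type DebeziumOp = "c" | "u" | "d" | "r";

// Hypothetical row shape for the products table.
interface ProductRow {
  id: string;
  name: string;
  deleted_at: string | null;
}

// The subset of the change-event envelope this sketch relies on.
interface ChangeEvent {
  op: DebeziumOp;
  before: ProductRow | null;
  after: ProductRow | null;
}

type IndexAction =
  | { action: "upsert"; id: string }
  | { action: "delete"; id: string };

// Decide what to do with the search index for a single change event.
function toIndexAction(event: ChangeEvent): IndexAction | null {
  if (event.op === "d") {
    // Hard delete: only the "before" image carries the row id.
    return event.before ? { action: "delete", id: event.before.id } : null;
  }
  const row = event.after;
  if (row === null) return null;
  // A soft-deleted row should disappear from search results.
  return row.deleted_at !== null
    ? { action: "delete", id: row.id }
    : { action: "upsert", id: row.id };
}

console.log(toIndexAction({
  op: "u",
  before: { id: "p1", name: "Old name", deleted_at: null },
  after: { id: "p1", name: "New name", deleted_at: null },
})); // { action: 'upsert', id: 'p1' }
```

The resulting action has the same `{ id, action }` shape as the queue job in Option 2, so the same worker logic could consume both.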

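Building the Index from Scratch (Bulk Index)

When a new system goes live, or the schema changes, the index has to be rebuilt:

async function rebuildIndex() {
  // Index writes are asynchronous tasks: wait for each one and fail loudly if it did not succeed.
  // (waitForTask is on the client in older meilisearch-js releases and under client.tasks in
  // newer ones; large batches may need a longer wait timeout than the default.)
  const waitOk = async (taskUid: number) => {
    const task = await client.waitForTask(taskUid);
    if (task.status !== 'succeeded') throw new Error(`Task ${taskUid} ended as ${task.status}`);
  };

  // 1. Build into a new, separately named index so live queries on 'products' are unaffected
  const newIndexName = `products_${Date.now()}`;
  const newIndex = client.index(newIndexName);

  await waitOk((await newIndex.updateSettings(indexSettings)).taskUid);

  // 2. Bulk index all rows in batches
  const batchSize = 1000;
  let offset = 0;

  while (true) {
    const products = await productRepo.findAll({ offset, limit: batchSize });
    if (products.length === 0) break;

    await waitOk((await newIndex.addDocuments(products.map(toIndexDocument))).taskUid);
    offset += products.length;

    logger.info(`Indexed ${offset} products`);
  }

  // 3. Swap the two indexes (atomic, search is not interrupted).
  //    Both indexes must already exist, so the very first build needs a 'products' index.
  await waitOk((await client.swapIndexes([{ indexes: ['products', newIndexName] }])).taskUid);

  // 4. After the swap, newIndexName holds the old data; delete it
  await client.deleteIndex(newIndexName);

  logger.info('Index rebuild completed');
}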

Index Document Design

The search index does not need every column from the DB:

// Convert a DB entity into a document suited to search
function toIndexDocument(product: Product) {
  return {
    id: product.id,
    name: product.name,
    description: product.description,
    brand: product.brand?.name,         // flatten related data
    category: product.category,
    tags: product.tags.map(t => t.name),
    price: product.price,
    status: product.status,
    popularity: product.orderCount,     // metric used for sorting
    // Leave out: passwords, internal fields, anything that can be joined back from the DB
  };
}

Terry Yao's Blog
