[database] ERD 設計與資料庫正規化：從需求到 Schema 的完整流程

cover

先講結論

資料庫設計做錯了，後面全部都要改。這不是誇張：一個壞掉的 Schema 會讓 ORM、API、測試、報表全部連帶出問題，而且修起來的代價通常比重新設計更高。

ERD（Entity Relationship Diagram）的作用是在寫 SQL 之前，讓你用視覺化的方式把資料模型想清楚。正規化是一套判斷資料結構是否正確的規則。兩個工具配合使用，能避免大多數的設計錯誤。

ERD 基礎

ERD 由 Peter Chen 在 1976 年提出，不屬於 UML 標準（UML 有自己的 Class Diagram 也能表達類似意思，但 ERD 更接近資料庫視角）。

三個核心概念

Entity（實體）：現實世界中的「東西」。可以被資料化的物件。

使用者、訂單、商品、付款、分類

Attribute（屬性）：Entity 的特徵。對應資料表的欄位。

使用者 → id, email, name, created_at
訂單 → id, user_id, status, total, created_at

Relationship（關聯）：Entity 之間的關係。

使用者「下」訂單
訂單「包含」訂單明細
商品「屬於」分類

關聯的多重性（Cardinality）

記法	意義
`1`	恰好一個
`0..1`	零或一個（可選的一對一）
`0..` 或 ``	零或多個
`1..*`	一或多個（至少一個）

Mermaid ER Diagram（推薦）

適合放進 GitHub README、Notion、文件系統，版本控制友善。

erDiagram
    USERS {
        int id PK
        varchar email UK "唯一值"
        varchar name
        varchar password "bcrypt hash"
        enum status
        timestamp created_at
    }

    ORDERS {
        int id PK
        int user_id FK
        enum status
        decimal total
        text note
        timestamp created_at
        timestamp updated_at
    }

    ORDER_ITEMS {
        int id PK
        int order_id FK
        int product_id FK
        int quantity
        decimal unit_price "下單當下的價格"
    }

    PRODUCTS {
        int id PK
        int category_id FK
        varchar name
        text description
        decimal price
        enum status
        timestamp created_at
    }

    CATEGORIES {
        int id PK
        int parent_id FK "null = 頂層分類"
        varchar name
        varchar slug
        int sort_order
    }

    PAYMENTS {
        int id PK
        int order_id FK
        enum method
        enum status
        decimal amount
        varchar transaction_id "第三方金流編號"
        timestamp paid_at
    }

    USERS ||--o{ ORDERS : "下"
    ORDERS ||--|{ ORDER_ITEMS : "包含"
    PRODUCTS ||--o{ ORDER_ITEMS : "被購買"
    CATEGORIES ||--o{ PRODUCTS : "分類"
    CATEGORIES ||--o{ CATEGORIES : "子分類"
    ORDERS ||--|| PAYMENTS : "對應"

Mermaid 關聯符號

符號	意義
`\|\|--\|\|`	一對一（One-to-One）
`\|\|--o{`	一對多（One-to-Many）
`}o--o{`	多對多（Many-to-Many）
`\|\|--\|{`	一對一個以上（至少一個）

DBML 語法（適用 dbdiagram.io）

DBML 是 dbdiagram.io 的語法，可以匯出 SQL DDL，適合正式設計文件。

Table users {
  id          int         [pk, increment]
  email       varchar(255) [unique, not null]
  name        varchar(100) [not null]
  password    varchar(255) [not null, note: 'bcrypt hash']
  status      enum('active', 'inactive', 'banned') [default: 'active']
  created_at  timestamp   [default: `now()`]
  updated_at  timestamp
}
 
Table categories {
  id          int         [pk, increment]
  parent_id   int         [ref: > categories.id, note: 'null = 頂層分類']
  name        varchar(100) [not null]
  slug        varchar(100) [unique]
  sort_order  int         [default: 0]
}
 
Table products {
  id          int         [pk, increment]
  category_id int         [ref: > categories.id, not null]
  name        varchar(255) [not null]
  description text
  price       decimal(10,2) [not null]
  status      enum('active', 'inactive', 'out_of_stock') [default: 'active']
  created_at  timestamp   [default: `now()`]
  updated_at  timestamp
}
 
Table inventory {
  id          int         [pk, increment]
  product_id  int         [ref: - products.id, unique]
  quantity    int         [not null, default: 0]
  reserved    int         [not null, default: 0, note: '已被訂單保留的數量']
  updated_at  timestamp
}
 
Table orders {
  id          int         [pk, increment]
  user_id     int         [ref: > users.id, not null]
  status      enum('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')
  total       decimal(10,2) [not null]
  note        text
  created_at  timestamp   [default: `now()`]
  updated_at  timestamp
}
 
Table order_items {
  id          int         [pk, increment]
  order_id    int         [ref: > orders.id, not null]
  product_id  int         [ref: > products.id, not null]
  quantity    int         [not null]
  unit_price  decimal(10,2) [not null, note: '下單當下的價格，不跟商品同步']
}
 
Table payments {
  id              int         [pk, increment]
  order_id        int         [ref: - orders.id, unique]
  method          enum('credit_card', 'transfer', 'cod')
  status          enum('pending', 'success', 'failed', 'refunded')
  amount          decimal(10,2)
  transaction_id  varchar(255) [note: '第三方金流的交易編號']
  paid_at         timestamp
}

資料庫正規化

正規化（Normalization）是一套消除資料冗餘（redundancy）、防止異常（anomaly）的規則。

為什麼要正規化？ 不正規化的資料表會出現三種異常：

插入異常：要新增一筆資料，卻必須同時新增不相關的資料
更新異常：同一個值存在多個地方，改一個漏了其他的
刪除異常：刪一筆資料，連帶刪除了還有用的資訊

第一正規化（1NF）

規則：每個欄位只存一個值（原子性），不存多值或重複群組。

❌ 違反 1NF：
orders 表有 product1, product2, product3 欄位
→ 重複群組，要加新商品就要改 Schema

❌ 違反 1NF：
tags 欄位存 "python,javascript,typescript"（用逗號分隔）
→ 多值欄位，查詢、更新都困難

✓ 1NF 做法：
orders + order_items 拆成兩張表（一對多關聯）
tags 拆成獨立的 tags 表（多對多關聯）

第二正規化（2NF）

規則：在 1NF 的基礎上，每個非主鍵欄位必須完全依賴整個主鍵（不能只依賴主鍵的一部分）。

只有複合主鍵才可能違反 2NF。

❌ 違反 2NF：
order_items 表：(order_id, product_id) 複合主鍵
  product_name 只依賴 product_id，不依賴 order_id
→ product_name 應該放在 products 表，不是 order_items

✓ 2NF 做法：
order_items：order_id, product_id, quantity, unit_price
products：id, name, price, ...

注意：unit_price 記錄下單當下的價格（不跟著 products.price 改變），
這是刻意 denormalize，必須加 comment 說明原因。

第三正規化（3NF）

規則：在 2NF 的基礎上，非主鍵欄位不能依賴另一個非主鍵欄位（消除傳遞依賴）。

❌ 違反 3NF：
orders 表有 user_id, user_email, user_name
  user_email 和 user_name 依賴 user_id（非主鍵）
  → 更新使用者 email 時要改 orders 的每一筆

✓ 3NF 做法：
orders 只存 user_id（外鍵）
要顯示使用者資訊時 JOIN users 表

實用原則：正規化到 3NF 就夠了

更高的正規化（BCNF、4NF、5NF）在學術上有意義，但實際專案開發到 3NF 通常已經足夠。更高的正規化往往帶來更多 JOIN，反而影響查詢效能。

刻意 denormalize 的情況（合理的例外）：

order_items.unit_price — 記錄下單當下的快照價格，不跟商品同步（審計需求）
報表用的彙總表（aggregation table）— 為查詢效能犧牲正規化
跨服務資料複製（微服務架構）— 避免跨服務 JOIN

刻意 denormalize 的欄位一定要加 comment 說明原因。

Schema 命名規範

資料表名稱：小寫複數，snake_case
  ✓ users, order_items, product_categories
  ✗ User, OrderItem, productCategory

主鍵：id（int, auto increment 或 bigint）
外鍵：{參照表單數}_id
  ✓ user_id, product_id, category_id

時間欄位：
  created_at  → 建立時間（不更新）
  updated_at  → 最後更新時間（每次更新自動更新）
  deleted_at  → 軟刪除時間（nullable，null = 未刪除）

布林欄位：is_ 或 has_ 前綴
  ✓ is_active, is_verified, has_subscription

金額欄位：
  ✓ decimal(10,2)  → 精確小數
  ✗ float / double  → 浮點數不精確，不能用在金額

列舉欄位：
  值域固定、不常新增 → enum
  值域會增加、有管理需求 → 獨立 lookup table

Schema 設計決策清單

交出 ERD 或 Schema 前確認：

每個欄位只存一個值（1NF）
沒有欄位只依賴複合主鍵的一部分（2NF）
沒有非主鍵欄位依賴另一個非主鍵欄位（3NF）
刻意 denormalize 的欄位有 comment 說明原因
所有外鍵都建了 Index
金額欄位用 decimal，不用 float
時間欄位確認時區（建議以 UTC 儲存）
軟刪除欄位（deleted_at）的 unique constraint 有考慮（可能需要 partial index）
大量查詢的欄位（email、status、created_at）有 Index 計畫

工具選擇

工具	特點
Mermaid ER Diagram	文字語法、GitHub/Notion 直接渲染、版本控制友善
dbdiagram.io（DBML）	視覺化、可匯出 SQL DDL、免費版夠用
draw.io	最彈性、可存 XML、適合複雜圖形
Prisma Schema	現代 Node.js 專案直接用 schema 當文件

Terry Yao's Blog

目錄

ERD 設計與資料庫正規化：從需求到 Schema 的完整流程