[B06][46] Health Check Design: How /health, /ready, and /live Differ

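Three Different Questions

K8s probes ask three different questions:

Probe	Question	Action on failure
Liveness	Is this container still alive?	Kill and restart the container
Readiness	Is this pod ready to receive traffic?	Remove the pod from the Service's endpoints (the load balancer)
Startup	Has the application finished starting? (for slow-starting services)	Liveness and readiness checks are held off until it succeeds; if it never succeeds, the container is restarted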

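Designing the Three Endpoints Separately

`/health/live` (Liveness Probe)

Answers "is my process still working properly?" If this probe fails, K8s restarts the container.

Check only the application itself, not external dependencies:

// ✅ Liveness: look only at the process itself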
app.get('/health/live', (req, res) => {
  res.status(200).json({
    status: 'alive',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
  });
});

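Why not check the DB: if the DB is temporarily unreachable, K8s should not restart the container. A restart cannot fix the DB, and it can push the pod into a restart loop. Liveness asks only whether the application has crashed, deadlocked, or stalled its event loop. A stalled event loop cannot answer the request at all, so the probe times out and counts as a failure.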
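The split between the two probes can be sketched as a pure function. This is an illustration only: `AppState` and `probeStatuses` are hypothetical names, and it assumes a single database dependency. A failing database changes the readiness status but leaves liveness untouched.

```typescript
// Hypothetical state snapshot; a real service would derive these from its own checks.
interface AppState {
  processHealthy: boolean;    // no crash, deadlock, or stalled event loop
  databaseReachable: boolean; // external dependency
}

interface ProbeStatuses {
  live: number;  // HTTP status for /health/live
  ready: number; // HTTP status for /health/ready
}

// Liveness depends only on the process; readiness also depends on the database.
function probeStatuses(state: AppState): ProbeStatuses {
  return {
    live: state.processHealthy ? 200 : 503,
    ready: state.processHealthy && state.databaseReachable ? 200 : 503,
  };
}

// DB outage: the pod stops receiving traffic but is not restarted.
console.log(probeStatuses({ processHealthy: true, databaseReachable: false }));
```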

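`/health/ready` (Readiness Probe)

Answers "am I ready to receive traffic?" If it fails, K8s removes the pod from the Service endpoints (no more traffic is routed to it), but the container is not restarted.

This one should check external dependencies: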

// ✅ Readiness: check the critical dependencies
// Assumes `app` (Express), `sequelize`, and `redis` are initialized elsewhere.
app.get('/health/ready', async (req, res) => {
  const checks: Record<string, { status: 'ok' | 'error'; latency?: number; error?: string }> = {};

  // Check the DB
  try {
    const start = Date.now();
    await sequelize.query('SELECT 1');
    checks.database = { status: 'ok', latency: Date.now() - start };
  } catch (error) {
    // Report a generic label only, never the raw error or stack trace
    checks.database = { status: 'error', error: 'unavailable' };
  }

  // Check Redis
  try {
    const start = Date.now();
    await redis.ping();
    checks.redis = { status: 'ok', latency: Date.now() - start };
  } catch (error) {
    checks.redis = { status: 'error', error: 'unavailable' };
  }

  // Ready only if every dependency check passed
  const allHealthy = Object.values(checks).every(c => c.status === 'ok');

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'ready' : 'not-ready',
    checks,
    timestamp: new Date().toISOString(),
  });
});

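What readiness is used for in practice:

During a rolling update: a new pod starts, and until its DB connection pool is established readiness returns 503. K8s routes no traffic to the pod until it reports ready.
Before migrations finish: if migrations run at app startup (not recommended; see the migration strategy), readiness can return 503 while they run.
When the DB drops: the DB is temporarily unreachable and readiness fails, so the pod stops receiving traffic but is not restarted. Once the DB recovers, readiness passes again and traffic returns.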
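One practical detail: a dependency that hangs instead of failing can make the readiness handler itself hang, and the K8s probe `timeoutSeconds` defaults to 1 second. A minimal sketch of running the checks in parallel with a per-check timeout follows. `withTimeout` and `runChecks` are hypothetical helpers, not part of Express, Sequelize, or any Redis client, and the timeout passed in is an assumption to tune per service.

```typescript
type CheckResult = { status: 'ok' | 'error'; latency?: number; error?: string };
type Check = () => Promise<unknown>;

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run every check concurrently; a slow or failing check marks the pod not ready.
async function runChecks(checks: Record<string, Check>, timeoutMs: number) {
  const names = Object.keys(checks);
  const results = await Promise.all(
    names.map(async (name): Promise<CheckResult> => {
      const start = Date.now();
      try {
        await withTimeout(checks[name](), timeoutMs);
        return { status: 'ok', latency: Date.now() - start };
      } catch {
        // Generic label only; the raw error stays in server-side logs
        return { status: 'error', error: 'unavailable' };
      }
    }),
  );
  const report: Record<string, CheckResult> = {};
  names.forEach((name, i) => { report[name] = results[i]; });
  const ready = results.every((r) => r.status === 'ok');
  return { ready, httpStatus: ready ? 200 : 503, checks: report };
}
```

Inside the route handler this could be called as `runChecks({ database: () => sequelize.query('SELECT 1'), redis: () => redis.ping() }, 500)`, with the result's `httpStatus` and `checks` passed straight to the response.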

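`/health/startup` (Startup Probe)

Designed for slow-starting services (Spring Boot and other JVM services). Until the startup probe succeeds, K8s runs neither the liveness nor the readiness probe, so a service that is still booting is not mistaken for dead and restarted. The configuration below reuses /health/ready as the startup check rather than exposing a separate /health/startup route.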

# K8s Deployment configuration
startupProbe:
  httpGet:
    path: /health/ready
    port: 3000
  failureThreshold: 30    # startup counts as failed only after 30 failures (30 * 10s = 5 minutes)
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  periodSeconds: 10
  failureThreshold: 3     # restart only after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 3
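To make the timing in this configuration concrete, the sketch below computes roughly how long each probe can keep failing before K8s acts. `failureWindowSeconds` is a hypothetical helper, and the result is an approximation: it ignores `timeoutSeconds` and the exact scheduling of probe runs.

```typescript
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
  initialDelaySeconds?: number;
}

// Approximate upper bound, in seconds, before Kubernetes acts on a failing probe.
function failureWindowSeconds(probe: ProbeTiming): number {
  return (probe.initialDelaySeconds ?? 0) + probe.periodSeconds * probe.failureThreshold;
}

// Values taken from the Deployment configuration above.
const startupWindow = failureWindowSeconds({ periodSeconds: 10, failureThreshold: 30 });
const livenessWindow = failureWindowSeconds({ periodSeconds: 10, failureThreshold: 3 });
const readinessWindow = failureWindowSeconds({ periodSeconds: 5, failureThreshold: 3 });

console.log({ startupWindow, livenessWindow, readinessWindow });
// → { startupWindow: 300, livenessWindow: 30, readinessWindow: 15 }
```

With these numbers the application has about 5 minutes to start, a hung container is restarted after roughly 30 seconds, and a pod with a failing dependency leaves the Service after roughly 15 seconds.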

Response Format

// Full response from /health/ready
{
  "status": "ready",
  "version": "1.2.3",
  "environment": "production",
  "uptime": 3600,
  "timestamp": "2026-04-22T10:30:00.000Z",
  "checks": {
    "database": { "status": "ok", "latency": 3 },
    "redis": { "status": "ok", "latency": 1 }
  }
}
 
// When unhealthy
{
  "status": "not-ready",
  "checks": {
    "database": { "status": "error", "error": "Connection timeout" },
    "redis": { "status": "ok", "latency": 1 }
  }
}

Keep sensitive information out of health check responses: full DB connection strings, stack traces, and internal IPs should never appear in them.
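One way to follow this rule is to map raw errors onto a small allowlist of labels before they reach the response. The sketch below is an assumption-heavy illustration: `toSafeError` is a hypothetical helper, and it assumes the error carries a Node-style `code` property such as `ETIMEDOUT`. Some drivers and ORMs nest the original error under another property, so the lookup may need adapting.

```typescript
// Allowlist of error codes that are safe to describe to callers.
const SAFE_LABELS = new Map<string, string>([
  ['ETIMEDOUT', 'Connection timeout'],
  ['ECONNREFUSED', 'Connection refused'],
]);

// Return a generic label; never echo the message, stack, host, or credentials.
function toSafeError(err: unknown): string {
  const code =
    typeof err === 'object' && err !== null && 'code' in err
      ? String((err as { code: unknown }).code)
      : '';
  return SAFE_LABELS.get(code) ?? 'Unavailable';
}

// The internal IP in the message never reaches the response body.
const timeoutError = Object.assign(new Error('connect ETIMEDOUT 10.0.0.5:5432'), {
  code: 'ETIMEDOUT',
});
console.log(toSafeError(timeoutError)); // → Connection timeout
```

The full error can still go to server-side logs, where access is controlled.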

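`/health` (Simplified, for Monitoring Systems)

Many monitoring systems (Uptime Robot, Pingdom) only issue an HTTP GET and look at the status code; they make no more sophisticated judgment. You can give them a simplified /health endpoint:

// For simple uptime monitors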
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

This endpoint is not used for K8s probes; those use /health/live and /health/ready.

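Health Check Integration by Framework

Framework	Tool	Notes
Express	Hand-rolled (as above)	Most flexible
FastAPI	`/health` route + Depends-based health checks	Simple and direct
NestJS	`@nestjs/terminus`	Built-in DB / Redis / memory health checks
Spring Boot	Spring Actuator `/actuator/health`	Auto-configured health indicators for detected dependencies
Laravel	`laravel-health` package	Rich set of checks

Spring Actuator's /actuator/health is the most complete out-of-the-box option: it auto-configures health indicators for the dependencies it detects, such as DataSource and Redis, and exposes liveness and readiness separately (under /actuator/health/liveness and /actuator/health/readiness).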

Terry Yao's Blog
