[algorithms] String Algorithms 字串演算法

cover

視覺化概覽

flowchart TD
    A["開始 KMP 匹配\n預處理 LPS 陣列"] --> B["比較 text[i] 與 pattern[j]"]
    B --> C{"text[i] == pattern[j]？"}
    C -->|"是"| D["i++, j++"]
    D --> E{"j == pattern 長度？"}
    E -->|"是"| F["找到匹配！\nj = LPS[j-1]"]
    E -->|"否"| G{"i < text 長度？"}
    C -->|"否"| H{"j > 0？"}
    H -->|"是"| I["j = LPS[j-1]\n利用失敗函數跳轉"]
    H -->|"否"| J["i++\n移到下一字元"]
    I --> B
    J --> G
    F --> G
    G -->|"是"| B
    G -->|"否"| K["匹配結束"]

字串匹配問題

在 text 中找 pattern 的所有出現位置

text:    ABABDABACDABABCABAB
pattern: ABABCABAB
結果:    位置 10

1. 暴力法 O(nm)

for (int i = 0; i <= n - m; i++) {
    int j = 0;
    while (j < m && text[i+j] == pattern[j]) j++;
    if (j == m) found(i);
}

問題：每次不匹配就從頭開始。

2. KMP 演算法 O(n+m)

核心思想：利用已匹配的資訊，不回退 text 指針

LPS 陣列

LPS = Longest Proper Prefix which is also Suffix

pattern: A B A B C A B A B
index:   0 1 2 3 4 5 6 7 8
LPS:     0 0 1 2 0 1 2 3 4

解釋:
LPS[3] = 2 → "ABAB" 的最長相等前後綴是 "AB"，長度 2
LPS[8] = 4 → "ABABCABAB" 的最長相等前後綴是 "ABAB"，長度 4

匹配過程

不匹配時，j 跳到 lps[j-1]，而不是從 0 開始

text:    ABABDABACDABABCABAB
pattern: ABABCABAB
              ↑ 不匹配

不需要重來，直接利用 LPS：
text:    ABABDABACDABABCABAB
pattern:       ABABCABAB
                 ↑ 從這裡繼續

程式碼

int[] computeLPS(String pattern) {
    int[] lps = new int[m];
    int len = 0, i = 1;
    while (i < m) {
        if (pattern[i] == pattern[len]) {
            lps[i++] = ++len;
        } else if (len > 0) {
            len = lps[len - 1];
        } else {
            lps[i++] = 0;
        }
    }
    return lps;
}

3. Rabin-Karp O(n+m) 平均

核心思想：滾動雜湊，快速比較

text:    ABCDE
         ↓
hash("ABC") = 65*256² + 66*256 + 67

滑動:
hash("BCD") = (hash("ABC") - 65*256²) * 256 + 68

只需 O(1) 計算下一個雜湊

適用：多模式匹配、抄襲檢測

4. Z Algorithm O(n)

z[i] = s[i..] 與 s 的最長公共前綴長度

s:    a a b x a a b
z:    7 1 0 0 3 1 0
      ↑ 整個字串
        ↑ 只有 "a"
            ↑ "aab" 與前綴相同

用於匹配：pattern + ”$” + text，找 z[i] == m 的位置

5. Manacher O(n) - 最長回文

問題：找最長回文子串

暴力：O(n²) 或 O(n³)

Manacher：O(n)，利用回文的對稱性

原字串:  b a b a d
插入 #:  # b # a # b # a # d #

計算每個位置的回文半徑
利用對稱性減少計算

時間複雜度對比

演算法	預處理	匹配	總計	特點
暴力	-	O(nm)	O(nm)	簡單
KMP	O(m)	O(n)	O(n+m)	不回退
Rabin-Karp	O(m)	O(n)	O(n+m)*	滾動雜湊
Z Algorithm	O(n+m)	-	O(n+m)	Z 陣列
Manacher	O(n)	-	O(n)	回文專用

*Rabin-Karp 最差 O(nm)（雜湊碰撞）

選擇指南

單次匹配：KMP
多模式匹配：Rabin-Karp 或 Aho-Corasick
回文問題：Manacher
前綴匹配：Z Algorithm

經典題目

實現 strStr() - LeetCode 28
最長回文子串 - LeetCode 5
重複的子字符串 - LeetCode 459
最短回文串 - LeetCode 214

Terry Yao's Blog

分類

目錄