KMP

2022-11-30 23:27:41 +08:00
parent 72555b19e0
commit 76c26bb4c0
5 changed files with 134 additions and 11 deletions
--- a/include/s0028_find_the_index_of_the_first_occurrence_in_a_string.hpp
+++ b/include/s0028_find_the_index_of_the_first_occurrence_in_a_string.hpp
@@ -7,6 +7,7 @@ using namespace std;
 class S0028 {
 public:
  void getNext(int* next, const string& s);
  int strStr(string haystack, string needle);
 };
--- a/notes/src/SUMMARY.md
+++ b/notes/src/SUMMARY.md
@@ -8,6 +8,8 @@
 - [Replace Spaces](./substitute_spaces.md)
 - [Reverse Words in a String](./reverse_words_in_a_string.md)
 - [Left-Rotate a String](./reverse_left_words.md)
 - [KMP](./kmp.md)
 # Classic Code
--- a/notes/src/kmp.md
+++ b/notes/src/kmp.md
@@ -0,0 +1,35 @@
 # KMP
 [代码随想录](https://programmercarl.com/0028.%E5%AE%9E%E7%8E%B0strStr.html)
 KMP is mainly used for pattern matching.
 For example, given a string s and a pattern, find the index of the first occurrence of the pattern in s.
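 ## Longest common prefix-suffix
 A prefix is any contiguous substring that starts at the first character and does not include the last character;
 a suffix is any contiguous substring that ends at the last character and does not include the first character.
 For example, the longest common prefix-suffix of the string aaabcaaa is aaa, so its length is 3.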
 ## Prefix table
 `next[i]` is the length of the longest common prefix-suffix of the substring `s[0...i]`.
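The two definitions above can be checked with a brute-force sketch that builds the prefix table directly from the definition. The helper name `bruteForcePrefixTable` is hypothetical and not part of this commit; it is much slower than `getNext()` and is only meant for cross-checking small inputs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the commit): builds the prefix table
// straight from the definition by trying every candidate length.
std::vector<int> bruteForcePrefixTable(const std::string& s) {
  std::vector<int> next(s.size(), 0);
  for (std::size_t i = 0; i < s.size(); ++i) {
    // A common prefix-suffix of s[0...i] must be shorter than s[0...i]
    // itself, so its length is at most i.
    for (std::size_t len = i; len > 0; --len) {
      // Compare the prefix s[0...len-1] with the suffix s[i-len+1...i].
      if (s.compare(0, len, s, i - len + 1, len) == 0) {
        next[i] = static_cast<int>(len);
        break;  // Lengths are tried longest first, so the first hit wins.
      }
    }
  }
  return next;
}
```

For the string aabaaf this yields 0 1 0 1 2 0, and for aaabcaaa the last entry is 3, in line with the example above.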
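 ## Basic idea
 `j` marks the end of a prefix of the pattern, and `i` marks the end of a suffix of s.
 Suppose `pattern[0...j-1]` and `s[i-j...i-1]` are equal, but the next pair of characters, `pattern[j]` and `s[i]`, differ.
 The matched part `pattern[0...j-1]` still has a longest prefix `pattern[0...k]` that equals `s[i-k-1...i-1]`, the suffix of the part of `s` matched so far.
 So matching can resume at `pattern[k+1]` without moving `i` backwards.
 To do that, `j` has to fall back to `k+1`. How do we find it?
 `pattern[0...k]` is the longest common prefix-suffix of `pattern[0...j-1]`, so `k+1 == next[j-1]`.
 See s0028 for the detailed implementation and comments.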
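As a compact illustration of the fallback step, here is a minimal sketch of the whole search. It assumes a free function named `kmpFind` (hypothetical, not the commit's `S0028::strStr`) and stores the prefix table in a `std::vector<int>`; the fallback logic is the same `j = next[j - 1]` described above.

```cpp
#include <string>
#include <vector>

// Hypothetical free function, shown only to illustrate the fallback step.
// Returns the index of the first occurrence of pattern in s, or -1.
int kmpFind(const std::string& s, const std::string& pattern) {
  if (pattern.empty()) return 0;
  const int m = static_cast<int>(pattern.size());
  const int n = static_cast<int>(s.size());

  // Build the prefix table of the pattern.
  std::vector<int> next(m, 0);
  for (int i = 1, j = 0; i < m; ++i) {
    while (j > 0 && pattern[i] != pattern[j]) j = next[j - 1];
    if (pattern[i] == pattern[j]) ++j;
    next[i] = j;
  }

  // Scan s once; i never moves backwards, only j falls back.
  for (int i = 0, j = 0; i < n; ++i) {
    while (j > 0 && s[i] != pattern[j]) j = next[j - 1];
    if (s[i] == pattern[j]) ++j;
    if (j == m) return i - m + 1;  // All m pattern characters matched.
  }
  return -1;
}
```

With s = aabaabaaf and pattern = aabaaf, the mismatch at `s[5]` makes `j` fall back from 5 to `next[4] == 2`, and the search then finds the match at index 3.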
--- a/notes/src/reverse_left_words.md
+++ b/notes/src/reverse_left_words.md
@@ -0,0 +1,7 @@
 # Left-Rotate a String
 [Leetcode](https://leetcode.cn/problems/zuo-xuan-zhuan-zi-fu-chuan-lcof/)
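 Whenever a problem involves reversing or rotating a string, consider combining a global reversal with local reversals.
 In this problem, for example, reverse the first part, then reverse the second part, and finally reverse the whole string.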
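The three reversals can be sketched as follows. The function name `reverseLeftWords` and its signature are assumptions made for illustration, and the sketch assumes `n` is non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: returns s rotated left by n characters.
// Assumes n >= 0; n is reduced modulo the length of s.
std::string reverseLeftWords(std::string s, int n) {
  if (s.empty()) return s;
  const std::size_t k = static_cast<std::size_t>(n) % s.size();
  const auto mid = s.begin() + static_cast<std::string::difference_type>(k);
  std::reverse(s.begin(), mid);      // Local reversal of the first k characters.
  std::reverse(mid, s.end());        // Local reversal of the remaining characters.
  std::reverse(s.begin(), s.end());  // Global reversal of the whole string.
  return s;
}
```

For example, rotating abcdefg by 2 goes through bacdefg, bagfedc and finally cdefgab.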
--- a/src/s0028_find_the_index_of_the_first_occurrence_in_a_string.cpp
+++ b/src/s0028_find_the_index_of_the_first_occurrence_in_a_string.cpp
@@ -1,17 +1,95 @@
 #include "s0028_find_the_index_of_the_first_occurrence_in_a_string.hpp"
-int S0028::strStr(string haystack, string needle) {
+// Build the prefix table of the string s
-  int haystackLen = haystack.length();
+// next[i] is the length of the longest common prefix-suffix of the substring s[0]~s[i]
-  int needleLen = needle.length();
+// A prefix of a string is a substring that starts at index 0 and does not include the last character
-  for (int i{0}; i < haystackLen; ++i) {
+// A suffix of a string is a substring that ends at the last character and does not include the first character
-    for (int j{0}, iTmp = i; j < needleLen; ++j) {
+// For example, the longest common prefix-suffix of aaba is a
-      if (haystack[iTmp] != needle[j]) {
+// An example prefix table:
-        break;
+// s:    a a b a a f
-      } else if (j == needleLen - 1) {
+// next: 0 1 0 1 2 0
-        return i;
+// getNext() assumes s.length() >= 2 (with length 1 the loop simply does not run; s must not be empty)
-      } else {
+void S0028::getNext(int* next, const string& s) {
-        ++iTmp;
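+  // j has two meanings:
  // 1. the index just past the end of the prefix (the next prefix character to compare)
  // 2. the length of the longest common prefix-suffix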
  int j{0};
  // next[0] is obviously always 0
  next[0] = 0;
  // Start iterating; each iteration fills in next[i]
  // i is the index of the end of the suffix
  // i starts at 1 because length >= 2,
  // so there is no need to handle i == 0
  int len = s.size();
  for (int i{1}; i < len; ++i) {
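    // When s[j] and s[i] differ, the prefix and suffix no longer match,
    // so j has to fall back.
    // Fall back to where?
    // a a a f a a a b a a a f a a a f
    //               j               i    (j == 7, i == 15)
    //       j                       i    (j == next[6] == 3)
    // s[7] and s[15] differ, but the 7 characters before each of them
    // are both aaafaaa, which has the common prefix-suffix aaa. So we can
    // jump to the f right after the prefix aaa and check whether the
    // prefix aaaf equals the suffix aaaf.
    // Both already share aaa, so it is enough to compare s[j] with s[i].
    // If they still differ, keep falling back until j reaches 0.
    // How do we move j to that f? It comes right after aaa, and aaa is the
    // longest common prefix-suffix of aaafaaa, so its index is next[j - 1].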
    while (j > 0 && s[i] != s[j]) {
      j = next[j - 1];
    }
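    // Next, handle the case s[j] == s[i]
    // This one is simple: the common prefix-suffix grows by 1
    // The ++i in the for statement already advances the suffix end i,
    // so we only need to advance the prefix end j by 1 as well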
    if (s[i] == s[j]) {
      ++j;
    }
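    // Both cases are handled, so now update next[i]
    // After the steps above, the prefix [0, j - 1] equals the
    // suffix [i - j + 1, i] (both are empty when j == 0)
    // next[i] is the length of the longest common prefix-suffix,
    // which is exactly j.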
    next[i] = j;
  }
 }
 int S0028::strStr(string haystack, string needle) {
  int stringLen = haystack.size();
  int patternLen = needle.size();
  if (patternLen == 0) {
    return 0;
  }
  // Build the prefix table of the pattern
  int next[patternLen];
  getNext(next, needle);
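  // j indexes the pattern and has two meanings
  // 1. the index just past the end of the matched prefix
  // 2. the length of the longest pattern prefix that matches a suffix of the string read so far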
  int j = 0;
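  // Start iterating
  // The loop is very similar to the one in getNext()
  // i indexes the string and marks the end of the suffix
  // The invariant is that at the start of each iteration
  // string[i-j...i-1] equals pattern[0...j-1]
  // Note that i starts at 0 here, whereas getNext() starts at 1,
  // because getNext() does not need to consider i == 0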
  for (int i{0}; i < stringLen; ++i) {
    // On a mismatch, fall back j, just as in getNext()
    // After falling back, either j == 0 or string[i-j...i-1]
    // equals pattern[0...j-1]
    while (j > 0 && needle[j] != haystack[i]) {
      j = next[j - 1];
    }
    // On a match, simply advance, just as in getNext()
    if (needle[j] == haystack[i]) {
      ++j;
    }
    // Found a full match, return its starting index
    if (j == patternLen) {
      return i - patternLen + 1;
    }
  }
  return -1;