The Knuth-Morris-Pratt (KMP) algorithm is an efficient string-matching algorithm developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977. It finds the occurrences of a given pattern in a string. In this article, we will learn how to implement the KMP algorithm in a C++ program.
What is KMP Algorithm?
The Knuth-Morris-Pratt (KMP) algorithm is a popular and efficient method for finding a smaller string (called the "pattern") inside a larger string (called the "text"). It speeds up the search by precomputing the LPS (longest proper prefix that is also a suffix) array from the pattern, which lets it avoid unnecessary comparisons. The KMP algorithm works in two main steps:
- Preprocessing the Pattern: Before searching, KMP builds an array (called the "LPS" array) from the pattern. For each position in the pattern, it stores the length of the longest proper prefix of the pattern that is also a suffix of the part of the pattern ending at that position.
- Searching the Text: After a mismatch, KMP uses the LPS array to keep the part of the pattern that is still known to match, so it never moves backwards in the text and skips alignments where the pattern can't match.
Implementation of KMP Algorithm in C++
There are two steps to implement the KMP Algorithm:
Create the LPS Array
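The LPS array tells us how much of the pattern can be reused when a mismatch occurs. LPS[i] is the length of the longest proper prefix of the pattern that is also a suffix of the substring ending at index i. Here is the LPS array for the pattern P = "ABABCABAB":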
| Index (i) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Pattern | A | B | A | B | C | A | B | A | B |
| LPS Value | 0 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 |
For example, LPS[8] = 4 because the prefix "ABAB" is also a suffix of the whole pattern "ABABCABAB", while LPS[4] = 0 because no proper prefix of "ABABC" is also a suffix of it.
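To check a table like the one above by hand, it can help to compute each LPS value directly from the definition. The following is a minimal sketch, not the efficient method used by KMP: the function name `lpsByDefinition` is hypothetical, and the approach simply tries every proper prefix length, longest first.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the article's listing): computes each
// LPS value straight from the definition. Slow, but easy to verify.
std::vector<int> lpsByDefinition(const std::string &pattern) {
    const int m = static_cast<int>(pattern.size());
    std::vector<int> lps(m, 0);
    for (int i = 0; i < m; ++i) {
        // Try every proper prefix length of pattern[0..i], longest first.
        for (int len = i; len > 0; --len) {
            // Prefix is pattern[0..len-1]; suffix is pattern[i-len+1..i].
            if (pattern.compare(0, len, pattern, i - len + 1, len) == 0) {
                lps[i] = len;
                break;
            }
        }
    }
    return lps;
}
```

Calling `lpsByDefinition("ABABCABAB")` should reproduce the LPS row of the table. The linear-time construction used by KMP produces the same values while reusing earlier entries instead of re-comparing substrings.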
Match the Pattern Against the Text
Now, using the LPS array, we compare the pattern with the text:
- Start at the beginning of both the text and the pattern.
- Compare characters one by one.
- If they match, move both pointers (one in the text and one in the pattern) forward.
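- If they don't match and part of the pattern has already been matched, use the LPS array to fall back to a shorter matched prefix of the pattern and compare again, without moving backwards in the text. If nothing has been matched yet, move forward by one character in the text.
- When the whole pattern has been matched, record the starting position and use the LPS array to continue searching for further occurrences.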
Code Example
// C++ program to implement the KMP algorithm
#include <iostream>
#include <string>
#include <vector>
using namespace std;

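// Function to compute the LPS array:
// LPS[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i]
void computeLPSArray(const string &pattern, int m, vector<int> &LPS) {
    if (m == 0)
        return; // nothing to compute for an empty pattern

    // Length of the previous longest prefix that is also a suffix
    int length = 0;
    LPS[0] = 0; // LPS[0] is always 0
    int i = 1;
    while (i < m) {
        if (pattern[i] == pattern[length]) {
            // Extend the current prefix-suffix by one character
            length++;
            LPS[i] = length;
            i++;
        }
        else {
            if (length != 0) {
                // Fall back to the next shorter candidate; i stays put
                length = LPS[length - 1];
            }
            else {
                // No prefix-suffix ends at index i
                LPS[i] = 0;
                i++;
            }
        }
    }
}
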
// KMP algorithm to search for pattern in text
vector<int> KMP(const string &pattern, const string &text) {
    int m = static_cast<int>(pattern.length());
    int n = static_cast<int>(text.length());

    // Starting indices in text where the pattern occurs
    vector<int> ans;

    // An empty pattern has no LPS array; report no matches
    if (m == 0)
        return ans;

    vector<int> LPS(m);
    computeLPSArray(pattern, m, LPS);

    int i = 0; // index for text
    int j = 0; // index for pattern
    while (i < n) {
        // Characters match: advance both indices
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }

        // Whole pattern matched: record the starting index, then
        // fall back so that overlapping matches are also found
        if (j == m) {
            ans.push_back(i - j);
            j = LPS[j - 1];
        }
        // Mismatch: fall back in the pattern using the LPS array,
        // or advance in the text if we are at the pattern's start
        else if (i < n && pattern[j] != text[i]) {
            if (j != 0) {
                j = LPS[j - 1];
            }
            else {
                i++;
            }
        }
    }
    return ans;
}

int main() {
    string text = "ABABDABACDABABCABAB";
    string pattern = "ABABCABAB";
    vector<int> ans = KMP(pattern, text);
    cout << "Pattern Found at Indexes: ";
    for (auto i : ans)
        cout << i << " ";
    return 0;
}
Output
Pattern Found at Indexes: 10
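The example text contains only one occurrence, so it does not show what happens after a full match. Resetting the pattern index to the last LPS value, rather than to 0, is what allows overlapping occurrences to be reported. The sketch below is a compact restatement of the same search under a hypothetical name, `findAll`; treating an empty pattern as "no matches" is an assumption made here for safety, not something the article specifies.

```cpp
#include <string>
#include <vector>

// Hypothetical compact variant of the search; `findAll` is an
// illustrative name, not part of the article's listing.
std::vector<int> findAll(const std::string &pattern, const std::string &text) {
    const int m = static_cast<int>(pattern.size());
    const int n = static_cast<int>(text.size());
    std::vector<int> matches;
    if (m == 0 || m > n) return matches; // assumption: empty pattern yields no matches

    // Build the LPS array.
    std::vector<int> lps(m, 0);
    for (int i = 1, len = 0; i < m;) {
        if (pattern[i] == pattern[len]) {
            lps[i] = ++len;
            ++i;
        } else if (len != 0) {
            len = lps[len - 1];
        } else {
            lps[i] = 0;
            ++i;
        }
    }

    // Scan the text; the text index i never moves backwards.
    for (int i = 0, j = 0; i < n;) {
        if (text[i] == pattern[j]) {
            ++i;
            ++j;
            if (j == m) {
                matches.push_back(i - m);
                j = lps[m - 1]; // keep the longest border so overlaps are found
            }
        } else if (j != 0) {
            j = lps[j - 1];
        } else {
            ++i;
        }
    }
    return matches;
}
```

For instance, `findAll("AA", "AAAA")` reports the overlapping positions 0, 1, and 2. If the pattern index were reset to 0 after each match, the occurrence at index 1 would be missed.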
Complexity Analysis of KMP Algorithm
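- Time Complexity: Building the LPS array takes O(m) time and scanning the text takes O(n) time, so the KMP algorithm runs in O(m + n) time even in the worst case, where m is the length of the pattern and n is the length of the text.
- Space Complexity: The LPS array needs O(m) auxiliary space, which is proportional to the length of the pattern. The list of match positions returned by the search is additional.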
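One way to see the linear bound is to count character comparisons during the search phase. Each loop iteration either advances the text index or shortens the matched prefix, so the count stays between n and 2n for a non-empty pattern. The sketch below is hypothetical instrumentation (the name `countSearchComparisons` is not from the article) written to make that bound observable.

```cpp
#include <string>
#include <vector>

// Hypothetical instrumentation, not part of the article's listing:
// returns how many text-vs-pattern character comparisons the
// search phase performs (LPS construction is not counted).
long long countSearchComparisons(const std::string &pattern, const std::string &text) {
    const int m = static_cast<int>(pattern.size());
    const int n = static_cast<int>(text.size());
    if (m == 0) return 0; // assumption: nothing to search for

    // Build the LPS array.
    std::vector<int> lps(m, 0);
    for (int i = 1, len = 0; i < m;) {
        if (pattern[i] == pattern[len]) {
            lps[i] = ++len;
            ++i;
        } else if (len != 0) {
            len = lps[len - 1];
        } else {
            lps[i] = 0;
            ++i;
        }
    }

    long long comparisons = 0;
    for (int i = 0, j = 0; i < n;) {
        ++comparisons; // one comparison per loop iteration
        if (text[i] == pattern[j]) {
            ++i;
            ++j;
            if (j == m) j = lps[m - 1]; // full match: fall back, keep scanning
        } else if (j != 0) {
            j = lps[j - 1]; // shorten the matched prefix; i stays put
        } else {
            ++i; // nothing matched: move on in the text
        }
    }
    return comparisons;
}
```

For a pattern such as "AAAB" against a text made only of the letter A, a naive search would repeat most of its comparisons at every alignment, whereas the count returned here stays within twice the text length.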