Question

LeetCode Link | 28. Find the Index of the First Occurrence in a String | Easy

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Constrains

  • 1 <= haystack.length, needle.length <= 10^4
  • haystack and needle consist of only lowercase English characters.

Examples

Example 1

Input: haystack = “sadbutsad”, needle = “sad”
Output: 0
Explanation: “sad” occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Example 2

Input: haystack = “leetcode”, needle = “leeto”
Output: -1
Explanation: “leeto” did not occur in “leetcode”, so we return -1.

Solution Approach

Method: KMP

Constrains

  • None

This method does not have any limitations.

Concept Explanation

This question can be solve by brute force, which is basically compare each index of pattern with string, and re-compare form the start if failed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
function strStr(haystack: string, needle: string): number {
// Edge case: needle is empty
if (needle.length === 0) return 0;

for (let i = 0; i < haystack.length; i++) {
// Check if needle is in haystack
if (haystack[i] === needle[0]) {
for (let j = 0; j < needle.length; j++) {
// If the needle is fully matched
if (j === needle.length - 1) {
return i;
}
// If not match
if (haystack[i + j] !== needle[j]) {
break;
}
}
}
}
// Not found
return -1;
}

This cost O(n * m) time since we match from start of needle each time, which could waste lots of time if some pattern in needle is overlaped.

For example, we have haystack aaaaaaaaf and needle aaaaf, it wil always re-match from first character a, but we could save this time since first four character are the same.

Introducing the KMP Algorithm, this is a string matching algorithm designed for searching substrings or patterns within a main text string efficiently. By using KMP, we could only compare pattern after last a in needle, which is much more efficiently. For more detail, check the post The KMP Algorithm.

To use KMP, we have to compute the LPS array first. This array contains the index we need to go back to if we failed to match the string. For more detail on how to create a LPS array, check the post The KMP Algorithm.

Then we can iterate haystack string to find if needle match by comparing character just like brute force. However, if we failed to match character, we can just use LPS array to go to the nearest index that we already validate all the characters before this index.

Code

  • Time complexity: O(n + m)
  • Space complexity: O(m), need m space to store LPS

TypeScript

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
function strStr(haystack: string, needle: string): number {
// Compute Longest Prefix which is also a Suffix (LPS) array
function getLps(pattern: string): number[] {
let lps: number[] = new Array(pattern.length).fill(0);
let len: number = -1;
lps[0] = len;
for (let i = 1; i < pattern.length; i++) {
// Find last matched pattern if not match
while (len >= 0 && pattern[i] != pattern[len + 1]) {
len = lps[len];
}
// Update length when this character match
if (pattern[i] === pattern[len + 1]) {
len++;
}
// Update LPS array
lps[i] = len;
}
return lps;
}

// Edge case: needle is empty
if (needle.length === 0) return 0;

// Get LPS for needle
const lps: number[] = getLps(needle);
let j: number = -1;
for (let i = 0; i < haystack.length; i++) {
// Find last matched pattern if not match
while (j >= 0 && needle[j + 1] != haystack[i]) {
j = lps[j];
}
// Update length when this character match
if (needle[j + 1] === haystack[i]) {
// If it fully matches, return the result, no need to j++
if (j === needle.length - 2) {
return i - j - 1;
}
j++;
}
}
// Not found
return -1;
};

Conclusion

Using the LPS array, the KMP algorithm significantly reduces the time complexity of the search phase, making it efficient for searching substrings in larger texts. Try to understand the logic of the KMP algorithm instead of memories the code, this is a commonly used algorithm in string comparisons, memories the hard code would not help you during interview:(