Problem 459: Repeated Substring Pattern
Question
LeetCode Link | 459. Repeated Substring Pattern | Easy
Given a string s
, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.
Constrains
1 <= s.length <= 10^4
s
consists of lowercase English letters.
Examples
Example 1
Input: s = “abab”
Output: true
Explanation: It is the substring “ab” twice.
Example 2
Input: s = “aba”
Output: false
Example 3
Input: s = “abcabcabcabc”
Output: true
Explanation: It is the substring “abc” four times or the substring “abcabc” twice.
Solution Approach
Method: Brute Force
Constrains
- None
This method does not have any limitations.
Concept Explanation
The brute-force method involves using a single for-loop to determine the end position of the substring, and then, within a nested for-loop, checking if the substring can repeatedly form the entire string. This results in a time complexity of O(n^2).
Some might wonder how it’s possible to get the substring with just one for-loop. Wouldn’t you need one for-loop to determine the start position of the substring and another for-loop for the end position?
In fact, you only need to consider substrings that start with the first letter, so one for-loop to determine the end position of the substring is sufficient. Additionally, you don’t even need to iterate to the end of the string, just up to the middle position. This is because if the end position of the substring is beyond the middle, it definitely cannot form the entire string by repeating.
1 | // Bruteforce |
The brute-force solution will not be explained in detail here.
Method: String rotation / Concatenation
Constrains
- None
This method does not have any limitations.
Concept Explanation
When a string s: abcabc
is composed of repeated substrings, the structure of the string must be as follows:
This means it is composed of identical substrings at the beginning and end.
Therefore, since there are identical substrings at the beginning and at the end, if you concatenate the string with itself (s + s)
, the substring in the back acts as the front substring, and the front substring acts as the back substring in the new string, which will definitely form another s
, as illustrated:
Therefore, to determine whether string s is composed of repeating substrings, simply concatenate two s’s together. If another s
appears within this new string, it indicates that s
is composed of repeating substrings.
Of course, when checking whether an s
appears within the concatenated string s + s
, you should exclude the first and last characters of s + s
. This prevents the search from simply finding the original s
in s + s
. Instead, the goal is to find the s that appears due to the concatenation in the middle.
1 | // concatenate the string with itself and remove the first and last character |
However, this approach has a potential issue: it ultimately requires determining whether the string (s + s) contains s, for which many might directly use library functions like contains
or find
. This overlooks the time complexity of implementing these functions. While the brute-force approach has a time complexity of O(𝑚×𝑛), general library functions are implemented with a time complexity of 𝑂(𝑚+𝑛).
Method: KMP
Constrains
- None
This method does not have any limitations.
Concept Explanation
Searching for the occurrence of one string within another is a specialty of the KMP (Knuth-Morris-Pratt) algorithm. So, how does finding repeated substrings also involve the KMP algorithm?
It is precisely because of the rule of the longest equal prefix and suffix that when a string is composed of repeating substrings, the substring not included in the longest equal prefix and suffix is the smallest repeating substring. (abc
as the figure shown above)
Assuming the string s
is composed of multiple repeating substrings (where this substring is the smallest repeating unit), and the length of the repeating substring is x
, then s
is composed of n * x
.
Since the length of the longest common prefix and suffix of string s does not include s itself, the length of the longest common prefix and suffix must be m * x
, and n - m = 1
(refer to the diagram below if this is unclear).
The length of the longest common prefix and suffix is: lsp[len - 1]
(the last index of LPS), and the length of the array is: len
(or saying s.length
).
If len % (len - (lsp[len - 1])) == 0
, it indicates that the array’s length can be evenly divided by (array length - length of the longest common prefix and suffix), meaning that the string has repeating substrings.
The array length minus the length of the longest common prefix and suffix is essentially the length of the first cycle, i.e., one cycle length. If this cycle length can evenly divide the array length, it indicates that the entire array is a repetition of this cycle.
For example, we use abcabcabc
:
lsp[len - 1] = 6
,6
repersent the longest prefix and suffix of theabcabcabc
is 6len - (lsp[len - 1]) = 9 - 6 = 3
,3
repersent the substringabc
length is 3.- If this can be divide by whole string length, it means this is a repeated substring pattern.
Code
- Time complexity: O(n)
- Space complexity: O(n)
TypeScript
1 | function repeatedSubstringPattern(s: string): boolean { |
Conclusion
The usage of KMP can be applied on any question that searching for the occurrence of one string within another. Just carefully do the math question would solve most of this type questions.