# #StackBounty: #python #performance #algorithm #strings #search Naive implementation of KMP algorithm

### Bounty: 50

After reading this answer to the question “High execution time to count overlapping substrings”, I decided to implement the suggested Knuth-Morris-Pratt (KMP) algorithm. I used the pseudo-code listed on Wikipedia for the functions `kmp_table` and `kmp_search`.

However, when running it on some corner cases, I have observed that it is a lot slower than the standard `str.find`, which apparently uses a modified Boyer-Moore-Horspool algorithm and should therefore have worse worst-case performance.

The specific case I looked at is:

```
$ ipython -i kmp.py
In [1]: text = "A"*1000000 + "B"
In [2]: word = "A"*100 + "B"
In [3]: %timeit kmp_search(text, word)
1 loop, best of 3: 410 ms per loop
In [4]: %timeit text.find(word)
1000 loops, best of 3: 703 µs per loop
```

So the difference is about a factor of 1000 for this input. This is probably because the native implementation is written in C while mine is in Python, but I still wanted to see if I did anything stupid here or missed any obvious optimization.

```python
def kmp_table(word):
    """Build the KMP partial-match table for `word`."""
    table = [0] * len(word)
    position, candidate = 2, 0
    table[0] = -1

    while position < len(word):
        if word[position - 1] == word[candidate]:
            # The prefix can be extended by one character.
            table[position] = candidate + 1
            candidate += 1
            position += 1
        elif candidate > 0:
            # Fall back to the next-shorter candidate prefix.
            candidate = table[candidate]
        else:
            table[position] = 0
            position += 1
    return table


def kmp_search(text, word):
    """Return the index of the first occurrence of `word` in `text`,
    or len(text) if there is no match."""
    m, i = 0, 0
    table = kmp_table(word)
    while m + i < len(text):
        if word[i] == text[m + i]:
            if i == len(word) - 1:
                return m
            i += 1
        else:
            if table[i] > -1:
                # Shift the match start, keeping the matched prefix.
                m += i - table[i]
                i = table[i]
            else:
                m += 1
                i = 0
    return len(text)
```
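One obvious Python-level optimization to try (this is a hypothetical variant, not part of the original post) is hoisting the repeated `len()` calls out of the hot loop and caching `table[i]` in a local, since local-variable access is cheaper than function calls and repeated indexing in CPython. A minimal sketch, reusing the same table-building logic as above:

```python
def kmp_table(word):
    """KMP partial-match table, same logic as in the post."""
    table = [0] * len(word)
    table[0] = -1
    position, candidate = 2, 0
    while position < len(word):
        if word[position - 1] == word[candidate]:
            table[position] = candidate + 1
            candidate += 1
            position += 1
        elif candidate > 0:
            candidate = table[candidate]
        else:
            table[position] = 0
            position += 1
    return table


def kmp_search_fast(text, word):
    """KMP search with len() hoisted out of the loop and table[i] cached."""
    table = kmp_table(word)
    n, k = len(text), len(word)  # computed once, not per iteration
    m = i = 0
    while m + i < n:
        if word[i] == text[m + i]:
            if i == k - 1:
                return m
            i += 1
        else:
            t = table[i]  # single lookup instead of three
            if t > -1:
                m += i - t
                i = t
            else:
                m += 1
                i = 0
    return n
```

This does not change the asymptotic behavior, so it will not close a 1000x gap with C, but it typically shaves a constant factor off the per-iteration cost.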