(The following text has been copypasted to the SAT/SMT by example book.)
(See part I.)
I reworked the Java code by Robert Sedgewick from his excellent book, and rewritten it to Python:
#!/usr/bin/env python3
def KMP(pat):
R=256
m=len(pat)
# build DFA from pattern
# https://stackoverflow.com/questions/2397141/how-to-initialize-a-two-dimensional-array-in-python
dfa=[[0]*m for r in range(R)]
dfa[ord(pat[0])][0]=1
x=0
for j in range(1, m):
for c in range(R):
dfa[c][j] = dfa[c][x] # Copy mismatch cases.
dfa[ord(pat[j])][j] = j+1 # Set match case.
x = dfa[ord(pat[j])][x] # Update restart state.
return dfa
# https://stackoverflow.com/questions/3525953/check-if-all-values-of-iterable-are-zero
def all_elements_in_list_are_zeros (l):
return all(v==0 for v in l)
def export_dfa_to_graphviz(dfa, fname):
m=len(dfa[0]) # len of pattern
f=open(fname,"w")
f.write("digraph finite_state_machine {\n")
f.write("rankdir=LR;\n")
f.write("\n")
f.write("size=\"8,5\"\n")
f.write(f"node [shape = doublecircle]; S_0 S_{m};\n")
f.write("node [shape = circle];\n")
for state in range(m):
exits=[]
for R in range(256):
next=dfa[R][state]
if next!=0:
exits.append((R,next))
for exit in exits:
next_state=exit[1]
label="'"+chr(exit[0])+"'"
s=f"S_{state} -> S_{next_state} [ label = \"{label}\" ];\n"
f.write (s)
s=f"S_{state} -> S_0 [ label = \"other\" ];\n"
f.write (s)
f.write("}\n")
f.close()
print (f"{fname} written")
def search(dfa, txt):
# simulate operation of DFA on text
m=len(dfa[0]) # len of pattern
n=len(txt)
j=0 # FA state
i=0
while i<n and j<m:
j = dfa[ord(txt[i])][j]
i=i+1
if j == m:
return i - m # found
return n # not found
I added a code to produce a graph in GraphViz format.
Now what about the 'ok' word?

Another word we already know, 'eel':

The 'cocos':

You see, I can translate these FAs (finite automatas) to Python code:
#!/usr/bin/env python3
def search_eel(txt):
# simulate operation of DFA on text
m=3 # len of pattern
n=len(txt)
j=0 # FA state
i=0 # iterator for txt[]
while i<n and j<m:
ch=txt[i]
if j==0 and ch=='e':
j=1
elif j==1 and ch=='e':
j=2
elif j==2 and ch=='e':
j=2
elif j==2 and ch=='l':
j=3
else:
j=0 # reset
i=i+1
if j == m:
return i - m # found
return n # not found
Do you see any similarities with the C code from the previous part?
Yes, what we actually did, was FA, and the 'seen' variable was used as a marker of current state. Here we use the 'j' variable instead, but it's almost the same! We reinvented DFA.The KMP algorithm creates a DFA for each substring to be searched. This is preprocessing.
The DFA is then executes on the input string, in the very same fashion like regular expression matcher.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.