(The following text has been copypasted to the SAT/SMT by example book.)
(See part I.)
I reworked the Java code by Robert Sedgewick from his excellent book, and rewritten it to Python:
#!/usr/bin/env python3 def KMP(pat): R=256 m=len(pat) # build DFA from pattern # https://stackoverflow.com/questions/2397141/how-to-initialize-a-two-dimensional-array-in-python dfa=[[0]*m for r in range(R)] dfa[ord(pat[0])][0]=1 x=0 for j in range(1, m): for c in range(R): dfa[c][j] = dfa[c][x] # Copy mismatch cases. dfa[ord(pat[j])][j] = j+1 # Set match case. x = dfa[ord(pat[j])][x] # Update restart state. return dfa # https://stackoverflow.com/questions/3525953/check-if-all-values-of-iterable-are-zero def all_elements_in_list_are_zeros (l): return all(v==0 for v in l) def export_dfa_to_graphviz(dfa, fname): m=len(dfa[0]) # len of pattern f=open(fname,"w") f.write("digraph finite_state_machine {\n") f.write("rankdir=LR;\n") f.write("\n") f.write("size=\"8,5\"\n") f.write(f"node [shape = doublecircle]; S_0 S_{m};\n") f.write("node [shape = circle];\n") for state in range(m): exits=[] for R in range(256): next=dfa[R][state] if next!=0: exits.append((R,next)) for exit in exits: next_state=exit[1] label="'"+chr(exit[0])+"'" s=f"S_{state} -> S_{next_state} [ label = \"{label}\" ];\n" f.write (s) s=f"S_{state} -> S_0 [ label = \"other\" ];\n" f.write (s) f.write("}\n") f.close() print (f"{fname} written") def search(dfa, txt): # simulate operation of DFA on text m=len(dfa[0]) # len of pattern n=len(txt) j=0 # FA state i=0 while i<n and j<m: j = dfa[ord(txt[i])][j] i=i+1 if j == m: return i - m # found return n # not found
I added a code to produce a graph in GraphViz format.
Now what about the 'ok' word?
Another word we already know, 'eel':
The 'cocos':
You see, I can translate these FAs (finite automatas) to Python code:
#!/usr/bin/env python3 def search_eel(txt): # simulate operation of DFA on text m=3 # len of pattern n=len(txt) j=0 # FA state i=0 # iterator for txt[] while i<n and j<m: ch=txt[i] if j==0 and ch=='e': j=1 elif j==1 and ch=='e': j=2 elif j==2 and ch=='e': j=2 elif j==2 and ch=='l': j=3 else: j=0 # reset i=i+1 if j == m: return i - m # found return n # not found
Do you see any similarities with the C code from the previous part?
Yes, what we actually did, was FA, and the 'seen' variable was used as a marker of current state. Here we use the 'j' variable instead, but it's almost the same! We reinvented DFA.The KMP algorithm creates a DFA for each substring to be searched. This is preprocessing.
The DFA is then executes on the input string, in the very same fashion like regular expression matcher.
List of my other blog posts. My company.
Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.