(The following text has been copypasted to the SAT/SMT by example book.)

(See part II.)

DFAs generated by the KMP algorithms are *sparse* and have regularities we can observe easily.
One popular optimization is not using DFA, but rather a small "partial match" table:

// Knuth–Morris–Pratt algorithm // copypasted from http://cprogramming.com/snippets/source-code/knuthmorrispratt-kmp-string-search-algorithm char *kmp_search(char *haystack, size_t haystack_size, char *needle, size_t needle_size) { int *T; int i, j; char *result = NULL; if (needle_size==0) return haystack; /* Construct the lookup table */ T = (int*) malloc((needle_size+1) * sizeof(int)); T[0] = -1; for (i=0; i<needle_size; i++) { T[i+1] = T[i] + 1; while (T[i+1] > 0 && needle[i] != needle[T[i+1]-1]) T[i+1] = T[T[i+1]-1] + 1; } printf ("restarts table:\n"); for (i=0; i<needle_size+1; i++) printf ("T[%d]=%d\n", i, T[i]); /* Perform the search */ for (i=j=0; i<haystack_size; ) { if (j>=0) print_compare (haystack, i, needle, j); if (j < 0 || haystack[i] == needle[j]) { ++i, ++j; if (j == needle_size) { result = haystack+i-j; break; } } else { j = T[j]; if (j==-1) printf ("Restarting needle at the beginning\n"); else print_state_needle(needle, j); }; } free(T); return result; }

Now let's search for 'eel' in the 'eex eel' string:

restarts table: T[0]=-1 T[1]=0 T[2]=1 T[3]=0 going to compare: eex eel eel ^ going to compare: eex eel eel ^ going to compare: eex eel eel ^ Restarting needle at: eel ^ going to compare: eex eel eel ^ Restarting needle at: eel ^ going to compare: eex eel eel ^ Restarting needle at the beginning going to compare: eex eel eel ^ Restarting needle at the beginning going to compare: eex eel eel ^ going to compare: eex eel eel ^ going to compare: eex eel eel ^

The T[] table is small, which is good for cache memory. But please note: a character is to be loaded twice in case of restart. But this is fine, it's cached anyway at that moment.

Now the interesting thing: if the 'partial match' table is empty (all zeroes), the search function can be implemented naively, like I tried in the part I.

These are the cases of words like 'tesa', 'lee', 'banana'. But a non-empty table must be constructed for words like 'eel', 'oops', 'test', 'elee' -- these are words where a prefix (1-character minimum) is repeated in the word. But this is not true for 'banana' -- there is no repeating prefix.

All the files I used, including all the papers I've found interesting.

Please drop me email about bug(s) and/or suggestion(s): my emails. List of my other blog posts.