(The following text has been copypasted to the SAT/SMT by example book.)
For my first blog post dedicated to backreferences, see: SAT solver on top of regex matcher.
This time, I searched for good words that can serve as examples for my blog posts about the Knuth-Morris-Pratt algorithm. I wanted a list of words that have repeated prefixes and suffixes.
I started from a good collection of English words, the file words_alpha.txt.
Then I used sed to find words with repeated prefixes:
sed -E -n '/^(.+)\1(.+)$/p' words_alpha.txt
Some of them:
eel oops ooze cocoa cocos kokos mimic cocoon
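To make the role of the backreference concrete, here is a minimal sketch that runs the same sed expression on a tiny hand-made word list instead of words_alpha.txt. The function name repeated_prefix and the sample words are illustrative only, and the sketch assumes a sed that accepts backreferences under -E (GNU sed does).

```shell
#!/bin/sh
# Hypothetical helper: print the words from stdin that begin with some
# non-empty string, immediately repeated, and then at least one more character.
repeated_prefix() {
    sed -E -n '/^(.+)\1(.+)$/p'
}

# Sample input, not taken from words_alpha.txt.
# "ee" is rejected because the trailing (.+) needs at least one character.
printf '%s\n' cocoa mimic hello oops eel ee | repeated_prefix
# prints: cocoa, mimic, oops, eel (one per line)
```

The group (.+) captures the candidate prefix, and \1 demands an exact second copy right after it.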
I couldn't get sed to find repeated suffixes, so I wrote a Racket program to do that (each suffix must have at least two characters):
#lang racket

;(define r (pregexp "^(.+)\\1(.+)$")) ; two prefixes
(define r (pregexp "^(.+)(..+)\\2$")) ; two suffixes >=2

(define (f s) (regexp-match r s))

(define result
  (sort (filter f (file->lines "words_alpha.txt"))
        (lambda (x y) (< (string-length x) (string-length y)))))

(for ([i result])
  (displayln i))
Some of these:
ceded crisis rococo cantata
These sound like a list of diseases:
hydrofluosilicic integropallialia interjaculateded panmyelophthisis plasmaphoresisis pneumonophthisis antihemagglutinin ophthalmophthisis bacterioagglutinin erythrocytoschisis phytohemagglutinin thoracoceloschisis craniorhachischisis phytohaemagglutinin thoracogastroschisis
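For a quick check without the full word list, the suffix pattern can also be tried on a few sample words from the shell. This is only a sketch: it assumes GNU grep, which accepts backreferences in extended regexes as an extension, and the helper names repeated_suffix and by_length are made up for illustration. The sort step imitates the length ordering of the Racket program and relies on the -s (stable) option, which GNU and BSD sort provide.

```shell
#!/bin/sh
# Hypothetical helper: keep words that end with a doubled suffix of at
# least two characters, preceded by at least one other character.
repeated_suffix() {
    grep -E '^(.+)(..+)\2$'
}

# Hypothetical helper: order lines by length, shortest first,
# keeping the input order among lines of equal length.
by_length() {
    awk '{ print length($0), $0 }' | sort -s -n -k1,1 | cut -d' ' -f2-
}

# Sample input, not taken from words_alpha.txt.
# "mimic" has a doubled prefix but no doubled suffix, so it is dropped.
printf '%s\n' cantata ceded hello crisis mimic | repeated_suffix | by_length
# prints: ceded, crisis, cantata (one per line)
```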
I couldn't stand the itch and tried to find all words with 3 repeated suffixes:
(define r (pregexp "^(.+)(.+)\\2\\2$"))
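Since the suffix group is now (.+), a single character qualifies as a suffix. So the result includes both words with 3 repeated characters at the end and the rare word 'ratatat', which ends with the suffix 'at' repeated 3 times: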
brrr ieee mmmm oooo viii xiii xviii xxiii ratatat earlesss
('earlesss' seems to be a typo in the list of words I used.)
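The three-suffix pattern can be exercised on a handful of words in the same way. Again a sketch under assumptions: GNU grep with backreferences in -E mode, and a hypothetical helper name triple_suffix.

```shell
#!/bin/sh
# Hypothetical helper: keep words that end with the same non-empty string
# three times in a row, preceded by at least one other character.
triple_suffix() {
    grep -E '^(.+)(.+)\2\2$'
}

# Sample input, not taken from words_alpha.txt.
# "cantata" only doubles its suffix and "eel" is too short, so both are dropped.
printf '%s\n' ratatat brrr xviii cantata eel | triple_suffix
# prints: ratatat, brrr, xviii (one per line)
```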