Enumerating all possible inputs for a specific regexp using Z3 SMT-solver

Regular expression if first converted to FSM (finite state machine) before matching. Hence, many RE libraries has two functions: "compile" and "execute" (when you match many strings against single RE, no need to recompile it to FSM each time).

And I've found this website, which can visualize FSM (finite state machine) for a regular expression. http://hokein.github.io/Automata.js/. This is fun!

This FSM (DFA) is for the expression (dark|light)?(red|blue|green)(ish)?

Accepting states are in double circles, these are the states where matching process stops.

How can we generate an input string which regular expression would match? In other words, which inputs FSM would accept? This task is surprisingly simple for SMT-solver.

We just define a transition function. For each pair (state, input) it defines new state.

FSM has been visualized by the website mentioned above, and I used this information to write transition() function.

Then we chain transition functions... then we try a chain for all lengths in range of 2..14.

#!/usr/bin/env python
from z3 import *

INVALID_STATE=999
ACCEPTING_STATES=[13, 19, 23, 24]

# s - state
# i - input character
def transition (s, i):
# this is like switch()
return If(And(s==0, i==ord('r')), 3,
If(And(s==0, i==ord('b')), 4,
If(And(s==0, i==ord('g')), 5,
If(And(s==0, i==ord('d')), 1,
If(And(s==0, i==ord('l')), 2,
If(And(s==1, i==ord('a')), 6,
If(And(s==2, i==ord('i')), 7,
If(And(s==3, i==ord('e')), 8,
If(And(s==4, i==ord('l')), 9,
If(And(s==5, i==ord('r')), 10,
If(And(s==6, i==ord('r')), 11,
If(And(s==7, i==ord('g')), 12,
If(And(s==8, i==ord('d')), 13,
If(And(s==9, i==ord('u')), 14,
If(And(s==10, i==ord('e')), 15,
If(And(s==11, i==ord('k')), 16,
If(And(s==12, i==ord('h')), 17,
If(And(s==13, i==ord('i')), 18,
If(And(s==14, i==ord('e')), 19,
If(And(s==15, i==ord('e')), 20,
If(And(s==16, i==ord('r')), 3,
If(And(s==16, i==ord('b')), 4,
If(And(s==16, i==ord('g')), 5,
If(And(s==17, i==ord('t')), 21,
If(And(s==18, i==ord('s')), 22,
If(And(s==19, i==ord('i')), 18,
If(And(s==20, i==ord('n')), 23,
If(And(s==21, i==ord('r')), 3,
If(And(s==21, i==ord('b')), 4,
If(And(s==21, i==ord('g')), 5,
If(And(s==22, i==ord('h')), 24,
If(And(s==23, i==ord('i')), 18,
INVALID_STATE))))))))))))))))))))))))))))))))

def print_model(m, length, inputs):
#print "length=", length
tmp=""
for i in range(length-1):
tmp=tmp+chr(m[inputs[i]].as_long())
print tmp

def make_FSM(length):
s=Solver()
states=[Int('states_%d' % i) for i in range(length)]
inputs=[Int('inputs_%d' % i) for i in range(length-1)]

# initial state:

# the last state must be equal to one of the acceping states

# all states are in limits...
for i in range(length):
# redundant, though. however, we are not interesting in non-matched inputs, right?

# "insert" transition() functions between subsequent states
for i in range(length-1):

# enumerate results:
results=[]
while s.check()==sat:
m=s.model()
print_model(m, length, inputs)
results.append(m)
block = []
for d in m:
c=d()
block.append(c != m[d])

for l in range(2,15):
make_FSM(l)

Results:

red
blue
green
redish
darkred
blueish
darkblue
greenish
lightred
lightblue
darkgreen
lightgreen
darkredish
darkblueish
lightredish
darkgreenish
lightblueish
lightgreenish

As simple as this.

It can be said, what we did is enumeration of all paths between two vertices of a directed graph (representing FSM).

Also, the transition() function itself can act as a RE matcher, with no relevance to SMT solver(s). Just feed input characters to it and track state. Whenever you hit one of accepting states, return "match", whenever you hit INVALID_STATE, return "no match".