(The following text has been moved to the article at https://yurichev.com/tmp/SAT_SMT_DRAFT.pdf.)

Dennis Yurichev: 29-Apr-2017: Recalculating micro-spreadsheet using Z3Py

29-Apr-2017: Recalculating micro-spreadsheet using Z3Py

There is a nice exercise (blog post in Russian): write a program to recalculate micro-spreadsheet, like this one:

1	0	B0+B2		A0*B0*C0
123	10	12		11
667	A0+B1	(C1*A0)*122	A3+C2

As it turns out, though overkill, this can be solved using Z3 with little effort:

#!/usr/bin/python

from z3 import *
import sys, re

# MS Excel or LibreOffice style.
# first top-left cell is A0, not A1
def coord_to_name(R, C):        
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[R]+str(C)

# open file and parse it as list of lists:
f=open(sys.argv[1],"r")
# filter(None, ...) to remove empty sublists:
ar=filter(None, [item.rstrip().split() for item in f.readlines()])
f.close()

WIDTH=len(ar[0])
HEIGHT=len(ar)

# cells{} is a dictionary with keys like "A0", "B9", etc:
cells={}
for R in range(HEIGHT):
    for C in range(WIDTH):
        name=coord_to_name(R, C)
        cells[name]=Int(name)

s=Solver()

cur_R=0
cur_C=0

for row in ar:
    for c in row:
        # string like "A0+B2" becomes "cells["A0"]+cells["B2"]":
        c=re.sub(r'([A-Z]{1}[0-9]+)', r'cells["\1"]', c)
        st="cells[\"%s\"]==%s" % (coord_to_name(cur_R, cur_C), c)
        # evaluate string. Z3Py expression is constructed at this step:
        e=eval(st)
        # add constraint:
        s.add (e)
        cur_C=cur_C+1
    cur_R=cur_R+1
    cur_C=0

result=str(s.check())
print result
if result=="sat":
    m=s.model()
    for r in range(HEIGHT):
        for c in range(WIDTH):
            sys.stdout.write (str(m[cells[coord_to_name(r, c)]])+"\t")
        sys.stdout.write ("\n")


( https://github.com/dennis714/yurichev.com/blob/master/blog/spreadsheet/1.py )

All we do is just creating pack of variables for each cell, named A0, B1, etc, of integer type. All of them are stored in cells[] dictionary. Key is a string. Then we parse all the strings from cells, and add to list of constraints "A0=123" (in case of number in cell) or "A0=B1+C2" (in case of expression in cell). There is a slight preparation: string like "A0+B2" becomes "cells["A0"]+cells["B2"]".

Then the string is evaluated using Python eval() method, which is highly dangerous: imagine if end-user could add a string to cell other than expression? Nevertheless, it serves our purposes well, because this is a simplest way to pass a string with expression into Z3.

Z3 do the job with little effort:

 % python 1.py test1
sat
1       0       135     82041
123     10      12      11
667     11      1342    83383

Unsat core

Now the problem: what if there is circular dependency? Like:

1	0	B0+B2		A0*B0
123	10	12		11
C1+123	C0*123	A0*122		A3+C2

Two first cells of the last row (C0 and C1) are linked to each other. Our program will just tell "unsat", meaning, it couldn't solve all constraints together. We can't use this as error message reported to end-user, because it's highly unfriendly.

However, we can fetch "unsat core", i.e., list of variables which Z3 finds conflicting.

...
s=Solver()
s.set(unsat_core=True)
...
        # add constraint:
        s.assert_and_track(e, coord_to_name(cur_R, cur_C))
...
if result=="sat":
...
else:
    print s.unsat_core()

( https://github.com/dennis714/yurichev.com/blob/master/blog/spreadsheet/2.py )

We should explicitly turn on unsat core support and use assert_and_track() instead of add() method, because this feature slows down the whole process. That works:

 % python 2.py test_circular
unsat
[C0, C1]

Perhaps, these variables could be removed from the 2D array, marked as "unresolved" and the whole spreadsheet could be recalculated again.

Stress test

How to generate large random spreadsheet? What we can do. First, create random DAG (Directed acyclic graph), like this one:

Arrows will represent information flow. So the vertex (node) which has no incoming arrows to it (indegree=0), can be filled with random numbers. Then we use topological sort to find dependencies between vertices. Then we assign spreadsheet cell names to each vertex. Then we generate random expression with random operations and random numbers to each cell, which will use information from topological sorted graph.

(* Utility functions *)
In[1]:= findSublistBeforeElementByValue[lst_,element_]:=lst[[ 1;;Position[lst, element][[1]][[1]]-1]]

(* Input in 1..∞ range. 1->A0, 2->A1, etc *)
In[2]:= vertexToName[x_,width_]:=StringJoin[FromCharacterCode[ToCharacterCode["A"][[1]]+Floor[(x-1)/width]],ToString[Mod[(x-1),width]]]

In[3]:= randomNumberAsString[]:=ToString[RandomInteger[{1,1000}]]

In[4]:= interleaveListWithRandomNumbersAsStrings[lst_]:=Riffle[lst,Table[randomNumberAsString[],Length[lst]-1]]

(* We omit division operation because micro-spreadsheet evaluator can't handle division by zero *)
In[5]:= interleaveListWithRandomOperationsAsStrings[lst_]:=Riffle[lst,Table[RandomChoice[{"+","-","*"}],Length[lst]-1]]

In[6]:= randomNonNumberExpression[g_,vertex_]:=StringJoin[interleaveListWithRandomOperationsAsStrings[interleaveListWithRandomNumbersAsStrings[Map[vertexToName[#,WIDTH]&,pickRandomNonDependentVertices[g,vertex]]]]]

In[7]:= pickRandomNonDependentVertices[g_,vertex_]:=DeleteDuplicates[RandomChoice[findSublistBeforeElementByValue[TopologicalSort[g],vertex],RandomInteger[{1,5}]]]

In[8]:= assignNumberOrExpr[g_,vertex_]:=If[VertexInDegree[g,vertex]==0,randomNumberAsString[],randomNonNumberExpression[g,vertex]]

(* Main part *) 
(* Create random graph *)
In[21]:= WIDTH=7;HEIGHT=8;TOTAL=WIDTH*HEIGHT
Out[21]= 56

In[24]:= g=DirectedGraph[RandomGraph[BernoulliGraphDistribution[TOTAL,0.05]],"Acyclic"];

...

(* Generate random expressions and numbers *)
In[26]:= expressions=Map[assignNumberOrExpr[g,#]&,VertexList[g]];

(* Make 2D table of it *)
In[27]:= t=Partition[expressions,WIDTH];

(* Export as tab-separated values *)
In[28]:= Export["/home/dennis/1.txt",t,"TSV"]
Out[28]= /home/dennis/1.txt

In[29]:= Grid[t,Frame->All,Alignment->Left]

Here is an output from Grid[]:

846 499 A3*913-H4 808 278 303 D1+579+B6
B4*860+D2 999 59 442 425 A5*163+B2+127*C2*927*D3*213+C1 583
G6*379-C3-436-C4-289+H6 972 804 D2 G5+108-F1*413-D3 B5 G4*981*D2
F2 E0 B6-731-D3+791+B4*92+C1 551 F4*922*C2+760*A6-992+B4-184-A4 B1-624-E3 F4+182+A4*940-E1+76*C1
519 G1*402+D1*107*G3-458*A1 D3 B4 B3*811-D3*345+E0 B5 H5
F5-531+B5-222*E4 9 B5+106*B6+600-B1 E3 A5+866*F6+695-A3*226+C6 F4*102*E4*998-H0 B1-616-G5+812-A5
C3-956*A5 G4*408-D3*290*B6-899*G5+400+F1 B2-701+H6 A3+782*A5+46-B3-731+C1 42 287 H0
B4-792*H4*407+F6-425-E1 D2 D3 F2-327*G4*35*E1 E1+376*A6-606*F6*554+C5 E3 F6*484+C1-114-H4-638-A3

Using this script, I can generate random spreadsheet 26*500 cells (13000 cells), which seems to be processed in couple of seconds.

Conclusion

The files, including Mathematica notebook: https://github.com/dennis714/yurichev.com/tree/master/blog/spreadsheet.

Other Z3-related examples: https://yurichev.com/tmp/SAT_SMT_DRAFT.pdf.


→ [list of blog posts, my twitter/facebook]

The page last updated on 08-May-2017