Toy decompiler

Begin

This is a true story. I once worked with Xilinx Microblaze code and found this sequence of the addk instructions:

addk    r3,r5,r5
addk    r3,r3,r3
addk    r3,r3,r3

The format is: addk dst, src1, src2

This is actually multiplication, but implemented via addition, because it's faster than multiplication instruction.

Also, Xilinx Microblaze can be compiled with multiplication instruction omitted. Ctrl-F: "C_USE_HW_MUL".

Manual rewriting.

Let's rewrite it manually:

addk    r3,r5,r5
addk    r3,r3,r3
addk    r3,r3,r3
addk    r3,r5,r5
; now r3 = r5+r5 = r5*2
addk    r3,r3,r3
; now r3 = r3*r3 = r5*2 + r5*2 = r5*4
addk    r3,r3,r3
; now r3 = r3*r3 = r5*4 + r5*4 = r5*8

Aha, now we see. Multiplication by 8.

Next example:

addk    r3,r5,r5
addk    r3,r3,r5

Let's rewrite:

addk    r3,r5,r5
; now r3 = r5+r5 = r5*2
addk    r3,r3,r5
; now r3 = r3+r5 = r5*2+r5 = r5*3

This is multiplication by 3.

Next example with rsubk instruction, which is: rsubk dst, x, y. And dst=y-x (hence 'r-' prefix, meaning 'reversed', like in MIPS).

addk    r3,r5,r5
addk    r3,r3,r3
addk    r3,r3,r3
rsubk   r3,r5,r3

Let's rewrite:

addk    r3,r5,r5
; now r3 = r5+r5 = r5*2
addk    r3,r3,r3
; now r3 = r3+r3 = r3*2 = r5*2*2 = r5*4
addk    r3,r3,r3
; now r3 = r3+r3 = r3*2 = r5*4*2 = r5*8
rsubk   r3,r5,r3
; now r3 = r3-r5 = r5*8-r5 = r5*7

This is multiplication by 7.

We start to feel this process is like a shell game (you have to keep in mind what is in which register) and can be automated.

Toy decompiler

This partial decompiler written in Python can do something already. We parse input assembly listings using regexps. Rewriting also happens using regexps.

regs=[("initial_r%d" % i) for i in range(32)]
regs[0]="0" # r0 always has zero, as in MIPS

def dump_regs():
    global regs
    for reg in range(32):
        if regs[reg] != f"initial_r{reg}":
            print (f"r{reg}", "=", rewrite(regs[reg]))

def regname_to_regno(reg):
    return int(reg.removeprefix("r").removesuffix(","))

def parse_addk(l):
    global regs
    tmp=l.split(" ")
    assert tmp[0]=="addk"
    dst=regname_to_regno(tmp[1])
    src1=regname_to_regno(tmp[2])
    src2=regname_to_regno(tmp[3])
    #print (dst, src1, src2)
    if src1==src2:
        regs[dst]=f"({regs[src1]}*2)"
    else:
        regs[dst]=f"({regs[src1]}+{regs[src2]})"

...

# these regexp subexpressions are unusual a bit
# ?P<name> means that subexpression will have a name:
# https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups
# ? at the end means non-greedy regexp:
# https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy
# non-greedy regexps are used here because we want to find smallest possible subexpression
# default is greedy -- largest possible subexpression is to be found, but we don't want this
re_X= "(?P<X>[a-z0-9_]+?)"
re_X1="(?P<X1>[a-z0-9_]+?)"
re_X2="(?P<X2>[a-z0-9_]+?)"
re_N= "(?P<N>[0-9]+?)"
re_N1="(?P<N1>[0-9]+?)"
re_N2="(?P<N2>[0-9]+?)"

# ((X*N1)*N2) -> X*(N1+N2)
def rewrite_two_mul(s):
    x=re.compile("\\(\\("+re_X+"\\*"+re_N1+"\\)\\*"+re_N2+"\\)")
    m=x.search(s)
    if m==None:
        return s
    m_start, m_end = m.start(0), m.end(0)
    X=m.group("X")
    N1=int(m.group("N1"))
    N2=int(m.group("N2"))
    new_expr=f"({X}*{N1*N2})"
    rt=s[0:m_start]+new_expr+s[m_end:] # replace substring
    return rt
    
# test
assert rewrite_two_mul("((x*2)*2)")=="(x*4)"
assert rewrite_two_mul("y+((x*2)*2)+z")=="y+(x*4)+z"

...

def rewrite_one_step(s):
    s=rewrite_two_mul(s)
    s=rewrite_1(s)
    s=rewrite_1_rvr(s)
    s=rewrite_2(s)
    s=rewrite_3(s)
    s=rewrite_4(s)
    return s

def rewrite(s):
    #print (f"Before rewrite: {s}")
    while True:
        new_s=rewrite_one_step(s)
        if new_s==s:
            #print (f"After rewrite: {new_s}")
            return new_s
        s=new_s

for l in sys.stdin:
    l=l.rstrip()
    if l.startswith ("addk"):
        parse_addk(l)
    elif l.startswith ("rsubk"):
        parse_rsubk(l)
    else:
        print ("panic, can't parse: ", l)
        exit(0)
    print (f"state of registers after {l}:")
    dump_regs()

print ("final state of registers:")
dump_regs()
 % ./decomp.py < f8
state of registers after addk r3, r5, r5:
r3 = (initial_r5*2)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*4)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*8)
final state of registers:
r3 = (initial_r5*8)

All rewriting rules to be used here are:

((X*N1)*N2) -> X*(N1+N2)
((X*N)+X) -> X*(N+1)
(X+(X*N)) -> X*(N+1)
((X*N)-X) -> X*(N-1)
((X*N1)+(X*N2)) -> X*(N1+N2)
((X*N1)-(X*N2)) -> X*(N1-N2)

And some tests from decomp.py:

assert rewrite_two_mul("((x*2)*2)")=="(x*4)"
assert rewrite_1("((x*2)+x)")=="(x*3)"
assert rewrite_1_rvr("(x+(x*2))")=="(x*3)"
assert rewrite_2("((x*10)-x)")=="(x*9)"
assert rewrite_3("((x*10)+(x*15))")=="(x*25)"
assert rewrite_4("((x*16)-(x*10))")=="(x*6)"

Now this can be 'decompiled':

addk r3, r5, r5
addk r3, r3, r5
 % ./decomp.py < f3
state of registers after addk r3, r5, r5:
r3 = (initial_r5*2)
state of registers after addk r3, r3, r5:
r3 = (initial_r5*3)
final state of registers:
r3 = (initial_r5*3)

And this:

addk r3, r5, r5
addk r3, r3, r3
addk r3, r3, r3
rsubk r3, r5, r3
 % ./decomp.py < f7
state of registers after addk r3, r5, r5:
r3 = (initial_r5*2)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*4)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*8)
state of registers after rsubk r3, r5, r3:
r3 = (initial_r5*7)
final state of registers:
r3 = (initial_r5*7)

Harder examples from real code

This is multiplication by 86400 (number of seconds in day), but some intermediate results are left in r3 and r4 registers:

addk r3, r22, r22
addk r3, r3, r22
addk r4, r3, r3
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
rsubk r4, r3, r4
addk r3, r4, r4
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
rsubk r3, r4, r3
addk r25, r3, r3
addk r25, r25, r25
addk r25, r25, r25
addk r25, r25, r25
addk r25, r25, r25
addk r25, r25, r25
addk r25, r25, r25

The result:

state of registers after addk r3, r22, r22:
r3 = (initial_r22*2)
state of registers after addk r3, r3, r22:
r3 = (initial_r22*3)
state of registers after addk r4, r3, r3:
r3 = (initial_r22*3)
r4 = (initial_r22*6)
state of registers after addk r4, r4, r4:
r3 = (initial_r22*3)
r4 = (initial_r22*12)
state of registers after addk r4, r4, r4:
r3 = (initial_r22*3)
r4 = (initial_r22*24)
state of registers after addk r4, r4, r4:
r3 = (initial_r22*3)
r4 = (initial_r22*48)
state of registers after rsubk r4, r3, r4:
r3 = (initial_r22*3)
r4 = (initial_r22*45)
state of registers after addk r3, r4, r4:
r3 = (initial_r22*90)
r4 = (initial_r22*45)
state of registers after addk r3, r3, r3:
r3 = (initial_r22*180)
r4 = (initial_r22*45)
state of registers after addk r3, r3, r3:
r3 = (initial_r22*360)
r4 = (initial_r22*45)
state of registers after addk r3, r3, r3:
r3 = (initial_r22*720)
r4 = (initial_r22*45)
state of registers after rsubk r3, r4, r3:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
state of registers after addk r25, r3, r3:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*1350)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*2700)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*5400)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*10800)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*21600)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*43200)
state of registers after addk r25, r25, r25:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*86400)
final state of registers:
r3 = (initial_r22*675)
r4 = (initial_r22*45)
r25 = (initial_r22*86400)

Also, multiplication by 1103515245 (the constant used in simple PRNG) with one intermediate result left in r4:

addk r4, r5, r5
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
rsubk r4, r5, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r5
addk r3, r4, r4
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r3
rsubk r3, r4, r3
addk r3, r3, r3
addk r3, r3, r5
addk r4, r3, r3
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r4, r4, r4
addk r3, r3, r4
addk r3, r3, r3
addk r3, r3, r3
rsubk r3, r5, r3
addk r3, r3, r3
addk r3, r3, r3
addk r3, r3, r5
state of registers after addk r4, r5, r5:
r4 = (initial_r5*2)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*4)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*8)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*16)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*32)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*64)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*128)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*256)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*512)
state of registers after rsubk r4, r5, r4:
r4 = (initial_r5*511)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*1022)
state of registers after addk r4, r4, r4:
r4 = (initial_r5*2044)
state of registers after addk r4, r4, r5:
r4 = (initial_r5*2045)
state of registers after addk r3, r4, r4:
r3 = (initial_r5*4090)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*8180)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*16360)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*32720)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*65440)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*130880)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*261760)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*523520)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*1047040)
r4 = (initial_r5*2045)
state of registers after rsubk r3, r4, r3:
r3 = (initial_r5*1044995)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*2089990)
r4 = (initial_r5*2045)
state of registers after addk r3, r3, r5:
r3 = (initial_r5*2089991)
r4 = (initial_r5*2045)
state of registers after addk r4, r3, r3:
r3 = (initial_r5*2089991)
r4 = (initial_r5*4179982)
state of registers after addk r4, r4, r4:
r3 = (initial_r5*2089991)
r4 = (initial_r5*8359964)
state of registers after addk r4, r4, r4:
r3 = (initial_r5*2089991)
r4 = (initial_r5*16719928)
state of registers after addk r4, r4, r4:
r3 = (initial_r5*2089991)
r4 = (initial_r5*33439856)
state of registers after addk r4, r4, r4:
r3 = (initial_r5*2089991)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r4:
r3 = (initial_r5*68969703)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*137939406)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*275878812)
r4 = (initial_r5*66879712)
state of registers after rsubk r3, r5, r3:
r3 = (initial_r5*275878811)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*551757622)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r3:
r3 = (initial_r5*1103515244)
r4 = (initial_r5*66879712)
state of registers after addk r3, r3, r5:
r3 = (initial_r5*1103515245)
r4 = (initial_r5*66879712)
final state of registers:
r3 = (initial_r5*1103515245)
r4 = (initial_r5*66879712)

Bottom line

We successfully decompile (long) add/sub chains, given that the only integer data type is used, like 'int'.

A compiler can do some of this too

Let's rewrite multiplication by 7 to pure C:

addk    r3,r5,r5
addk    r3,r3,r3
addk    r3,r3,r3
rsubk   r3,r5,r3
int f(int r5)
{
    int r3;
    r3=r5+r5;
    r3=r3+r3;
    r3=r3+r3;
    r3=r3-r5;
    return r3;
}

Let's try to compile this:

gcc -S -O3 -masm=intel test7.c

The output:

    lea eax, 0[0+rdi*8]
    sub eax, edi
    ret

Alas, GCC can optimize chain of additions, but can't go further.

All the files

Full python code. (This Python program I used and cited is just under 200 SLOC.)

All the files, with tests

Next step(s)

Working with (sub)expressions in form of strings -- absurdic. No (de-)compiler do this. Of course, I did this as a quick/dirty/experimental hack, for demonstration.

My second toy decompile works with structures akin to what (toy) compiler usually does. Ctrl-F: "Toy decompiler".

This post is like a 'prequel' to the toy decompiler in my SAT/SMT by Example.

(the post first published at 20251117.)


List of my other blog posts.

Subscribe to my news feed,

Some time ago (before 24-Mar-2025) there was Disqus JS script for comments. I dropped it --- it was so motley, distracting, animated, with too much ads. I never liked it. Also, comments didn"t appeared correctly (Disqus was buggy). Also, my blog is too chamberlike --- not many people write comments here. So I decided to switch to the model I once had at least in 2020 --- send me your comments by email (don"t forget to include URL to this blog post) and I"ll copy&paste it here manually.

Let"s party like it"s ~1993-1996, in this ultimate, radical and uncompromisingly primitive pre-web1.0-style blog and website.