Loading a constant into register using ASCII-only x86 code

... this is a task often required when constructing shellcodes. I'm not sure if this is still relevant these days, however, it was fun to do it.

I've picked 3 instructions with ASCII-only opcodes:

26 25 xx xx xx xx    and     eax, imm32
26 2D xx xx xx xx    sub     eax, imm32
26 35 xx xx xx xx    xor     eax, imm32

Will it be possible to generate such a sequence of instructions, so that the arbitrary 32-bit constant would be loaded into EAX regiser? Given the fact that the initial value of EAX is unknown, because, let's say, we can't reset it? Surely, all 32-bit operands must have ASCII-only bytes as well.

The answer is... using Z3 SMT-solver:

#!/usr/bin/env python
from z3 import *
import sys, random

BIT_WIDTH=32

MAX_STEPS=20

#CONST=0
#CONST=0x12345678
#CONST=0x0badf00d
#CONST=0xffffffff

CONST=random.randint(0,0x100000000)
print "CONST=0x%x" % CONST

CHAINS=30

def simulate_op(R, c, op, op1_val, op2_imm, STEPS):
    return If(op==0, op1_val - op2_imm,
           If(op==1, op1_val ^ op2_imm,
           If(op==2, op1_val & op2_imm,
               0))) # default

op_to_str_tbl=["SUB", "XOR", "AND"]
instructions=len(op_to_str_tbl)

def print_model(m, STEPS, op, op2_imm):
    for s in range(1,STEPS):
        op_n=m[op[s]].as_long()
        op_s=op_to_str_tbl[op_n]
        print "%s EAX, 0x%x" % (op_s, m[op2_imm[s]].as_long())

def attempt(STEPS):
    print "attempt, STEPS=", STEPS
    sl=Solver()

    R=[[BitVec('S_s%d_c%d' % (s, c), BIT_WIDTH) for s in range(MAX_STEPS)] for c in range (CHAINS)]
    op=[Int('op_s%d' % s) for s in range(MAX_STEPS)]
    op2_imm=[BitVec('op2_imm_s%d' % s, BIT_WIDTH) for s in range(MAX_STEPS)]

    for s in range(1, STEPS):
        # for each step, instruction is in 0..2 range:
        sl.add(Or(op[s]==0, op[s]==1, op[s]==2))

        # each 8-bit byte in operand must be in [0x21..0x7e] range:
        # or 0x20, if space character is tolerated...
        for shift_cnt in [0,8,16,24]:
            sl.add(And((op2_imm[s]>>shift_cnt)&0xff>=0x21),(op2_imm[s]>>shift_cnt)&0xff<=0x7e)

        """
        # or use 0..9, a..z, A..Z:
        for shift_cnt in [0,8,16,24]:
            sl.add(Or(
                And((op2_imm[s]>>shift_cnt)&0xff>=ord('0'),(op2_imm[s]>>shift_cnt)&0xff<=ord('9')),
                And((op2_imm[s]>>shift_cnt)&0xff>=ord('a'),(op2_imm[s]>>shift_cnt)&0xff<=ord('z')),
                And((op2_imm[s]>>shift_cnt)&0xff>=ord('A'),(op2_imm[s]>>shift_cnt)&0xff<=ord('Z'))))
        """

    # for all input random numbers, the result must be CONST:
    for c in range(CHAINS):
        sl.add(R[c][0]==random.randint(0,0x100000000))
        sl.add(R[c][STEPS-1]==CONST)

        for s in range(1, STEPS):
            sl.add(R[c][s]==simulate_op(R,c, op[s], R[c][s-1], op2_imm[s], STEPS))

    tmp=sl.check()
    if tmp==sat:
        print "sat!"
        m=sl.model()
        print_model(m, STEPS, op, op2_imm)
        exit(0)
    else:
        print tmp

for s in range(2, MAX_STEPS):
    attempt(s)


This is reworked and simplified piece of code I've already used in my "SAT/SMT by example" (under "Program Synthesis" section).

What it can generate for zero:

AND EAX, 0x3e5a3e28
AND EAX, 0x40214040

These two instructions clears EAX. You can understand how it works if you'll see these operands in binary form:

0x3e5a3e28 =    111110010110100011111000101000
0x40214040 =   1000000001000010100000001000000

It's best to have a zero bit for both operands, but this is not always possible, because each of 4 bytes in 32-bit operand must be in [0x21..0x7e] range, so the Z3 solver find a way to reset other bits using second instruction.

Running it again:

AND EAX, 0x3c5e3621
AND EAX, 0x42214850

Operands are different, because SAT solver is probably initialized randomly.

Now 0x0badf00d:

AND EAX, 0x48273048
AND EAX, 0x31504325
XOR EAX, 0x61212251
SUB EAX, 0x55733244

First two AND instruction clears EAX, 3th and 4th makes 0x0badf00d value.

Now 0x12345678:

AND EAX, 0x41212230
XOR EAX, 0x292f2224
AND EAX, 0x365e4048
XOR EAX, 0x323a5678

Slightly different, but also correct.

For some constants, more instructions required:

CONST=0xf3c37766
...
AND EAX, 0x21283024
AND EAX, 0x58504050
SUB EAX, 0x31377b56
SUB EAX, 0x3f2f3b5e
XOR EAX, 0x7c5a3e2a

Now what if, for aesthetical reasons maybe, we would limit all printable characters to 0..9, a..z, A..Z (comment/uncomment corresponding fragments of the source code)? This is not a problem at all.

However, if to limit to a..z, A..Z, it needs more instructions, but this is still correct (8 instructions to clear EAX register):

CONST=0x0
...
XOR EAX, 0x43685575
SUB EAX, 0x6c747a6f
XOR EAX, 0x59525541
AND EAX, 0x65755454
XOR EAX, 0x57416643
AND EAX, 0x76767757
SUB EAX, 0x556f7547
AND EAX, 0x42424242

However, 7 instructions for 0x12345678 constant:

CONST=0x12345678
...
AND EAX, 0x6f77414d
SUB EAX, 0x646b7845
AND EAX, 0x41674a54
SUB EAX, 0x47414744
AND EAX, 0x49486d41
XOR EAX, 0x53757778
AND EAX, 0x7274567a

Further work: use ForAll quantifier instead of randomly generated test inputs... also, we could try INC EAX/DEC EAX instructions.


→ [list of blog posts]

Please drop me email about any bug(s) and suggestion(s): dennis(@)yurichev.com.