... this is a task often required when constructing shellcodes. I'm not sure if this is still relevant these days, however, it was fun to do it.

I've picked 3 instructions with ASCII-only opcodes:

26 25 xx xx xx xx and eax, imm32 26 2D xx xx xx xx sub eax, imm32 26 35 xx xx xx xx xor eax, imm32

Will it be possible to generate such a sequence of instructions, so that the arbitrary 32-bit constant would be loaded into EAX regiser? Given the fact that the initial value of EAX is unknown, because, let's say, we can't reset it? Surely, all 32-bit operands must have ASCII-only bytes as well.

The answer is... using Z3 SMT-solver:

#!/usr/bin/env python from z3 import * import sys, random BIT_WIDTH=32 MAX_STEPS=20 #CONST=0 #CONST=0x12345678 #CONST=0x0badf00d #CONST=0xffffffff CONST=random.randint(0,0x100000000) print "CONST=0x%x" % CONST CHAINS=30 def simulate_op(R, c, op, op1_val, op2_imm, STEPS): return If(op==0, op1_val - op2_imm, If(op==1, op1_val ^ op2_imm, If(op==2, op1_val & op2_imm, 0))) # default op_to_str_tbl=["SUB", "XOR", "AND"] instructions=len(op_to_str_tbl) def print_model(m, STEPS, op, op2_imm): for s in range(1,STEPS): op_n=m[op[s]].as_long() op_s=op_to_str_tbl[op_n] print "%s EAX, 0x%x" % (op_s, m[op2_imm[s]].as_long()) def attempt(STEPS): print "attempt, STEPS=", STEPS sl=Solver() R=[[BitVec('S_s%d_c%d' % (s, c), BIT_WIDTH) for s in range(MAX_STEPS)] for c in range (CHAINS)] op=[Int('op_s%d' % s) for s in range(MAX_STEPS)] op2_imm=[BitVec('op2_imm_s%d' % s, BIT_WIDTH) for s in range(MAX_STEPS)] for s in range(1, STEPS): # for each step, instruction is in 0..2 range: sl.add(Or(op[s]==0, op[s]==1, op[s]==2)) # each 8-bit byte in operand must be in [0x21..0x7e] range: # or 0x20, if space character is tolerated... for shift_cnt in [0,8,16,24]: sl.add(And((op2_imm[s]>>shift_cnt)&0xff>=0x21),(op2_imm[s]>>shift_cnt)&0xff<=0x7e) """ # or use 0..9, a..z, A..Z: for shift_cnt in [0,8,16,24]: sl.add(Or( And((op2_imm[s]>>shift_cnt)&0xff>=ord('0'),(op2_imm[s]>>shift_cnt)&0xff<=ord('9')), And((op2_imm[s]>>shift_cnt)&0xff>=ord('a'),(op2_imm[s]>>shift_cnt)&0xff<=ord('z')), And((op2_imm[s]>>shift_cnt)&0xff>=ord('A'),(op2_imm[s]>>shift_cnt)&0xff<=ord('Z')))) """ # for all input random numbers, the result must be CONST: for c in range(CHAINS): sl.add(R[c][0]==random.randint(0,0x100000000)) sl.add(R[c][STEPS-1]==CONST) for s in range(1, STEPS): sl.add(R[c][s]==simulate_op(R,c, op[s], R[c][s-1], op2_imm[s], STEPS)) tmp=sl.check() if tmp==sat: print "sat!" m=sl.model() print_model(m, STEPS, op, op2_imm) exit(0) else: print tmp for s in range(2, MAX_STEPS): attempt(s)

This is reworked and simplified piece of code I've already used in my "SAT/SMT by example" (under "Program Synthesis" section).

What it can generate for zero:

AND EAX, 0x3e5a3e28 AND EAX, 0x40214040

These two instructions clears EAX. You can understand how it works if you'll see these operands in binary form:

0x3e5a3e28 = 111110010110100011111000101000 0x40214040 = 1000000001000010100000001000000

It's best to have a zero bit for both operands, but this is not always possible, because each of 4 bytes in 32-bit operand must be in [0x21..0x7e] range, so the Z3 solver find a way to reset other bits using second instruction.

Running it again:

AND EAX, 0x3c5e3621 AND EAX, 0x42214850

Operands are different, because SAT solver is probably initialized randomly.

Now 0x0badf00d:

AND EAX, 0x48273048 AND EAX, 0x31504325 XOR EAX, 0x61212251 SUB EAX, 0x55733244

First two AND instruction clears EAX, 3th and 4th makes 0x0badf00d value.

Now 0x12345678:

AND EAX, 0x41212230 XOR EAX, 0x292f2224 AND EAX, 0x365e4048 XOR EAX, 0x323a5678

Slightly different, but also correct.

For some constants, more instructions required:

CONST=0xf3c37766 ... AND EAX, 0x21283024 AND EAX, 0x58504050 SUB EAX, 0x31377b56 SUB EAX, 0x3f2f3b5e XOR EAX, 0x7c5a3e2a

Now what if, for aesthetical reasons maybe, we would limit all printable characters to 0..9, a..z, A..Z (comment/uncomment corresponding fragments of the source code)? This is not a problem at all.

However, if to limit to a..z, A..Z, it needs more instructions, but this is still correct (8 instructions to clear EAX register):

CONST=0x0 ... XOR EAX, 0x43685575 SUB EAX, 0x6c747a6f XOR EAX, 0x59525541 AND EAX, 0x65755454 XOR EAX, 0x57416643 AND EAX, 0x76767757 SUB EAX, 0x556f7547 AND EAX, 0x42424242

However, 7 instructions for 0x12345678 constant:

CONST=0x12345678 ... AND EAX, 0x6f77414d SUB EAX, 0x646b7845 AND EAX, 0x41674a54 SUB EAX, 0x47414744 AND EAX, 0x49486d41 XOR EAX, 0x53757778 AND EAX, 0x7274567a

Further work: use ForAll quantifier instead of randomly generated test inputs... also, we could try INC EAX/DEC EAX instructions.

Please drop me email about any bug(s) and suggestion(s):