This is the simplest possible intro to finite state machine.
Using IDA disassembler, I got listings such as:
... .text:10095A6B push dword ptr [eax+8] ; hPubKey .text:10095A6E push dword ptr [eax+1Ch] ; dwSigLen .text:10095A71 push dword ptr [eax+20h] ; pbSignature .text:10095A74 push dword ptr [eax+4] ; hHash .text:10095A77 call ds:CryptVerifySignatureW .text:10095A7D test eax, eax .text:10095A7F mov eax, [esi+14h] .text:10095A82 jz short loc_10095A8A .text:10095A84 mov byte ptr [esi+eax], 1 .text:10095A88 jmp short loc_10095A8E ...
I need to find all calls to the CryptVerifySignatureW() function and patch check after it (JZ instruction). In automated fashion.
I can use grep...
% grep -A 5 "call.*ds:CryptVerify" fname.lst ... .text:10095A77 call ds:CryptVerifySignatureW .text:10095A7D test eax, eax .text:10095A7F mov eax, [esi+14h] .text:10095A82 jz short loc_10095A8A .text:10095A84 mov byte ptr [esi+eax], 1 .text:10095A88 jmp short loc_10095A8E -- .text:10095B07 call ds:CryptVerifySignatureW .text:10095B0D test eax, eax .text:10095B0F mov eax, [esi+14h] .text:10095B12 jz short loc_10095B1A .text:10095B14 mov byte ptr [esi+eax], 1 .text:10095B18 jmp short loc_10095B1E -- .text:10095B97 call ds:CryptVerifySignatureW .text:10095B9D test eax, eax .text:10095B9F mov eax, [esi+14h] .text:10095BA2 jz short loc_10095BAA .text:10095BA4 mov byte ptr [esi+eax], 1 .text:10095BA8 jmp short loc_10095BAE
But I'm going to write an utility that spots four lines in a listing:
call ds:CryptVerifySignatureW test... mov... jz...
Then giving an address of the last instruction (JZ).
It's easy if you can load/map the whole file in memory. But what if this file is too big?
I came up with the following script:
#!/usr/bin/env python3 import sys, re r=re.compile(r".text:([0-9A-F]{8}) *([a-z]+) *([^ ]+)") fname_base=sys.argv[1] # Using readline() file1 = open(fname_base+".lst", 'r') lines_seen=0 while True: # Get next line from file line = file1.readline() # if line is empty # end of file is reached if not line: break tmp=r.search(line) if tmp!=None: #print (tmp.group(1), "|", tmp.group(2), "|", tmp.group(3)) adr=tmp.group(1) ins=tmp.group(2) rest=tmp.group(3) if ins=="call" and "ds:CryptVerifySignatureW" in rest: lines_seen=1 elif lines_seen==1 and ins=="test": lines_seen=2 elif lines_seen==2 and ins=="mov": lines_seen=3 elif lines_seen==3 and ins=="jz" and "short" in rest: #print (adr) print ("PE_patcher.exe "+fname_base+".exe 0x"+adr+" 9090") lines_seen=0 else: lines_seen=0 # reset file1.close()
It takes the input from stdin.
In fact, this is the simplest possible FSM, with the lines_seen variable acting as a state tracker. It has a graph of states, but that graph is a simple line consisting of only 4 states.
In other words, if you want to find a line number for a 3 strings ("line1", "line2", "line3"), the following piece of code can be used:
... if line=="line1": lines_seen=1 elif lines_seen==1 and line=="line2": lines_seen=2 elif lines_seen==2 and line=="line3": print ("found!") # print line number, etc lines_seen=0 # reset else: lines_seen=0 # reset ...
More advanced examples of FSMs in my blog: 1, 2, 3.
Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.