This is the simplest possible introduction to finite state machines (FSMs).
Using the IDA disassembler, I got listings like this:
...
.text:10095A6B push dword ptr [eax+8] ; hPubKey
.text:10095A6E push dword ptr [eax+1Ch] ; dwSigLen
.text:10095A71 push dword ptr [eax+20h] ; pbSignature
.text:10095A74 push dword ptr [eax+4] ; hHash
.text:10095A77 call ds:CryptVerifySignatureW
.text:10095A7D test eax, eax
.text:10095A7F mov eax, [esi+14h]
.text:10095A82 jz short loc_10095A8A
.text:10095A84 mov byte ptr [esi+eax], 1
.text:10095A88 jmp short loc_10095A8E
...
I need to find all calls to the CryptVerifySignatureW() function and patch the check after each of them (the JZ instruction), in an automated fashion.
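The patch itself is tiny. On x86, a short JZ is two bytes (opcode 0x74 followed by a signed 8-bit displacement), so overwriting it with two NOPs (0x90 0x90) makes execution always fall through to the next instruction, here mov byte ptr [esi+eax], 1. Below is a minimal sketch that works on a raw byte buffer, with a hypothetical helper nop_short_jz. It leaves out mapping a virtual address such as 0x10095A82 to a file offset, and the demo bytes are hand-assembled for illustration, not read from the actual binary:

```python
# x86 opcodes (one byte each)
JZ_SHORT = 0x74  # "jz rel8": opcode followed by a signed 8-bit displacement
NOP = 0x90


def nop_short_jz(code: bytearray, offset: int) -> None:
    """Overwrite the 2-byte 'jz short' at `offset` with two NOPs, in place."""
    if code[offset] != JZ_SHORT:
        raise ValueError(f"no jz short at offset {offset:#x} (found {code[offset]:#04x})")
    code[offset:offset + 2] = bytes([NOP, NOP])


# Hand-assembled for illustration: test eax,eax / mov eax,[esi+14h] / jz short +6
code = bytearray.fromhex("85C0 8B4614 7406")
nop_short_jz(code, 5)
print(code.hex(" "))  # 85 c0 8b 46 14 90 90
```

Checking the opcode before writing is a cheap guard against patching the wrong offset.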
I can use grep...
% grep -A 5 "call.*ds:CryptVerify" fname.lst
...
.text:10095A77 call ds:CryptVerifySignatureW
.text:10095A7D test eax, eax
.text:10095A7F mov eax, [esi+14h]
.text:10095A82 jz short loc_10095A8A
.text:10095A84 mov byte ptr [esi+eax], 1
.text:10095A88 jmp short loc_10095A8E
--
.text:10095B07 call ds:CryptVerifySignatureW
.text:10095B0D test eax, eax
.text:10095B0F mov eax, [esi+14h]
.text:10095B12 jz short loc_10095B1A
.text:10095B14 mov byte ptr [esi+eax], 1
.text:10095B18 jmp short loc_10095B1E
--
.text:10095B97 call ds:CryptVerifySignatureW
.text:10095B9D test eax, eax
.text:10095B9F mov eax, [esi+14h]
.text:10095BA2 jz short loc_10095BAA
.text:10095BA4 mov byte ptr [esi+eax], 1
.text:10095BA8 jmp short loc_10095BAE
But I want to write a utility that spots these four consecutive lines in a listing:
call ds:CryptVerifySignatureW
test ...
mov ...
jz ...
and then prints the address of the last instruction (JZ).
This is easy if you can load or map the whole file into memory. But what if the file is too big for that?
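For comparison, when the listing does fit in memory, one multiline regular expression over the whole text is enough. Below is a minimal sketch (FOUR_LINES and find_jz_addresses are hypothetical names), assuming the four instructions sit on adjacent lines with nothing in between. It would be called as find_jz_addresses(open("fname.lst").read()), which is exactly the whole-file read that becomes a problem for huge listings:

```python
import re

# One pattern spanning four adjacent lines; [ \t]+ (not \s+) keeps each part on its own line.
FOUR_LINES = re.compile(
    r"^\.text:[0-9A-F]{8}[ \t]+call[ \t]+ds:CryptVerifySignatureW\b.*\n"
    r"\.text:[0-9A-F]{8}[ \t]+test\b.*\n"
    r"\.text:[0-9A-F]{8}[ \t]+mov\b.*\n"
    r"\.text:([0-9A-F]{8})[ \t]+jz[ \t]+short\b",
    re.MULTILINE,
)


def find_jz_addresses(text: str) -> list[str]:
    """Return the address of every 'jz short' that ends the four-line pattern."""
    return FOUR_LINES.findall(text)


listing = """.text:10095A77 call ds:CryptVerifySignatureW
.text:10095A7D test eax, eax
.text:10095A7F mov eax, [esi+14h]
.text:10095A82 jz short loc_10095A8A
"""
print(find_jz_addresses(listing))  # ['10095A82']
```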
I came up with the following script:
#!/usr/bin/env python3
import sys, re

r=re.compile(r"\.text:([0-9A-F]{8}) *([a-z]+) *([^ ]+)")

fname_base=sys.argv[1]

# Using readline(): only one line is held in memory at a time
file1 = open(fname_base+".lst", 'r')
lines_seen=0  # FSM state: how many lines of the pattern have been matched so far

while True:
    # Get next line from file
    line = file1.readline()

    # if line is empty
    # end of file is reached
    if not line:
        break
    tmp=r.search(line)
    if tmp is not None:
        #print (tmp.group(1), "|", tmp.group(2), "|", tmp.group(3))
        adr=tmp.group(1)
        ins=tmp.group(2)
        rest=tmp.group(3)
        if ins=="call" and "ds:CryptVerifySignatureW" in rest:
            lines_seen=1
        elif lines_seen==1 and ins=="test":
            lines_seen=2
        elif lines_seen==2 and ins=="mov":
            lines_seen=3
        elif lines_seen==3 and ins=="jz" and "short" in rest:
            #print (adr)
            # a short JZ is 2 bytes long, so it is replaced with two NOPs (90 90)
            print ("PE_patcher.exe "+fname_base+".exe 0x"+adr+" 9090")
            lines_seen=0
        else:
            lines_seen=0 # reset

file1.close()
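It takes one command-line argument, the base file name (fname_base). It reads fname_base.lst one line at a time, so the whole listing never has to be loaded into memory, and for each matching JZ it prints a PE_patcher.exe command that patches fname_base.exe with two NOPs (9090).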
In fact, this is the simplest possible FSM, with the lines_seen variable tracking the state. Its state graph is just a straight line of 4 states (0, 1, 2, 3), and an instruction that breaks the pattern resets it.
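The same FSM can be written with an explicit transition table, which makes the linear state graph visible and makes it easy to add or change steps. Below is a minimal sketch that reuses the script's regular expression (with the dot escaped). TRANSITIONS and scan are hypothetical names:

```python
import re
from typing import Iterable, Iterator

# Same regex as the script: address, mnemonic, first operand token.
LINE = re.compile(r"\.text:([0-9A-F]{8}) *([a-z]+) *([^ ]+)")

# TRANSITIONS[s] is the condition that moves the FSM from state s to s+1.
TRANSITIONS = [
    lambda ins, rest: ins == "call" and "ds:CryptVerifySignatureW" in rest,  # 0 -> 1
    lambda ins, rest: ins == "test",                                         # 1 -> 2
    lambda ins, rest: ins == "mov",                                          # 2 -> 3
    lambda ins, rest: ins == "jz" and "short" in rest,                       # 3 -> match
]


def scan(lines: Iterable[str]) -> Iterator[str]:
    """Yield the address of each JZ that completes the pattern."""
    state = 0
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # not an instruction line: keep the current state
        adr, ins, rest = m.groups()
        if TRANSITIONS[state](ins, rest):
            state += 1
            if state == len(TRANSITIONS):
                yield adr
                state = 0
        else:
            # Pattern broken; a new call may still start another match.
            state = 1 if TRANSITIONS[0](ins, rest) else 0


sample = [
    ".text:10095A77 call ds:CryptVerifySignatureW",
    ".text:10095A7D test eax, eax",
    ".text:10095A7F mov eax, [esi+14h]",
    ".text:10095A82 jz short loc_10095A8A",
    "--",
    ".text:10095B07 call ds:CryptVerifySignatureW",
    ".text:10095B0D test eax, eax",
    ".text:10095B0F mov eax, [esi+14h]",
    ".text:10095B12 jz short loc_10095B1A",
]
print(list(scan(sample)))  # ['10095A82', '10095B12']
```

Because scan() is a generator over any iterable of lines, passing it an open file object keeps the same one-line-at-a-time memory footprint as the script.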
In other words, if you want to find the line number where three consecutive strings ("line1", "line2", "line3") occur, the following piece of code can be used:
import sys

lines_seen=0
for line_no, line in enumerate(sys.stdin, start=1):
    line=line.rstrip("\n")
    if line=="line1":
        lines_seen=1
    elif lines_seen==1 and line=="line2":
        lines_seen=2
    elif lines_seen==2 and line=="line3":
        print ("found!", line_no)  # line number of "line3"
        lines_seen=0 # reset
    else:
        lines_seen=0 # reset
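The same idea generalizes to any number of strings. Below is a minimal sketch of a reusable generator (find_sequence is a hypothetical name). It assumes the target strings are all distinct: with a repeated string, as in ["a", "a", "b"], falling back to state 0 or 1 can miss an overlapping match (for example in "a", "a", "a", "b"), and handling that needs Knuth-Morris-Pratt-style fallback states.

```python
from typing import Iterable, Iterator, Sequence


def find_sequence(lines: Iterable[str], targets: Sequence[str]) -> Iterator[int]:
    """Yield the 1-based line number of the last line of each run equal to `targets`.

    Assumes all strings in `targets` are distinct.
    """
    state = 0  # how many targets have been matched so far
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line == targets[state]:
            state += 1
            if state == len(targets):
                yield line_no
                state = 0  # reset
        else:
            # Mismatch: reset, but the current line may itself start a new run.
            state = 1 if line == targets[0] else 0


text = ["x", "line1", "line2", "line3", "line1", "line1", "line2", "line3"]
print(list(find_sequence(text, ["line1", "line2", "line3"])))  # [4, 8]
```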
More advanced examples of FSMs can be found on my blog: 1, 2, 3.
