Simplest possible intro to finite state machines

This is the simplest possible intro to finite state machine.

Using IDA disassembler, I got listings such as:

...
.text:10095A6B                 push    dword ptr [eax+8] ; hPubKey
.text:10095A6E                 push    dword ptr [eax+1Ch] ; dwSigLen
.text:10095A71                 push    dword ptr [eax+20h] ; pbSignature
.text:10095A74                 push    dword ptr [eax+4] ; hHash
.text:10095A77                 call    ds:CryptVerifySignatureW
.text:10095A7D                 test    eax, eax
.text:10095A7F                 mov     eax, [esi+14h]
.text:10095A82                 jz      short loc_10095A8A
.text:10095A84                 mov     byte ptr [esi+eax], 1
.text:10095A88                 jmp     short loc_10095A8E
...

I need to find all calls to the CryptVerifySignatureW() function and patch check after it (JZ instruction). In automated fashion.

I can use grep...

% grep -A 5 "call.*ds:CryptVerify" fname.lst

...

.text:10095A77                 call    ds:CryptVerifySignatureW
.text:10095A7D                 test    eax, eax
.text:10095A7F                 mov     eax, [esi+14h]
.text:10095A82                 jz      short loc_10095A8A
.text:10095A84                 mov     byte ptr [esi+eax], 1
.text:10095A88                 jmp     short loc_10095A8E
--
.text:10095B07                 call    ds:CryptVerifySignatureW
.text:10095B0D                 test    eax, eax
.text:10095B0F                 mov     eax, [esi+14h]
.text:10095B12                 jz      short loc_10095B1A
.text:10095B14                 mov     byte ptr [esi+eax], 1
.text:10095B18                 jmp     short loc_10095B1E
--
.text:10095B97                 call    ds:CryptVerifySignatureW
.text:10095B9D                 test    eax, eax
.text:10095B9F                 mov     eax, [esi+14h]
.text:10095BA2                 jz      short loc_10095BAA
.text:10095BA4                 mov     byte ptr [esi+eax], 1
.text:10095BA8                 jmp     short loc_10095BAE

But I'm going to write an utility that spots four lines in a listing:

call ds:CryptVerifySignatureW
test...
mov...
jz...

Then giving an address of the last instruction (JZ).

It's easy if you can load/map the whole file in memory. But what if this file is too big?

I came up with the following script:

#!/usr/bin/env python3
import sys, re

r=re.compile(r".text:([0-9A-F]{8}) *([a-z]+) *([^ ]+)")

fname_base=sys.argv[1]

# Using readline()
file1 = open(fname_base+".lst", 'r')

lines_seen=0

while True:
    # Get next line from file
    line = file1.readline()

    # if line is empty
    # end of file is reached
    if not line:
        break
    tmp=r.search(line)
    if tmp!=None:
        #print (tmp.group(1), "|", tmp.group(2), "|", tmp.group(3))
        adr=tmp.group(1)
        ins=tmp.group(2)
        rest=tmp.group(3)
        if ins=="call" and "ds:CryptVerifySignatureW" in rest:
            lines_seen=1
        elif lines_seen==1 and ins=="test":
            lines_seen=2
        elif lines_seen==2 and ins=="mov":
            lines_seen=3
        elif lines_seen==3 and ins=="jz" and "short" in rest:
            #print (adr)
            print ("PE_patcher.exe "+fname_base+".exe 0x"+adr+" 9090")
            lines_seen=0
        else:
            lines_seen=0 # reset

file1.close()

It takes the input from stdin.

In fact, this is the simplest possible FSM, with the lines_seen variable acting as a state tracker. It has a graph of states, but that graph is a simple line consisting of only 4 states.

In other words, if you want to find a line number for a 3 strings ("line1", "line2", "line3"), the following piece of code can be used:

...
        if line=="line1":
            lines_seen=1
        elif lines_seen==1 and line=="line2":
            lines_seen=2
        elif lines_seen==2 and line=="line3":
            print ("found!")
            # print line number, etc
            lines_seen=0 # reset
        else:
            lines_seen=0 # reset
...

More advanced examples of FSMs in my blog: 1, 2, 3.


List of my other blog posts.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.