## 16-May-2015: Tweaking LLVM Obfuscator + quick look into some of LLVM internals.

Code obfuscating is a good example of security through obscurity and is intended to make things harder for reverse engineer.

In past, I tried to patch Tiny C compiler to produce really messy code, which, as far as I known, its results can be easily simplified by Hex-Rays decompiler.

One of the much more known examples is LLVM Obfuscator, which is set of transformation passes for LLVM.

One of its transform passes is substitution of some primitive operation like AND or XOR to much more messy code.

For the sake of experiment, I tried to add support of one really strange XOR equivalent I accidentally found using Aha! superoptimizer. This is (((x & y) * -2) + (x + y)), which is, hard to believe, behaves just like XOR operation without actual XOR operation.

So here we go, let's open lib/Transforms/Obfuscation/Substitution.cpp file and add this:

// Implementation of a = (((x & y) * -2) + (x + y))
void Substitution::xorSubstitutionAha(BinaryOperator *bo) {
Type *ty = bo->getType();

// get operands
Value *x = bo->getOperand(0);
Value *y = bo->getOperand(1);

ConstantInt *co = (ConstantInt *)ConstantInt::get(ty, -2, true /* isSigned */); // -2

// first half of expression: (x & y) * -2
BinaryOperator *op1 = BinaryOperator::Create(Instruction::Mul,
BinaryOperator::Create(Instruction::And, y, x, "", bo),
co, // -2
"", bo);

// second half of expression: x + y
x,
y,
"", bo);
// sum up both halves
op1,
op2,
"", bo);
bo->replaceAllUsesWith(op);
}


The function takes expression in form x^y on input and returns its transformed version.

At the first 3 lines of the function, we get type of values and both of arguments.

We then construct -2 signed value, then construct both halves of expression and finally, join them at the end using ADD operation.

Resulting expression has all this stuff inside.

It may be hard to understand for newbies, but LISPers will grasp this at very first sight. Value C++ class is the object containing actual value inside plus some information about its type. ConstantInt class is probably the same. BinaryOperator C++ class, perhaps, has at least three members: operation and two links (pointers) to other classes, which could has type of Value or BinaryOperator. Thus, expression elements can be combined, nested, sliced, etc.

op1 can be graphically represented in this form:

op1 = +-----+
| MUL |
|     | ----> +-----+
|     | --+   | AND |         +-----------+
+-----+   |   |     | ------> | (Value) x |
|   |     | ---+    +-----------+
|   +-----+    |
|              |    +-----------+
|              +--> | (Value) y |
|                   +-----------+
|
+---> +------------------+
| (ConstantInt) -2 |
+------------------+


op2 is just:

op2 = +-----+
| AND |         +-----------+
|     | ------> | (Value) x |
|     | ---+    +-----------+
+-----+    |
|    +-----------+
+--> | (Value) y |
+-----------+


op is construction of ADD operation with two pointers to both halves we've just constructed:

op = +-----+
|     | ------------------> +-----+
|     | ---> +-----+        | MUL |
+-----+      | ADD |        |     | ----> +-----+                    +-----------+
|     | ---+   |     | --+   | AND |            +-----> | (Value) x | <-----+
|     | -+ |   +-----+   |   |     | -----------+   	  +-----------+       |
+-----+  | |             |   |     | --------------+	                      |
| |             |   +-----+               |	  +-----------+       |
| |             |                         +--> | (Value) y | <---------+
| |             +---> +------------------+	  +-----------+       |   |
| |                   | (ConstantInt) -2 |                         |   |
| |                   +------------------+                         |   |
| |                                                                |   |
| +----------------------------------------------------------------+   |
+----------------------------------------------------------------------+


All rectangular blocks has BinaryOperation type, except when it's noted explicitly. Arrows shows how objects has pointers to other objects.

Resulting expression is not stored in memory in this precise form, it's in fact a pointers to other parts, which were constructed some time ago. For example, both x and y values are used twice in our expression, but there is no need to duplicate them, let both places where x is used, be pointing to the single Value object holding x.

Other LLVM transform passes may build other expressions around this one, or they may slice it, reconstruct, rework, replace parts, etc. In fact, this is what compilers do most of the time.

As far as I know, learning LISP (or Scheme) is simplest possible way to get understanding of such things, which are really makes life of programmer easier. I strongly encourage every programmer to learn LISP (or any its variant) basics, even if he/she would never write a single line of LISP code.

On comparing LISP to LLVM: primitive LISP element can hold some value (called atom) or can have links to other elements (called cons cell), thus allowing to build huge trees of data and code. LISP is homoiconic language (surprisingly, assembly language is also homoiconic), it's when disctinction between data and code is blurred. Any code can be represented as data and otherwise. Aside from self-modifying code, this allows to build new functions on fly, using the very same operations and data primitives for both data and code.

Now back to o-LLVM. Aside from my tweak, there are two other XOR substitution functions. I disabled them temporarily and now I'll try to test what I did.

test.c:

#include <stdio.h>

int test_XOR(int a, int b)
{
return a^b;
};

int main()
{
printf ("%x\n", test_XOR (test_XOR (0x12345678, C), C));
};

$bin/clang test.c -c -o test -mllvm -sub$ objdump -d test

test2:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :
0:	55                   	push   %rbp
1:	48 89 e5             	mov    %rsp,%rbp
4:	89 7d fc             	mov    %edi,-0x4(%rbp)
7:	89 75 f8             	mov    %esi,-0x8(%rbp)
a:	8b 75 fc             	mov    -0x4(%rbp),%esi
d:	8b 7d f8             	mov    -0x8(%rbp),%edi
10:	89 f8                	mov    %edi,%eax
12:	21 f0                	and    %esi,%eax
14:	69 c0 fe ff ff ff    	imul   $0xfffffffe,%eax,%eax 1a: 01 fe add %edi,%esi 1c: 01 f0 add %esi,%eax 1e: 5d pop %rbp 1f: c3 retq  Yeah, that works. I also compiled all the stuff in Cygwin. I had some problems with compiling, and here is my patch to enable this. Not sure if it's correct. Compiling (cygwin): bin/clang -mllvm -sub test.c -o test -mcmodel=large Interestingly, o-LLVM under Cygwin generates slightly different code: sub_1004010E0 proc near var_8 = dword ptr -8 var_4 = dword ptr -4 push rbp mov rbp, rsp push rax mov [rbp+var_4], ecx mov [rbp+var_8], edx mov ecx, [rbp+var_4] mov eax, edx and eax, ecx add eax, eax add ecx, edx sub ecx, eax mov eax, ecx add rsp, 8 pop rbp retn sub_1004010E0 endp  There is no imul instruction and there are no -2 constant. The resulting expression is reshuffled: y + x - 2 * (x & y), but it has the very same effect as original one. Perhaps, some additional LLVM optimization pass made things more optimized: of course, multiplication by -2 is slower than single add operation. OK, so I enabled two other XOR functions and now o-LLVM picking XOR functions randomly: #define NUMBER_XOR_SUBST 3 ... funcXor[0] = &Substitution::xorSubstitution; funcXor[1] = &Substitution::xorSubstitutionRand; funcXor[2] = &Substitution::xorSubstitutionAha; // our function ... case Instruction::Xor: (this->* funcXor[llvm::cryptoutils->get_range(NUMBER_XOR_SUBST)])(cast(inst));  ### Tests So far so good, but test? Tests are crucial, compiler tests are arch-crucial, obfuscator tests are arch2-crucial. Any single unnoticed typo may lead to nasty hard-to-find bugs. o-LLVM developers mentioned they use OpenSSL for tests (compile OpenSSL by o-LLVM and then run OpenSSL tests). So did I. Here is also my small OpenSSL fork and patch to enable o-LLVM obfuscator: https://github.com/yurichev/openssl/commit/56bdb20af29c5617733bf7b195844177ed043614. Now I run it. Correct your path to o-LLVM where clang binary is, if you want to run this yourself: export PATH=$PATH:~/src/llvm-obfuscator/build/bin
./Configure linux-x86_64-clang
make
make test


OpenSSL tests now works much slower. No wonder. Now it's a time for a slightly deranged mood: when I enable "sub" o-LLVM pass (Instructions Substitution), OpenSSL tests works, but with other obfuscations (fla (Control Flow Flattening) and bcg (Bogus Control Flow)) tests are not passed. Ouch, something wrong. Of course, I could do something wrong or introduce a bug somewhere. Nevertheless, judging by OpenSSL tests, our added XOR function is working fine.

Keep in mind, code obfuscation is not a real protection against motivated reverse engineer. On an untangling of o-LLVM obfuscated code, there is well known article about it: http://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html

### Twine

LLVM is my favorite open-source C++ project for learning. Some time ago I found nice Twine class (and/or data structure) there.

Understanding how Twine operates may also help to understand how expressions are constructed in LLVM and LISP.

So. Many programmers know that text string concatenation task is somewhat costly if strings are growing (O(n) complexity class, speed is linearly depending on size of data). Imagine, we need to devise a string class (or library) which will work as fast as possible (memory trade-off accepted). One of solutions is data structure called Rope. Rope is very efficient in text editors. Imagine you write text editor. Probably, one thing first came to mind is to represent internal buffer as array of strings. OK, but what strings are long and user inserting characters somewhere in the middle? You expand string byte-by-byte at each keystroke. This implies heavy memory usage. On contrary, Rope is a tree where each node pointing to part of text. Loaded text file may be stored somewhere as a static text blob which will never be modified. Instead, when user press "Enter" in the middle of the string, new node in Rope is created, pointing to the first half of string and to the next. When user splice strings, two nodes in the Rope are connected in a way to indicate that these two strings are now spliced. It is not a problem to remove just one character in the middle of 10Gb text string or to insert it. Actual text buffer can be showed to the user on screen, but it can never be in the memory in this form. The whole text file will be (re)constructed only when user save file to disk. Oh, there is even Rope implementation in C++ STL from SGI: https://www.sgi.com/tech/stl/Rope.html.

Twine is simplified version of Rope. Each Twine object may represent a string or pointers to two other strings.

Now example from LLVM:

  // Emit the code (index) for the abbreviation.
if (isVerbose())
"] 0x" + Twine::utohexstr(Die.getOffset()) +
":0x" + Twine::utohexstr(Die.getSize()) + " " +
dwarf::TagString(Abbrev.getTag()));

( lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp )

This piece of code is probably provides some debugging output. When writing such code in C++ using std::string, numbers are converted into strings and all strings spliced using operator+() method. But what is going here is string constructed using Twine, without actual string splicing and number-to-string conversion done. Twine objects are nested to each other in form of tree. Splicing is done only when there is a need to. Those who concern about speed of constructing debugging strings, may add global flags to set debugging level. And if current debugging level is lower than some value, debugging string will not be constructed at all. On contrary, Twine allows to construct all strings as fast as possible, but splice them only while output. So actual string concatenation is deferred.

Here we can see that Twine::operator+ is either constucts new Twine object using two strings or call .concat() method which connects second string (called Suffix there) to the current one (called Prefix in their lingo): http://llvm.org/docs/doxygen/html/Twine_8h_source.html#l00487.

So concatenation is always O(1) (algorithm speed is not depending on size of data and always takes the same time), no matter how many Twine objects are behind those two objects which we currently splicing. On the other hand, memory consumption is bigger than to store a simple C-string.

Actual splicing is done only when .str() method is called: http://llvm.org/docs/doxygen/html/Twine_8cpp_source.html#l00016

In fact, Twine behaves just like somewhat limited LISP cons cell: its elements are always strings/numbers or links to another Twine objects.

Twine can be implemented probably in any programming language. And it is misnomer to my taste. I would rather call it Knot or maybe StringKnot.

### Pattern matching and term rewriting systems

When expression is represented as a tree (which is, frankly speaking, always), some magic things are possible. LLVM has pattern matcher which operates on trees. It is like regular expressions in some sense, but much more cooler. They are used in LLVM extensively to simplify expressions.

For example:

// (X >> A) << A -> X
Value *X;
if (match(Op0, m_Exact(m_Shr(m_Value(X), m_Specific(Op1)))))
return X;

( lib/Analysis/InstructionSimplify.cpp )

In this example, match() sets X so it will point to the sub-expression and it will be returned as simplified expression.

Another cool example is detecting and replacing the whole expression into BSWAP instruction:

// (A | B) | C  and  A | (B | C)                  -> bswap if possible.
// (A >> B) | (C << D)  and  (A << B) | (B >> C)  -> bswap if possible.
if (match(Op0, m_Or(m_Value(), m_Value())) ||
match(Op1, m_Or(m_Value(), m_Value())) ||
(match(Op0, m_LogicalShift(m_Value(), m_Value())) &&
match(Op1, m_LogicalShift(m_Value(), m_Value())))) {
if (Instruction *BSwap = MatchBSwap(I))
return BSwap;

( lib/Transforms/InstCombine/InstCombineAndOrXor.cpp )

match() function internals is surprisingly simple.

I would say, this is because it can be viewed as term rewriting system (TRS) (which is also simple). The whole compiler (especially optimizer) can be viewed as TRS working on abstract syntax tree, finding something known to it and transforming it according to its rules. It does this as long as there is something left to "rewrite". Obfuscated XOR function which I showed in this article is just another rule for TRS: each time TRS seeing XOR somewhere in the tree, it is then replacing it to one of tree obfuscated XOR functions, picked randomly.

By the way, old-school soviet programmers of USSR era may recall that compilers was called "translators" in Russian-language computer science books. I would say this was a better name. Compiler doesn't compile (linker does), it rather translates from one language to another, according to a set of rules.

Oh, and by the way, decompiler can also be viewed as TRS.

### Another known LLVM obfuscator: Kryptonite

Here is another well-known LLVM obfuscator, Kryptonite.