Last week we made good headway into assembly. This week I am going to go through variables, a few more assembly operations and finally start looking at code constructs; loops, and branch statements. One of the biggest challenges to reverse engineers is that it can be impossible to step through an executables disassembled files due to the sheer amount of assembly instructions we would need to read through. I used to work with a bank and often spoke with their CTI team. On one occasion they gave me advice on how to handle navigating the amount of assembly instructions is to keep in mind the overall picture, the high level understanding of what the code does by looking at the groups of instructions, rather that panicking and trying to figure out and trace what malicious action “mov eax, ebx” is actually part of unless it is needed.
Most malware is written in C or C++ and we can see the coding constructs like loops, if statements, arrays, goto statements, switch statements and more in the assembly code as well as in the high level code itself. This blog is going to look at some more assembly instructions but also how these standard constructs look in assembly.
We already spoke about some of the registers in a previous blog but we still have some more to review;
- ECX – Counter for string and loop operations
- EDX – I/O pointer
- ESI – Source pointer for stream/string operations
- EDI – Destination pointer for stream/string operations
- EAX (AX, AH, AL)
- EBX (BX, BH, BL)
- ECX (CX, CH, CL)
- EDX (DX, DH, DL)
Global vs local variables
Like with high level languages global variables can be accessed by any function in a program while local variables can be accessed only by the function where its defined. While the declaration of each is similar in C, in assembly they look completely different;
We can see the main difference is where the line “uint8_t global_1;” is called. But with assembly global variables are referenced by memory addresses while local variables are referenced by stack addresses.
Aside from the ADD and SUB operations we looked at the other arithmetic operations are;
- INC – increment a destination value eg INC EAX
- DEC – decrement a destination value eg. DEC EAX
- MUL – multiply EAX register by a value eg. MUL $VALUE. The result is stored as a 64 bit value across EDX and EAX.
- DIV – Divide 64 bits across EDX and EAX by value. The result is stored in EAX and the remainder is stored in EDX.
Logical operators consist of OR, AND, NOT and XOR can all be used in x86 architecture. These instructions operate similarly to the ADD and SUB instructions with the syntax XOR SRC,DEST with the result stored in the destination. The XOR instruction is frequently encountered in disassembly. For example XOR EAX,EAX is a quick way to set the EAX register to zero. This is done for optimization as the instruction requires less bytes and cpu cycle than MOV.
- AND – Destination operand can be r/m32 or a register. The source operand can be r/m32 or a register too, or even an immediate value(i.e. no source and destination as r/m32’s)
- OR – Destination operand can be r/m32 or a register. The source operand can be r/m32 or a register too, or even an immediate value(i.e. no source and destination as r/m32’s)
- XOR – Destination operand can be r/m32 or a register. The source operand can be r/m32 or a register too, or even an immediate value(i.e. no source and destination as r/m32’s)
- NOT – Ones complement Negation(remember that?). The sing source/destination operand can be r/m32.
In order to shift registers we use the SHR and SHL instructions; “SHR/SHL destination, count”. These instructions shift the bits in the destination to the right or left and the number of shifts is the “count” field. If bits are shifted beyond the destinations boundary they are first shifted in the CF Flag. Zero Bits are filled in during the shift. At the end of the shift instruction the CF flag contains the last bit shifted out of the destination operand. Shifting is often used in place of multiplication as an optimisation. Shifting is simplet and faster than multipication as you dont need to mess around with registers or moving data around. The way we use shifting in place of multiplication is “shl eax, 1” is the same as multiplying EAX by 2. To figure out what you are multiplying by remember CCNA subnetting; https://www.9tut.com/subnetting-tutorial/2
Rotation is similar to shifting only instead of the bits disappearing when they fall off the edge of the destination the bits reappear on the opposite side, like a conveyor belt. ROR allows us to Rotate Right, while ROL allows us to rotate left.
Some of the items we discussed; Shifting, Rotation, and XOR/OR/AND are all encountered by analysts when we encounter encryption or compression. They will often look random and be in repeated a large number of times. Its one of the reasons we try and gain and overview of what the code does rather than investigating individual functions. When we do find and encrypted function we make a note of this and move on.
Branch statement, like if-else, are conditionally executed depending on the flow of the program. The most popular way of seeing this in assembly is through jump or JMP instructions. The format is “jmp location” and causes the next instruction executed to be the one specified by the jump. This is known as unconditional jumping as the execution will always execute such as with procedure calls, GoTo statements, exceptions and interrupts.
An always on jmp doesnt always fulfil our needs however. If-else isnt possible with jmp. We need some way to add conditions and this comes to us through conditional jumps usings flags to decide when to jump. There are more than 30 different jumps that can be used;
Before we can do a conditional jump we need to set the condition flags first. Typically this is done with CMP, TEST or whatever we have the sets flags.
CMP Compares two operands by subtracting the second operand from the first. It differers from the SUB operand in that the result is not stored. CMP computes the result, sets the flag then discards the result. This way it only sets the flag without impacting registers.
Like CMP, TEST sets flags and discards the results. It computes the AND of value 1 and value 2, then sets the SF, ZF and PF flags according to the result.
If statements alter the programs execution based on a set condition (ie if (1=1)). Most languages have these but we will see how basics and nested if statements look in assembly. It is good to know that all if statements need a conditional jump, but not all conditional jumps are if statements. We can see an example of an if statement here;
The If statement itself is seen in the “Mov[move], Cmp[compare], jne[conditional jump if ZF flag is 0/FALSE] and jmp to L2 to skip the else execution.
For Nested if statements to code is the same as the above only additional if statements have been included within the initial if statements. This should be understandable if you do any coding, if not play with python! 😀 Make a game, its great fun! But in assembly the code looks more complicated and difficult to follow.
We can see there are 3 conditional checks in this “x==y”, “z==0” and “X=!y”. We can see reading through the code it can get complicated fast, and this is before we encountered a malware authors intentional obfuscation, htis is why its important to focus on the overall flow of what the program is doing rather that identifying what is happening at each step.
Loops, like the for loop, are ways of repeating the same piece of code according to some parameters. In assembly this is achieved through the use of conditional jmps such as JGE. Having trouble finding a For loop example online but the basic principle would be similar to;
CLR a // clear register and start at 0
~some action~ // carry out whatever action we want
INC a // increment a
cjne a, b, $jump-address // compares the first two operands and branches to $jumpaddress if their values are not equal, giving us our loop.
Its interesting that in assembly we seem to be looking for an exact match. In the above example if a is > than b.. what would happen? Must remember to test this later. Its probably the case that the next relevent instruction in the code is executed(ie the instruction after the for loop).
While loops are similar but have the condition set at the beginning of the loop and in order to execute the loop this condition must be true. To avoid an infinite loop occuring we must make sure there is some change to the condition(such as an increment of the condition value) within the loop. Malware authors tend to use while loops to monitor for some action before executing malicious code, such as recieving a connection from a C2 server. This allows the malware to continuously listen for this.
We can see in this sample how the jmp at the end keeps bring the code back to the cmp instructions at the start. Once x is greater than or equal to 10 the jge instruction kicks in the let us skip the loop and go to the xor at the end.
I need a break after all this assembly. Theres alot of information and ive found going through the actual code, and code samples to be the best way to figure out what is happening. I think there will be one more lesson in Malware and then we will take a random piece of malware that has not been analysed yet and start putting it all together to come out with an awesome analysis report. 🙂