At this stage in our series we know what malware is, we know how to use Dynamic and Static analysis tools, we know how malware tries to avoid detection and we have edited the obfuscation methods ourselves. The next step in our path to becoming malware analyst’s is to gain an understanding of Reverse Engineering. The first thing we must know for this is the basics of assembly and Intel x86. By reverse engineering malicious code we can delve deeper into the structure and behavior of suspect files and gain a greater understanding into the codes authors.
Generally a computer system can be represented as several layers of abstraction in order to allow cross layer integration. A good example of the reason for this is how Windows or Linux OS can run off many different types of hardware. Similarly malware authors tend to create the program in a high level language like C, C++, C# or Python, which is then compiled into machine code to be executed by the CPU. As analysts we generally dont have access to the source code, though we can certainly try to decompile the application we usually need to rely on our under standing of low level languages like Assembly to figure out how a program operates.
Sounds though right? Fortunately assembly isn’t as bad as you might think. Its estimated that 14 assembly instructions account for 90% of all code. With the top 5 instruction accounting for 64% of the total code. Here we can see that the number of assembly instructions we need to know is very accessible. For the curious, those 5 instructions are; MOV, PUSH, POP, CALL and CMP.
Malware is usually stored in binary, when we disassemble malware we take the binary is an input into our disassembled to output the assembly language code that we can review. This can be difficult as Assembly is a category of different languages depending on the processor in use. x86 Architecture and its associated language are the most common and what we will learn about here but others include; x64, SPARC, PowerPC, MIPS and ARM.
Most modern computer architectures including x86 use the Von Neumann architecture which has 3 components – the CPU that executes the code, fast and volatile Main Memory that stores all data and code that has been called and an I/O system that interfaces with hard drives, monitors and peripherals.
- The Control Unit get instructions from the ram using one of the CPU’s Register’s, which act as an instruction pointer to store the address of the instruction to execute.
- The registers act as basic data storage units and are very fast compared to RAM. It allows the CPU to fetch and store instructions faster.
- The Arithmetic Logic Unit executes the instructions and store the results.
Main memory can be divided into 4 main sections; DATA – which holds alues that are put in place when the program is initially loaded such as static values or global variables; CODE – which includes the instructions fetched by the CPU to be executed, this controls what the program does; HEAP which is used for dynamic memory allocation and elimination where the contents change frequently during execution; STACK – which is used for local variables, parameters and is used to control the program flow.
CISC vs RISC
CISC and RISC are two types of processor. Intel uses a software centric ISA called CISC, Complex Instruction Set Computer which has many special purpose instructions and a given compiler we may never use. We just need to know how to use the manual. It has variable length instructions between 1 and 15 bytes long. RISC ISA’s such as ARM on the other hand is Hardware Centric with more registers and fewer, fixed-size instructions.
Endianess comes in two flavours, Little Endian where bytes are stored with the little end first. This can be seen with the byte 0x12345678 which would be stored 0x78563412. Intel uses Little Endian. Big Endian on the other hand would stored 0x12345678 as is. This can be important to be aware of as malware changes from Big to little Endian during its life time as over the network Big Endian tends to be used and on the OS(Intel), little Endian is used.
Registers are small memory storage areas built into processors. They are faster than ram and volatile. We have 8 general purpose registers and an instruction pointer which points at the next instruction to execute. On x86-32, registers are 32 bits long and on x86-64, they are 64 bits long. While the registers are general purpose Intel has a suggested convention to follow for compiler developers and assembly coders. While this convention does not have to be used in general it is followed;
- EAX – Stores function return values
- EBX – Base pointer to the data section
- ECX – Counter for string and loop operations
- ESP – Stack pointer
- EBP – Stack frame base pointer
- EIP – Pointer to next instruction to execute (“instruction pointer”)
- Caller-save registers – eax, edx, ecx
- Callee-save registers – ebp, ebx, esi, edi
EFLAGS is a register that holds many single bit flags. There are two we need to be aware of; ZERO FLAG(ZF) – Set if the result of some instruction is zero, and SIGN FLAG (SF) – Set equal to the most significant bit of a result. There is a good rundown on flags here; https://en.wikipedia.org/wiki/FLAGS_register
So what instructions can we use?
First: The Stack
The stack is a conceptual area of main memory which is designated by the OS when a program is started. By general convention different OS start the stack at different addresses. Generally stacks follow a Last-In-First-Out (LIFO/FILO) data structure where data is pushed on to the top of the stack and popped off the top (we will talk more about these operations shortly). The stack is used normally for temporary storage space. By convention the stack grows towards lower memory addresses so that by adding something to the stack the top of the stack is now at a lower memory address. The ESP points to the top of the stack, which is the lowest address in use. Data that exists at addresses beyond the top of the stack are considered as being undefined. The stack keeps track of which functions were called before the current one, holds local variables and is frequently used to pass arguments to the next function to be called. We need to keep in mind what is happening on the stack in order to understand any programs operation.
NOP, or No Operation, indicates to registers and no values. It existance is to pad/align bytes, delay time or, as we discussed in Lesson 5 obfuscation and to confuse malware detectors. A one-byte NOP instruction is an alias mnemonic for “XCHG EAX, EAX” instruction.
Push is the simplest instruction that lets us add something to the stack. This can be a Word, Double/Dword or QuadWord, but usually a Dword. It can be an immediate value(a numeric constant), hte value in a register or a register segment. The push instruction automatically decrements the stack pointer ESP by 4.
To then remove a value from the stack we must use the POP instruction which takes the DWORD off the stack, puts it in a register and increments the ESP by 4.
The call procedures job is to transfer control to a different function in a way that control can later be resumed where it left off. This allows separate programmers to share code and develop libraries for use by many programs What this means is the value of the instruction pointer is pushed into the stack, which at that time points to the instruction following the CALL instruction. First it pushes the address of the next instruction onto the stack for use by the RET (which is discussed next) for when the procedure is done. Then it changes the EIP to the address in the instruction.
There are two forms of the RET function;
1. It pops the top of the stack into the EIP, which also increments the ESP. In this form it is written as “RET”
2. It pops the top of the stack and EIP and add a constant number of bytes to ESP. In this form it is written as “ret 0x8”, “ret 0x20” and so on.
The move instruction can move a register value to another register, a memory value to a register, a register value to memory, an immediate value to a register and an immediate value to memory. BUT it can never move a memory value to memory.
r/m32 Addressing Forms
Anywhere we see r/m32 it means the code could be taking a value from a register or a memory address. In Intel processors most of the time square brackets  tell us to treat the value within as a memory address, and fetch the value at the address.
LEA – Load Effectiveness Address
Frequently used with pointer arithmetic, and sometimes for arithmetic in general. It uses the r/m32 form but is more the exception to the rule that the square brackets  syntax means dereference. For example that in a piece of arithmetic the resulting value stored is the values address, and not the value itself. This can be useful when passing the address of an array element to a subroutine. It may also be a slightly sneaky way of doing more calculations than normal in one instruction This is where its confusing me, we will have to do some examples later to get clarification.
ADD and SUB
These commands do what you think, they add and subtract values. The destination operand can be r/m32 or a register and the source operand can be r/m32 or a register or an immediate. It evaluates the operation and sets flags as appropriate. Instructions modify OF, SF, ZF, AF, PF and CF flags.
With the basics of assembly and its instructions under our belt the outputs of many tools like IDA pro and Ghidra are more clear and easier to understand but we will need more time, and one more blog post, before we fully digest assembly but already the output of our malware lesson 5 the debugger output is much more clear.
Until next time,