Malware Analysis Lesson 4: Malware obfuscation techniques

Malware, which can be any type of malicious code [4], and detectors or anti-virus tools are in a continual arms race. Malware develops ever more advanced and sophisticated obfuscation techniques, while detectors research more complex detection mechanisms to identify it. This arms race has been ongoing for decades, and traditionally detectors have relied on large databases of known signatures [3], or hashes, of the malware. However, malware often uses a range of techniques to change this signature from one infection (or generation) to the next. This makes it more challenging for detectors and malware analysts to identify the behaviour of malware in a timely manner, or even to identify the code as malicious at all. We are going to look at some obfuscation techniques and describe how detection can be carried out for each of them. Some methods are highly complex, such as the malware being interwoven into a targeted host file, while others are simpler, like changing the packer used. In all cases obfuscation makes it more time consuming and resource intensive to isolate and identify the malware signature. To compound the woes of signature-based detectors, this method is not effective against new malware using unknown vulnerabilities (think zero days). This relentless development of new malware variants has made signature-based detection less effective, and in response behavioural, heuristic and sandbox-based detection have been developed. To understand how all this works, we first need to understand the different types of malware and how they obfuscate themselves.


4 Categories of malware

Because malware uses such diverse methods to obfuscate itself, it is useful to categorize it. There are four main types of obfuscated malware: Encrypted, Oligomorphic, Polymorphic and Metamorphic. Let’s go through these now.

Encrypted malware

There are two types of encrypted malware: malware using encryption and malware using packers. With encryption, the malware encrypts itself to conceal itself from detection. This type of malware is usually composed of a decryptor and its encrypted main body. [1] This method is effective for two reasons: firstly, because the malicious code is encrypted, the detector cannot identify the payload’s signature; secondly, by changing the encryption key it uses, the malware changes the signature of the encrypted code itself. [4] To remain obfuscated across multiple generations, and to prevent its encrypted signature from being static and thus identifiable by signature-based detectors, the malware can generate and use a new encryption key every time it is run, keeping its signature unique. For best effect this new key should be generated in a random and unpredictable manner. However, the decryptor portion of the malware cannot be encrypted, as it needs to be executed, so it retains a static signature. Because of this, detection methods that focus on the decryptor’s signature are usually successful. [1]
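The point about the static decryptor can be seen in a toy model. The sketch below is only an illustration under stated assumptions: `DECRYPTOR_STUB`, `BODY`, the single-byte XOR "cipher" and the sample layout are all hypothetical stand-ins invented here, not taken from any real sample. It shows that two generations built with different keys hash differently and hide the body, yet both still contain the same unencrypted stub that a detector can match on.

```python
import hashlib

# Hypothetical stand-ins: a fixed "decryptor stub" and a harmless placeholder body.
DECRYPTOR_STUB = b"\x90STUB-DECRYPT-LOOP\x90"
BODY = b"harmless placeholder standing in for the main body"


def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy single-byte XOR, used only to make the body look different per key."""
    return bytes(b ^ key for b in data)


def build_sample(key: int) -> bytes:
    """Assumed layout: static stub, then the key byte, then the encrypted body."""
    return DECRYPTOR_STUB + bytes([key]) + xor_bytes(BODY, key)


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def contains_stub_signature(sample: bytes) -> bool:
    """Detector side: look for the stub's byte sequence instead of a whole-file hash."""
    return DECRYPTOR_STUB in sample


gen_a = build_sample(0x21)
gen_b = build_sample(0x5C)

print(sha1_hex(gen_a) == sha1_hex(gen_b))   # whole-file hashes differ per generation
print(BODY in gen_a, BODY in gen_b)         # the body signature is hidden
print(contains_stub_signature(gen_a), contains_stub_signature(gen_b))
```

A whole-file hash database would need one entry per key, while a single byte-sequence signature for the stub covers every generation of this toy sample.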

Packers[3]

Packers are usually legitimate tools used to decrease the size of an application while it is stored or transported, like compressing documents but in a way that still lets the application be executed. Even small changes to the underlying application can drastically change the signature of the resulting packed executable. There are multiple packing applications, and there is research on which packers are most effective at evading detectors. One example of this is “Jon Oberheide and his colleagues at the University of Michigan wrote PolyPack, a Web-based application that supports 10 packers and 10 malware detection engines (like virus total)”[3]. This research and similar applications can help malware authors identify which packer would best help their malware avoid detection.

One way packed malware can be detected is by keeping a database of all possible signatures a packed malware can produce. This is very inefficient, and a better option is to use what is called “Entropy Analysis” [3] to identify the packed malware. Packed or compressed data looks close to random, so a file with unusually high entropy is likely to be packed. This can detect packed files but cannot identify the packer used, which can cause difficulties for deeper analysis. PHAD, PE-Probe and MRC all use entropy analysis. Without unpacking the file, it can be difficult to know if it’s malware or a legitimate application, especially as we need to identify the right packer to unpack the file. This can be difficult, and packers are commonly used to spread malware.
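To make the idea of entropy analysis concrete, here is a minimal sketch of a Shannon entropy calculation over the bytes of a file. It is not how PHAD, PE-Probe or MRC are implemented; the function names and the 7.0 bits-per-byte threshold are assumptions chosen for illustration, and real tools typically look at entropy per section rather than over the whole file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly spread bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical rule of thumb: flag data whose entropy is at or above the threshold."""
    return shannon_entropy(data) >= threshold


plain_text = b"hello world " * 100            # repetitive, low entropy
random_like = bytes(range(256)) * 4           # every byte value equally often

print(shannon_entropy(plain_text), looks_packed(plain_text))
print(shannon_entropy(random_like), looks_packed(random_like))
```

A high score only says the content is dense and random-looking; as noted above, it does not say which packer produced it, or whether the packed content is malicious.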

Oligomorphic and Polymorphic

Malware that can mutate its decryptor from one generation to the next was designed to fix the shortcomings of purely encrypted malware. The first example of this was oligomorphic malware, which was able to change its decryptor. [1] However, oligomorphic malware was initially very limited in the number of decryptor versions it could produce, allowing the signatures of all possibilities to eventually be calculated. This catalogue of signatures allowed detectors to identify all variants of the malware.

Fig. 1 Oligomorphic malware

Polymorphic malware is encrypted malware that mutates its static binary code. [3] It was developed to take the ideas of oligomorphic malware and improve on them by generating an incalculable number of potential decryptor variants, so that no single signature sequence will match all possible variants of the malware. It achieves this by using several very cool obfuscation methods we will talk about later, including dead code insertion, register reassignment, code transposition and instruction substitution [2]. Each time the code is run it mutates itself by using a different key. To make things even more challenging for malware analysts, there are tools out there, such as The Mutation Engine, that automate the process, allowing regular, non-obfuscated malware to be converted into polymorphic malware.

To detect these types of malware, detectors make use of tools like sandboxing. With sandboxing, the detector executes the malware in a secure emulator, waits for its constant body (the payload) to be decrypted in RAM, and then tries to match a signature. [1] This works because the polymorphic engine does not significantly change the native opcodes that run in memory. [3] Another way to detect polymorphic malware is Neural Pattern Recognition, which has shown a high detection rate, although based on a small sample set. [3]

Malware obfuscation is a fast-paced arms race that continuously results in more dangerous malware that is harder to detect. Malware authors attempt to counter sandboxed execution by creating malware that detects when it is running in a virtualised environment and does not decrypt its payload. Other malware authors create malware that waits for some event that does not usually occur in a sandbox before decrypting its payload. Detectors are improving all the time and are incorporating features to defeat this type of malware with advanced techniques. [1] The decrypted code is essentially the same in each case, so RAM/memory-based signature detection is possible. Block hashing can also be effective in identifying memory-based remnants.

Metamorphic malware

With the previous class of malware, we discussed how the decryptor was changed with each generation of the malware to avoid detection. Metamorphic malware takes this approach and builds on it by incorporating multiple obfuscation techniques into its payload rather than, or as well as, its decryptor. This way it may not need to use encryption or packing and can still be difficult to detect due to its ever-changing signature. It can maintain its behaviour without ever needing to repeat the same set of native opcodes in memory. [3] It needs to be able to recognize, parse and mutate its own body whenever it propagates. [1]

There are two types of metamorphic malware: open-world and closed-world. Open-world malware, as shown by the Conficker worm, leverages a command and control structure, with the malware connecting to its controlling master server to download updates and functionality after the initial infection. Closed-world malware uses self-mutating code from one generation to the next, via a binary transformer which modifies the binary code itself to avoid detection. [3] Win32/Apparition was the first example to demonstrate these techniques. [3] The methods used to achieve this level of obfuscation are discussed below.

Fig. 2 Metamorphic malware

Obfuscation techniques

Polymorphic and Metamorphic malware take advantage of several techniques to obfuscate their code. We are going to go through several of these methods now.

Garbage/Dead Code Insertion; Dead code insertion pads out the code with garbage to change the file’s signature. This garbage could be randomly generated strings, or it could be new instructions that don’t do anything, or at least don’t change the malicious operation of the code. Instructions such as NOP (no operation) or CLC can be used to fill out the code. Using PUSH and POP operations on registers is another way. These garbage insertions can be defeated by modern detectors, which identify the garbage, such as operations that do nothing, and delete it from the code before analysing and comparing the malware’s signature. [8]
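The detector-side clean-up described above can be sketched as a small normalisation pass. This is a simplified illustration, not a real disassembler pipeline: it assumes the code has already been disassembled into a list of lower-level mnemonic strings, the function name `strip_dead_code` is hypothetical, and it only recognises two patterns (NOP, and a PUSH immediately undone by a POP of the same register). CLC is deliberately left alone here because it changes the carry flag.

```python
import hashlib


def strip_dead_code(instructions):
    """Remove NOPs and push/pop pairs on the same register from a listing."""
    cleaned = []
    for ins in instructions:
        norm = " ".join(ins.lower().split())      # normalise case and spacing
        if norm == "nop":
            continue
        # "push X" directly followed by "pop X" restores X and the stack pointer.
        if cleaned and norm.startswith("pop ") and cleaned[-1] == "push " + norm[4:]:
            cleaned.pop()
            continue
        cleaned.append(norm)
    return cleaned


def listing_hash(instructions):
    """Signature over the normalised listing rather than the raw bytes."""
    return hashlib.sha1("\n".join(instructions).encode()).hexdigest()


original = ["mov eax, 1", "add eax, ebx", "ret"]
padded = ["mov eax, 1", "NOP", "push eax", "nop", "pop eax", "add eax, ebx", "nop", "ret"]

print(strip_dead_code(padded))
print(listing_hash(strip_dead_code(padded)) == listing_hash(original))
```

After normalisation the padded variant produces the same listing, and therefore the same signature, as the original.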

Register Reassignment/Swapping; In assembly, all programs work from a limited set of instructions and have a limited set of fast storage locations for storing and fetching values. These storage locations are known as CPU registers. The number of registers a CPU has can vary. i386, for example, has 4 main data registers: EAX, EBX, ECX and EDX. Malware can take advantage of these multiple registers for obfuscation. By switching the registers it uses, such as from EAX to EBX and vice versa, the malware can change its code from generation to generation while keeping the behaviour the same. [5][1]
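One way a detector could look past register swapping is to rename registers to placeholders in order of first appearance before comparing listings. The sketch below is an assumption-laden illustration: it works on mnemonic strings rather than real machine code, only knows the four registers named above, and ignores cases where a register has a fixed role for a particular instruction. The name `normalise_registers` is hypothetical.

```python
import re

# Only the four registers mentioned in the text are handled in this sketch.
_REGISTER_RE = re.compile(r"\b(eax|ebx|ecx|edx)\b")


def normalise_registers(instructions):
    """Replace each register with r0, r1, ... in order of first appearance."""
    mapping = {}

    def rename(match):
        reg = match.group(1)
        if reg not in mapping:
            mapping[reg] = f"r{len(mapping)}"
        return mapping[reg]

    return [_REGISTER_RE.sub(rename, ins.lower()) for ins in instructions]


generation_1 = ["mov eax, 5", "add eax, ebx", "push eax"]
generation_2 = ["mov ebx, 5", "add ebx, eax", "push ebx"]   # EAX and EBX swapped

print(normalise_registers(generation_1))
print(normalise_registers(generation_1) == normalise_registers(generation_2))
```

Both generations normalise to the same listing, so a signature taken over the normalised form would match either of them.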

Changing flow control/Subroutine Reordering; By changing the order of the program’s subroutines, malware can produce a very large number of potential variations (n subroutines can be ordered in n! ways). This involves changing jumps in the assembly code and reordering the call sequence by adding subroutines. [3] Changing the order of these jumps and the order in which different functions are called, combined with other obfuscation methods such as dead code function insertions, not only changes the signature and makes it difficult for automated detectors to identify the malware, but also increases the challenge of identifying what the malware does through static analysis. Block hashing and heuristic analysis can be the best ways for detectors to identify malware of this type.

Code/Instruction Substitution; Malware, like all code, is made up of a sequence of instructions. In most languages there are multiple instructions that can carry out the same behaviour. In x86, for example, XOR can be replaced by SUB and MOV can be replaced with PUSH/POP. [1] This change results in a new generation of malware with its own signature, which is difficult for detectors to pick up on even when they look at the instructions used by the malware. Heuristic and behavioural detection are best placed to identify malware using this form of obfuscation. [8]
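As a rough illustration of why equivalent instructions defeat naive matching, and how a detector might compensate, the sketch below rewrites two well-known x86 equivalences into one canonical form: zeroing a register with XOR or SUB against itself, and a PUSH followed directly by a POP, which copies a value like MOV. It operates on mnemonic strings only; the `canonicalise` function and the `zero` pseudo-instruction are hypothetical names invented for this example.

```python
import re

_ZERO_RE = re.compile(r"^(xor|sub) (\w+), (\w+)$")
_PUSH_RE = re.compile(r"^push (\w+)$")
_POP_RE = re.compile(r"^pop (\w+)$")


def canonicalise(instructions):
    """Map equivalent instruction choices onto a single representation."""
    out = []
    for ins in instructions:
        norm = " ".join(ins.lower().split())
        zero = _ZERO_RE.match(norm)
        if zero and zero.group(2) == zero.group(3):
            # xor r, r and sub r, r both leave r == 0
            out.append(f"zero {zero.group(2)}")
            continue
        pop = _POP_RE.match(norm)
        push = _PUSH_RE.match(out[-1]) if out else None
        if pop and push:
            # push src; pop dst has the same effect on dst as mov dst, src
            out[-1] = f"mov {pop.group(1)}, {push.group(1)}"
            continue
        out.append(norm)
    return out


variant_a = ["xor eax, eax", "mov ecx, ebx"]
variant_b = ["SUB eax, eax", "push ebx", "pop ecx"]

print(canonicalise(variant_a))
print(canonicalise(variant_a) == canonicalise(variant_b))
```

Real substitution engines draw on far more equivalences than these two, which is why the text points to heuristic and behavioural detection rather than rewriting rules alone.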

Code Transposition; Code transposition is reordering the code in a way that does not impact functionality. This can be done by shuffling the order of the instructions and then calling them when needed in the main body with unconditional branching statements or jumps. [1] The original malware can still be recovered by removing those statements and jumps. Because the resulting malware is so complex, this obfuscation can be difficult and time consuming both to create and to detect. Block hashing is one way to detect this form of malware: segments, or blocks, of the suspect code are hashed and then checked by an algorithm for similarities with known malware.
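Block hashing as described here can be sketched in a few lines. This is a deliberately simple version under stated assumptions: fixed-size blocks, SHA-1 per block, and a Jaccard score over the two sets of block hashes. The 16-byte block size and the function names are choices made for this example, and fixed-size blocks only line up when the reordered pieces fall on block boundaries; practical tools use more robust chunking.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 16) -> set:
    """Hash each fixed-size block; the set ignores the order of the blocks."""
    return {
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def similarity(a: bytes, b: bytes, block_size: int = 16) -> float:
    """Jaccard similarity of the two block-hash sets, between 0.0 and 1.0."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)


# Four distinct placeholder blocks standing in for code segments.
blocks = [bytes([i]) * 16 for i in range(4)]
known = b"".join(blocks)
transposed = b"".join(reversed(blocks))               # same blocks, new order
partly_changed = b"".join(blocks[:3]) + bytes([7]) * 16

print(hashlib.sha1(known).hexdigest() == hashlib.sha1(transposed).hexdigest())
print(similarity(known, transposed))
print(similarity(known, partly_changed))
```

The whole-file hashes of the known and transposed data differ, but every block hash still matches, so the similarity score stays at its maximum.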

Code Integration/Insertion; This is one of the most difficult malware obfuscation techniques both to implement and to detect or analyse. It involves the malware inserting its code within a legitimate program. It does this by decompiling the target executable into manageable objects, inserting itself in between those objects and finally reassembling the entire executable. Once reassembled, we see the new generation of the malware. This changes the target program’s signature and makes the malware difficult to detect. The best way to detect this malware is to keep a database of legitimate/white-listed applications and their corresponding baseline signatures, and to treat any application that deviates from its baseline as malicious. Block hashing and heuristic detection can also be used.
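The baseline approach can be sketched as a lookup table of known-good fingerprints. The example below is a minimal illustration: the use of SHA-256, the function names and the application name `notepad.exe` are assumptions made for this sketch, and a real deployment would also need to handle legitimate updates to the white-listed applications.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_baseline(trusted_apps: dict) -> dict:
    """Map each white-listed application name to the hash of its known-good bytes."""
    return {name: fingerprint(data) for name, data in trusted_apps.items()}


def check_against_baseline(name: str, data: bytes, baseline: dict) -> str:
    """Return 'trusted', 'modified' (deviates from baseline) or 'unknown'."""
    expected = baseline.get(name)
    if expected is None:
        return "unknown"
    return "trusted" if fingerprint(data) == expected else "modified"


# Placeholder bytes standing in for a legitimate executable.
clean = b"original bytes of a trusted application"
baseline = build_baseline({"notepad.exe": clean})

print(check_against_baseline("notepad.exe", clean, baseline))
print(check_against_baseline("notepad.exe", clean[:10] + b"INSERTED" + clean[10:], baseline))
print(check_against_baseline("other.exe", clean, baseline))
```

Any inserted code changes the fingerprint, so the check does not need to know anything about the malware itself.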

Fileless malware; A new trend in malware obfuscation that has come to the fore over the past 2 years is fileless malware. With this technique the malware forgoes storing a copy of itself on the target machine’s HDD or SSD and lives entirely in RAM. Detectors can have a hard time detecting the malicious function, especially if it is combined with armouring techniques such as relying on external events before acting maliciously. Even when it is detected it can be difficult to analyse, because once the machine is shut down the malware is gone. A live image of the RAM is needed to analyse it.

Let’s try to obfuscate some malware!

Let’s put this theory into practice. We are taking a sample from Das Malwerk (http://dasmalwerk.eu); we have chosen Filename: 25786c51-414b-11e8-a472-80e65024849a.file as the sample we will obfuscate. This malware has a hash of 36E79238CF645F38FA9CE671A850CC3E29338B65, with a detection rate of 50 / 63 engines picking it up.

Initial output from VirusTotal

Here we can see 50 engines detect our malware, so we are now going to try a few ways to reduce the detection rate. From static analysis we could identify that this file is written in .NET. By using a .NET packer, NetShrink, we get the new hash of 87755627F18616749F257524152B1C60F036C6EF. When checking this hash in VirusTotal, success! It does not exist.

VirusTotal doesn’t have the file’s hash value
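The hashes quoted in this walkthrough are 40 hexadecimal characters long, which is consistent with SHA-1, although the post does not name the algorithm, so treat that as an assumption. Under that assumption, a file hash in the same uppercase format could be computed along these lines; `file_sha1` is a hypothetical helper name and the path in the usage comment is a placeholder.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-1 of a file as uppercase hex, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


# Usage (placeholder path):
# print(file_sha1("sample_packed.exe"))
```

Because the hash covers every byte of the file, repacking or editing even a single byte gives a value that a hash lookup has never seen before.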

This is good, but next let’s upload the file to VirusTotal. For a hash lookup to succeed, the hash must already be in the VirusTotal hash database; by uploading the malware itself we can check whether the engines still detect it.

So just by changing the packer we can reduce the detection rate from 50 to 25! For the detectors that did identify the malware, we can see comments like “Behaves Like”, “Heuristic” and “Suspicious”, which suggests that some form of dynamic analysis was used to identify the file as malicious. Let’s now try to play with the source code. We will decompile this .NET application with dotPeek. This gives us the source code as an exported Visual Studio project. Opening this in Visual Studio we can see the complexity of the malware we selected. First we are going to add a function that adds two numbers, then recompile, get a hash and compare results.

Decompiling the code with dotPeek

Opening the project in Visual Studio, we see we cannot compile it again. dotPeek seems to have decompiled it with errors such as “base.\u002Ector();” instead of “base.ctor();”, with 261 different errors to fix in all. With these fixed we have a fuller understanding of what this malware, Orcus, does. Complete with allowing partial Remote Code Execution, setting up FTP servers, allowing DDoS, stealing passwords and logging keystrokes, we must proceed with the utmost caution. Like a big game hunter about to take out his first sealion. Unfortunately, after fixing these 261 errors we get an additional 400 errors, such as “The type or namespace name ‘Shared’ does not exist in the namespace ‘Orcus’ (are you missing an assembly reference?) -using Orcus.Shared.Communication;”, which is beyond our understanding of computer programming. This could be the result of the decompiler not catching all of the source code.

Debugging and trying to obfuscate the decompiled code in Visual Studio

Instead of a decompiler, let’s try using a debugger to walk through the assembly and see if there are some changes we can make at that level. We can see the malware author has done extensive obfuscation already. We saw this when investigating the source code above, where we found functions that did nothing. Here at assembly level we see dead code insertion via NOP and padding at the end of the file:

NOP Dead code insertion in Orcus
End of file padding

One thing we could do for an easy demonstration is change the register used in the padding at the end from EAX to EBX, but let us try something more challenging. One thing I am worried about is damaging the code’s functionality, so I am going to replace some of the NOP commands with push EAX; pop EAX, which should serve the same function. This demonstrates dead code insertion and instruction substitution. This was done for multiple pairs of NOP commands found.

Replaced NOP with Push/Pop obfuscation

After this small change we have a hash of 5AED9A880DB19E1EC35E8A63C09EEF45EC50A2C7. Let’s see if this, before packing it, makes a difference to our detection rate. As expected, this file hash has nothing found on VirusTotal. When uploading the file itself we get a 42/70 detection rate. This is somewhat better than the 50 detections we got initially.

After our obfuscation

If we pack this malware after the changes we get a hash of 2AB951E7904EBBF355954C5501E6D5EE356120AF, and this hash still has no matches on VirusTotal. Interestingly, when we upload this file to VirusTotal we get 32/68 detections, which is higher than for our initial packed file. This could be due to the frequency of uploads we have done and the time the detectors have had to analyse our files.

As one final test, let’s try register swapping in the end-of-file padding to see if there is any difference. We will also change one register used for an actual instruction. As this is complex code, and to avoid breaking it, we will change the register used at the beginning of the binary.

If we swap the EAX registers used to EBX, we can then assess the results.

With this process complete and the resulting file dumped into an exe, we pack it with NetShrink again and have a hash of 5AED9A880DB19E1EC35E8A63C09EEF45EC50A2C7. The result here is unexpected, with 42 detectors identifying the malware:

When we upload the file itself, we get the same result. There are three conclusions that we could draw from this. First, VirusTotal and its detectors may be learning from our uploads each time we obfuscate the malware, becoming more accurate at detecting its malicious nature. Second, the packer we used, NetShrink, could be relatively obscure, so the detectors had to spend time analysing it (in this case over a 2 week period). Finally, it could be that the obfuscation methods we used towards the end, where we focused on changing small segments of the code in the debugger, were insufficient to fool the detectors; in this case it is highly possible block hashing was used. While our obfuscation efforts gave us mixed results, we were able to go through several obfuscation techniques.

References and bibliography

[1] Ilsun You, Kangbin Yim. (2010) ‘Malware Obfuscation Techniques: A Brief Survey’, BWCCA 2010

[2] Philip Okane, Sakir Sezer, Kieran Mclaughlin. (2011) ‘Obfuscation: The Hidden Malware’, IEEE Security & Privacy

[3] Jian Li, Jun Xu, Ming Xu, HengLi Zhao, Ning Zheng. (2009) ‘Malware Obfuscation Measuring via Evolutionary Similarity’, First International Conference on Future Information Networks

[4] Lysne O. (2018) ‘Static Detection of Malware’, The Huawei and Snowden Questions. Simula Springer Briefs on Computing, vol 4. Springer

[5] Kristian Iliev. (2017) ‘Top 6 Advanced Obfuscation Techniques Hiding Malware on Your Device’, https://sensorstechforum.com/advanced-obfuscation-techniques-malware/ (accessed: 13/03/2019)

[6] Dr. Amit Kumar Bindal, Navroop Kaur. (2016) ‘A complete dynamic malware analysis’, International Journal of Computer Applications

[7] Mario Luca Bernardi, Marta Cimitile, Francesco Mercaldo, Damiano Distante. (2016) ‘A constraint-driven approach for dynamic malware detection’, 14th Annual Conference on Privacy, Security and Trust

Figures 1, 2 & 3: Babak Bashari Rad, Maslin Masrom, Suhaimi Ibrahim. (2012) ‘Camouflage In Malware: From Encryption To Metamorphism’


