Malware Analysis Lesson 3; Static Analysis

During the course of our careers we will come across artifacts left behind by malicious hackers the are easily decipherable. These may be scripts, text files or something else that’s easy to open and view. But more often we, as malware analysts come across files that are not easily identifiable as malicious. Take executable files for example. These may be malicious or benign so we tend to take extra precautions with them. To figure out if an executable is safe or not it tends to help if we understand the structure b of executable files – and in turn gain an understanding of how these files affect our systems.

How do we do this? With static analysis. We are going to focus on PE (Portable Executable) file structure for this class. PE files are executables usually found on windows. The reason we take the time to perform this analysis is to identify them as benign – and thus avoid the costly steps of an investigation; or to identify them as malicious to be marked for evidence. If it is malware we can then classify it if it is a new malware or variant strain of an existing malware. This gains us some uber street cred.

How do we get files to analyse?

If we think there is malware running on our system we can get a dump of our ram and then parse it with tools like Volatility which is a neat open source tool. This would give us some cool information on the processes running and files open, allowing is to gain an understanding of what is going on. It is particularly useful for attacks where the hard disk may not be accessible, such as during a ransomware attack. This information allows us to locate the suspcious file and from there we can upload it the VirusTotal or a similar site to see if we get any hits. If virus total gives us ambiguous results we get to put our sherlock holmes hat on and investigate further.

The first step in malware analysis after we have our suspect file is to carry out a static analysis. We discussed this in our last talk and what it allows us to do is gather information without actively running the malware(actively running = dynamic analysis). So lets take this step by step.

Opening the malware to read the clear text

We can do this in several ways, using notepad, a hex editor or an application like Strings. What it allows us to do is to read any cleartext in the executable. There will be alot of unrenderable garbage but buried beneath that we may begin to form an understanding of what the code is doing. We might be able to identify error messages, help pages, function calls or similar.

1. Take a hash of the file.

We first hash the file so we can easily identify it and check for changes later. There are lots of hashing algorithms we can use and it doesn’t really matter which one, MD5 or SHA are the most common. With the has we can check it with antivirus scanners quickly for matches, or use something like the NIST databases. This helps us identify if the malware has been investigated before and what the findings were. Some places we can compare are;

  • NIST National Software Reference Library
  • Team Cymru Hash Registry

Apart from helping us identify known malware these can also help us identify if the file is a legitimate software or operating system file. By doing this we reduce the number of files that require a review.

In addition to the traditional methods of hash comparison above there have been several new types of hashing that may be more useful. Similarity hashing, fuzzy hashing, piecewise(or block) hashing and similarity digests are all tools we can use and there are a number of algorithms that support them, such as SDHASH, SSDEEP and MINHash.

Similarity Hashes compare files for similarity(shocking i know). They calculate the hash of portions of the code, such as functions or blocks, to identify similarities such as the same functions. Because its calculated based on a block of the binary if one piece of the code changes the parts of the code that are still similar will compare with the original. This gives us a comparison score. THis helps us identify modified files, increases the difficulty of malicious actors obfuscating their code and it speeds up our analysis time by removing a lot of manual analysis. The NSRL has hashes and similarity hashes.

2. Check the suspicious file with VirusTotal

Before we start with a full blown analysis a good, quick check we can carry out is uploading the file to one of the many virus scanning sites like Virus Total, Meta-Defender, Virscan, Jottie and others. Generally you can upload the file itself or just check the hash. This can be great to identify if the file is a known piece of malware but as you will see if you try this yourself it can be hit or miss, especially if the file is a malware variant that hasn’t been identified and classified before. We discussed in lesson one about how malware variants are being released daily and even reputable antivirus companies are not picking them all up. If we are lucky we can collect some information on the file, what it does and what malware family it can belong to – which will aide us later.

3. Investigating the inside of the file!

Strings.exe is a pretty cool too and can be downloaded from here. It goes through the file and extracts the contents and tries to print out any ASCII or Unicode values it identifies into a file you specify(or in the cmd prompt if you didnt specify a file). For malware that is not heavily obfuscated or encrypted this can be a useful tool for identifying the nature or purpose of the file.

The information we get can tell us about network activity, file activity and registry activity – along with providing us the function call names the file uses. This can help us narrow down if the file is trying to copy our clipboard, set up a server, gain persistence and more. This makes it a great tool to use to start by looking for odd or unique strings.

If the file we are looking at is a PE(from lesson two) there are two good tools we are going to look at in detail; PEDump and PEView. PEDump can be used to extract detailed information from the header of a PE file; this information can be shown on the cmd prompt or stored to a file. PEView is a GUI(yuck) tool that allows us to view what makes up the PE’s headers and sections.

PEDump: http://www.wheaty.net/downloads.htm

PEView: http://wjradburn.com/software/

PE File Analysis

The are aseveral components to a PE file that we should be familiar with before going further;

MS-DOS Header is the ever static “This program cannot be run in DOS mode” that is kept around for legacy reasons to prevent you running the program in incompatible DOS operating systems. If this is omitted the OS would fail to attempt to load the file on legacy machines. This header occupies the first 64 bytes of the PE file.

For executable files on the windows systems the MS-dos header is also called IMAGE_DOS_HEADER.

The file signature(or Magic Number) consists of the letters MZ, or 0x5a4d in little-endian hexadecimal format (Little endian machine: Stores data little-end first. When looking at multiple bytes, the first byte is smallest.), and is always the first two bytes of a file. Wanna know what MZ stands for? Its the initials of the guy that designed it!

e_lfanew is the last 4-byte value that points to the location of the PE header. This is important as after the header is the stub program that is run by ms-dos when the executable is loaded. This checks the OS compatibility with the file.

Next comes the PE File Signature which should be identifiable by a value of 0xe8 or PE.

The PE File Header/IMAGE_FILE_HEADER is contained in the 20 bytes following the PE signature and includes several things we will find useful;

The can be very important for forensics investigating as it represents the time that the image was created by the linker. The value is represented as the number of seconds since the start of January 1, 1970 in Universal Coordinated Time. This gives us the time on the coders computer from when they compiled the executable, and may be a clue as to when the program was created. But as with all computer evidence; this could have been modified at some point.

Machine show the architecture the program is designed to run on, such as x86 or x64.

The Characteristics flag gives us some more clues as to what the files does. For example characteristics can include IMAGE_FILE_EXECUTABLE_IMAGE(i.e. that this is an executable), IMAGE_FILE_DLL (i.e. that it is a DLL) and IMAGE_FILE_SYSTEM(for system files). But there are many others.

IMAGE_OPTIONAL_HEADER we get the chance to get additional information from the file such as the magic number – information about the structure of the file to enable the OS to execute the file. Including;

  • SizeOfCode: The total size of all code in the file.
  • AddressOfEntryPoint: The address where the loader begins execution of the file.
  • MinorOperatingSystemVersion: Minimum OS Version needed to run.
  • Checksum: The hash.

Sections

After this we have the actual sections of the file itself, which is what we are interested in. The most common and interesting sections of the PE file are;

  • .text: This section contains the instructions that the CPU executes. All other sections just store the data and supporting information. This should be the only section that can execute and the only section containing code.
  • .rdata: This section contains the import and export information we can see in PEView. Sometimes this is split into .edata (export) and .idata (import).
  • .data: The .data section contains global variables and data which is to be accessed from anywhere in the program. Local data is NOT stored here.
  • .rsrc: Contains resources used like icons, images, menus and strings.

Section headers are called IMAGE_SECTION_HEADER and give us information about the section structures including size and characteristics of each section. Within these sections are IMAGE_DATA_DIRECTORY structures that act as a directory for import/export tables, resource directories etc.

These sections hold information on the code and data for the applications. Libraries, DLL’s, API’s and any systems calls made are stored here.

Linking

Programmers can link imports so they dont need to re-impliment certain functionality in their code and between multiple programs. As analysts we can gather a lot of information on what a program does based on the functions it imports. There are 3 types of linking;

Static: Used in linux/unix, all code is in the executable. Makes the executable big. It’s difficult to differentiate between statically linked code and the executable’s own code

Runtime: Executables connect to libraries only when that function is needed. The linked functions do not have to be declared in the executable file header. This means the program can access any function in any library on the system and we can only know if we execute the malware for dynamic analysis.

Dynamic: This is the most common method of linking. With dynamic linking the host OS searches for the linked libraries when the program is loaded. When the program calls the linked library function, that function executes within the library. The PE file header stores information about every library that will be loaded and every function that will be used by the program. We can make calculated assumptions about what the program does based on these functions.

Summary

Knowing how PE files are structured and built can aide us in identifying the purpose of the application. There are a variety of tools used to help with this, like PEDump (which lets us locate the import data directory and parse the structures to determine the DLLs and the functions the application uses.), but it is ultimately a very manual process. There are countless functions and system calls that a file can import. MSDN have a great reference facility to find out what each function does.

Malware Analysis – Lesson 2; Types of malware

It is important for any malware analyst to understand the different categories of malware and how they try to infect our systems in order to better check for indicators of compromise. Malware is categorized and sub categorized based on its behavior, purpose and infection vector. Even beyond this many malware have spawned slight variants, such as Zeus, further providing a need for categorization.

By finding the commonalities between different malware we are able to more easily find malware indicators of compromise that allows us more efficiently identify and isolate malware strains.

Malware Stats

Before we continue it is interesting to note that while total malware is being churned out at an exponential rate, new malware appearing every year has mostly remained static. The primary reason for this is to create new, good malware there is a high level of technical skill required. Many of the threat actors we encounter would not meet this skill level and so rely on purchasing malware or changing existing malware into a new variant.

This is very easy to do and can be as simple as adding padding, changing the portions of the malware that is encrypted, moving functions around or even changing the functions themselves. Each of these steps(and theres more that can be done!) act as a way of tricking antivirus’ into believing the application is different even though the purpose and result of execution is the same. This explosion of variants can easily be seen with the banking Trojan Zeus, which has over 100,000 variants and more appearing every day.

AV-Test has some great statistics on this: https://www.av-test.org/en/statistics/malware/

Malware Classification

Classifying malware by behavior helps us gain an understanding of what the malware’s infection vector is, what its purpose is, how big of a risk it is and how we can defend against it. Knowing these things, and being able to quickly classify malware in this way allows us to more quickly respond. We need to know what malware has done, how the asset was compromised and when to restore it to a known good state.

Later in this post we are going to go through the classifications but until then Aman Hardikar has a nice mind map that gives a good visualization of how classification may be done. His blog may be found here.

The Computer Antivirus Research Organization Naming Scheme

CARO is a naming convention for malware varients that makes new malware names easy to understand, informative and standardized. It was created primarily for AV companies and, while not universally adopted it is used with variations among many vendors.

The naming convention is split as follows:

Type – the classification of malware (eg Trojan)

Platform – The operating system targeted

Family – the broader group of malware the variant belongs to.

Variant Letter – an alpha identifier (aaa, aab, aac etc)

Any additional information – for example if it is part of a modular threat.

If we put these components together we get an informative name. I have included a Microsoft image to help illustrate this below and the MS article can be found here.

MAEC

MAEC is a community developed structure language for encoding and sharing information about malware. It contains information on a malware’s behavior, artifacts or IoC’s, and relationships between malware samples were relevant. Those relationships give MAEC and advantage over relying on simple signatures and hashes as it generates these relationships based on low level functions, code, API calls and more. It can be used to describe malicious behaviors found or observed in malware at a higher level, indicating the vulnerabilities it exploits, behavior like email address harvesting from contact lists or disabling of a security service.

MISP

Where MAEC is a structured language like XML, MISP is an open source system for the management and sharing of IoCs. Its primary features include a centralized searchable data repository, a flexible sharing mechanism based on defined trust groups and semi-anonymized discussion boards.

An IoC is any indicator caused by a malicious action, intrusion or similar. It can be network calls, processes running, registry key creation and more. By having a central database of IOCs we are able to leverage the experience gained from cyber attacks across the world.

ENISA has a pretty cool report that include some info on MAEC and MISP. https://www.enisa.europa.eu/publications/standards-and-tools-for-exchange-and-processing-of-actionable-information/at_download/fullReport

Malware Types

Gollum’s current opinion of malware

viruses

One of the oldest, and definitely most well known malware are Viruses. They can be identified by the way they copy, or inject. themselves into other programs and files. This allows them to persist on a machine even if the original file is deleted and spread to other devices as the “host” files or programs are distributed. There are a few types of viruses that have been identified, and this feeds into our classification taxonomy discussed earlier. Symantec have a pretty good blog on these which can be found here. There are two attributes specifically that i will talk about here;

File Injectors are how viruses spread and the way they infect a host file. There are 3 types of this. Overwriting Infectors overwrite the host file as needed. Companion Infectors rename themselves as the target file. Parasitic Infectors attach themselves to a host file.

Memory resident virus are discussed under the Boot Sector and Master Boot Record virus sections on Symantec guide and discussed in detail by trend micro here. Memory resident infectors remain in a computers RAM after it has been executed to try to infect target files, programs or media (Like floppy drives! if they still exist… surely some bank somewhere uses them 🙂 ). The way the actually infect that target is the same as the File Injector method.

Close up; Macro Viruses

Macro viruses are less of an issue these days as macros are disabled by default in Microsoft word, and enabling them gives the user a pop-up warning them that the macro is attempting to run. In the past these were a major issue however so lets give a brief run down of them.

Macros a small scripts written in the language of the application it is run on, like Microsoft Word, Excel or Visual Basic. This OS independence means a macro run on a windows system will also run on a Mac OSX. Once executed the macro can run a number of functions, from infecting every document of that type, to changing the document contents or even deleting the contents. Generally Macros are spread via spam emails with “invoices” attached. Given they are still prevalent in your spam folder, for the budding Malware analyst these type of macros can be a good opportunity to analyse what a macro is doing. Just be sure to use a secure environment!

Worms

Worms are malware that replicates itself with little to know interacting by the user. How WannaCry spread throughout the world with the SMBv1 vulnerability EternalBlue is a recent example of this. Other types of worms can use browsers, email and IM’s to spread.

Close up; Mass-mailers

Mass Mailers are the traditional worm type malware. This is spread via tantalizingly designed emails that tempt you to click on them. That invoice you forgot to pay, the secret crush who loves you and more are all examples of how this type of malware encourages you to open it. Its a form of social engineering that fools you into clicking the link in the email or downloading the attachment. Some advanced mass mailers can even turn your computer into an SMTP server, to spread to other hosts; by compromising you address book they can email your contacts.

Other types

Other types of Worms include File Sharing worms, which rely on users downloading and running the applications, commonly seen in torrents. Who can resist that randomly uploaded movie on the pirate bay? You have been dying to see it, and anyway who has the cash to pay for it?

Internet worms would be WannaCry. These worms use vulnerabilities to spread across networks.

Instant Messaging worms used to be common and would take over the old IM clients (remember MSN Messenger?) and message all your contacts trying to get them to download the worm themselves.

Trojans

Trojan malware comes from the old Greek epic The Iliad (and the Brad Pitt movie Troy), in that myth that Greece was laying siege to a city, Troy. After a long stalemate they had soldiers hid inside a giant, hollow wooden horse that they pretended was a gift and tricked the defending Trojans into bringing the horse inside to one of the temples. When night fell the soldiers snuck out and opened the gates to the city! I recommend you read the poem itself as its really cool! 🙂

Trojan malware is malware that pretends to be a legitimate application but does something malicious. They do not replicate and tend to have a purpose that benefits from it evading detection, like operating as a backdoor. In many cases the Trojan program’s legitimate “cover” is fully functioning so that the victim will not remove it.

Close up; Bankers

Banker Trojans are designed to steal sensitive user information, like credit card details, credentials and other high value data. We spoke about Zeus before and this is an example of a banker Trojan. It acquires the data and then forwards it on to a Command and Control server, which receives and stores the data to be accessed by the malware author. I wonder if this outbound traffic could be used to detect it.

Close up; Keyloggers

Keyloggers continuously monitor and record keystroke. Usually storing them in a file or exfiltrating them to a command and control server, like with the Bankers. In some cases the keylogger with try to identify specific information by monitoring for “trigger” events; like visiting specific websites to try and capture the credentials. This logging behavior is also seen in bankers.

Close up; Backdoors

Backdoors are an great persistence tool for an attacker. The Trojan operating as a legitimate application opens a port on the server and listens for a connection. The attacker can then connect to your asset through the Trojan. In less targeted attacks the malware may compromise your system and setup a backdoor for the attacker, or their command and control server, to send commands. This outcome of this can be harvesting sensitive data, using your asset as a pivot point to traverse the network or using your asset as a “Zombie” in a botnet.

Rootkits

Rootkits are more scary than what we have talked about so far. Rootkits are not necessarily malware in and of itself, but is a collection of techniques and tools coded into malware allowing for privilege escalation. The aim of the rootkit is to fully compromise the system, conceal its presence and offer persistence. The escalated privilege can be gained by direct attack, using previously acquired log in credentials among other methods. These are difficult to find due to the elevated access it has.

Scareware

Scareware is any malicious application that uses social engineering to scare a user into buying unwanted software. Be giving a sense of urgency (“BUY NOW BEFORE ITS GONE!!!”) and intimidation (“YOUR LAPTOP HAS BEEN HACKED, BUY THIS TO FIX IT!”) the victim may pay the demand. This can come in the form of dodgy antivirus’s but can also be seen in other threats, the most prominent being ransomware. The Scareware portion of ransomware tends to be the countdown timer before “Your files are gone forever”

Adware

Adware can take many forms but the purpose is the same, to show you lots of advertisements. In the past the result of this was to see brightly colored and invasive banner advertisements and pop-up advertisements during regular browsing. As this would generally cause frustration from users(and subsequent adware removals) modern adware tends to try to be more subtle for persistence. Even more they will monitor what you do and create user profiles to then sell to third parties. Scary!

Spam

Spam is an incredibly common function of malware. Brian Krebs  wrote a great book on investigating primarily Russian spammers and found it was a multi-million euro business. In 2014 it was estimated over 90% of emails are spam. That is trillions of emails per year dedicate to this wasteful activity. Spam primarily uses email but can also use blogs, IM’s, advertisements and SMS. Its free for the spam artist and the occasional successes mean its unlikely to subside anytime soon. Beyond the usual risks this also concerns companies as they are responsible for protecting their employees from the kind of abuse spam can entail.

Infection vectors

This portion of our lesson is going to discuss how malware attempts to infect a system. There is the standard technical aspect of how the infection vector enables the malware to infect a system that we need to note, put the vector can also be social engineering. The ways malware authors leverage social engineering to get people to install malware was discussed already but here they are again;

  • Coming from a trusted source, like a worm from a friend saying I love you.
  • Having a sense of urgency or importance, such as having to “INSTALL THIS NOW BEFORE ITS TOO LATE”
  • Arousing interest of the victim, like the friend saying i love you but also offering an interesting service the victim has need of.

Email

Email far exceeds any other vector in terms of speed and coverage by which the malware can spread. Anyone with an email account can be a target of this and this makes extensive use of the social engineering techniques mentioned previously. ILOVEYOU was one of the earliest examples of this attack but macros and viruses embedded in documents distributed by spam campaigns are all still common.

Social networking

Social networking is a great platform for attackers to use due to its extensive reach. Attackers use social networks as a way to enter the lives of victims and provide them with links to malicious sites or malicious files to download. The attacker can also add friends of friends to extend its reach. Generally people accept friend requests if there are mutual connections more often than if there are no mutual friends. Making use of email and password lists that are readily available they can gain access to compromised existing accounts or just create their own.

Setting up their own pages can also be used to spread malware. When a user likes this page they receive updates directly to their news-feed, eliminating the need for a friend request.

Portable Media

Many organisations now a days block the USB ports on their assets and this is for a good reason. Having malware stored on a USB that automatically runs is a real risk facing us today and it can have big consequences. A recent example of this is the 2014 STUXNET attack, where the Israeli intelligence services MOSSAD left USB keys outside of an Iranian Nuclear plant. These USB keys had malware on them and when an unwitting scientist plugged one of them into his workstation the Malware slowly worked its way through the system until it found the Centrifuge SCADA systems. It then executed its main function of causing extensive damage. A lot has been written on this attack and it makes for interesting reading; https://en.wikipedia.org/wiki/Stuxnet

While STUXNET was a highly targeted attack, portable media can also be used for opportunistic attacks.

URL Links

URL links are a special kind of infection vector as they are usually spread through other infection vectors e.g. social networks, IMs, email etc. Examples of this vector can include link-shortening services and misspelled legitimate domain names. These URLs lead to fake websites that look legitimate (or could even have XSS scripts in the case of URL shorteners). This tricks the user into carrying out their tasks as normal not knowing that the attacker is recording their interaction with the fake website. This includes collecting any credentials used. Many banks have started enforcing Multi-Factor Authentication to mitigate the risks from this(as well as other attacks like Phishing).

File Sharing

The pirate bay and other file sharing websites have long been host to many types of malware. Users tend not to investigate what they download opening themselves up to compromise.

Soundwaves

A good blog on how this vector is used can be found here: https://blog.systoolsgroup.com/malware-through-soundwaves/

Bluetooth

Think BlueBourne; A good blog on how this vector is used can be found here: https://www.extremetech.com/mobile/255752-new-blueborne-bluetooth-malware-affects-billions-devices-requires-no-pairing

Malware Analysis – Lesson 1; an Introduction

I discussed in our last post how I have returned to college and will be digitizing my notes, as I make them. This is the first post in that series. It will cover a few areas of malware analysis on a high level, focusing on definitions and a few descriptive lines on each. The areas covered will be discussed in greater detail in future blogs.

What is Malware Analysis?

Malware Analysis is an extremely interesting area of cyber security where we take a piece of malware or malicious code and put it under a microscope. We examine the code with an aim to understand it; How did it infect our system, how does it spread, what does it do and what does it aim to do (what is its intent)? The information we collect from this investigation can be used to stop the malware spreading, and even help us improve our security to prevent a similar infection in future.

WannaCry is a great example of this. Marcus Hutchins[1] famously identified the malwares kill switch using dynamic analysis – by monitoring the malwares network connections and traffic he saw multiple queries being made to an unregistered domain. When he initially registered the domain he had unwittingly stopped the malware in its tracks. In his MalwareBytes blog post he also gives us a good insight into how malware analysis is carried out;

  1. Look for unregistered or expired C2 domains belonging to active botnets and point it to our sinkhole (a sinkhole is a server designed to capture malicious traffic and prevent control of infected computers by the criminals who infected them).
  2. Gather data on the geographical distribution and scale of the infections, including IP addresses, which can be used to notify victims that they’re infected and assist law enforcement.
  3. Reverse engineer the malware and see if there are any vulnerabilities in the code which would allow us to take-over the malware/botnet and prevent the spread or malicious use, via the domain we registered. [2]

There are 2 types of malware analysis we are going to discuss next – Static and Dynamic Analysis

What is Static Analysis?

With static analysis we look at analysing the malware itself in the form of code reviews. By going through the code or malware structure we try to identify functions within them. This can be a very challenging task, especially once you see the methods the malicious coders deploy to prevent analysis of their code.

This kind of analysis takes place when the malware is “at rest”, that is it is not being run, and it can be useful as a first step preliminary analysis. There are two types of Static analysis. Basic analysis makes use of simple hash comparisons (commonly seen in older antivirus) and the extraction of strings, headers, functions and system API calls to try to build a picture of what the malware is. Advanced analysis is much cooler, hardcore disassembling the executable into assembly language and then review that, jumps and all! We have several tools to help us with this, IDA Pro being what comes to mind.

What is Dynamic Analysis?

Dynamic analysis, as the name suggests, deals with malware “in process” (i.e. malware actively running on the victims machine). When we are carrying out this kind of malware analysis we knowingly execute it in order to observe (and document!) its impact on the victim. This gives us a much better understanding and detailed view of what the malware is doing, and can be especially useful if the executable is packed or encrypted in part or whole. This is because, when you execute a program and it is in memory/in use it is decrypted and unpacked. There are 2 types of dynamic analysis, like with static; Basic and Advanced. With Basic Dynamic Analysis the method of analysis is to run the malware and gather information on what it is doing. We can identify, for example, what changes are being made on the file system, to configuration files, what registry keys are being created and edited, and what network activity is taking place and more. For Advanced Dynamic Analysis we thoroughly debug the malware binary and step through its execution. This involved us trying to identify each instruction and the outcome of that instruction, giving us a better understanding of all activities.

Show we do our analysis on a virtual or physical environment?

When we decide to get practical we must figure out if we are going to use a VM running on our laptop to act as a virtual environment or to get an old derelict computer for a physical environment. In general physical environments are preferred as modern malware may attempt to detect if it is running on a VM (and if so will refuse to perform any actions). If we are using physical machines we need to take adequate care with making sure it is sufficiently segregated from our internal network and the public internet. There are a few ways to do this, mostly by restricting traffic on your firewall, or even just not connecting the device to the network at all (the infamous Air-gap method). There are also some tools to help us with snapshotting and roll backs so we don’t have to reinstall after every execution using Ghost Imaging software or using technology like Deep Freeze, which resets core configurations on every reboot to a known good state.

Having a virtual environment simply means using VMware or VirtualBox on your standard workstation to run VMs of your “victim” to execute the malware on. This can be great for speedy rollback as there is usually snapshotting in your virtual environment but if the virtual nature of this environment is detected by the malware it may not run. There may be ways to mask this but I will need to research this in a later post.

What is automated analysis?

After reading about static and dynamic analysis, large portions look like they might be repetitive and tedious. Both of these traits indicate a good candidate for automation and Malware analysis is no exception. This will free up the ever more precious human analyst for more important work and reduce the ever present risk of error to some degree. There are many automated analysis tools available. Some such as Comodo Valkyrie and Threat Expert are cloud based tools where we upload our malware samples, while others such as ZeroWine and Buster are locally installed. These analysers are sandboxed so can run malware automatically with lesser risk (because risk is never 0!) of compromising the host system. They are great for reducing the noise our analysts sift through and highlighting the most important findings for further review but come with several drawbacks;

  • If the target malware is VM-aware then they may identify the analyser as a virtual environment and not execute as normal;
  • If the malware has a particular trigger it requires to run it may not receive this with an automated tool such as
    • Requiring human interaction;
    • Executing at a particular time;
    • Executing after a predefined action has taken place or similar – such as fileless cryptojackers waiting until the device has been idle for a period of time before starting to mine.
  • They might miss certain logical indicators that a human would not.

Despite these flaws, they are a great first step.

How does malware stop us from analysing it?

Malware has a few techniques it uses to try and prevent us from analysing it. A few ways have already been mentioned in the automated analysis section. If the malware detects it is running on a virtual machine it may not run normally or at all. It might rely on triggers to try to avoid or delay detection of activities until conditions are right or it may even remain dormant for a long period of time, frustrating the analyst into believing it is benign. One method of combating analysts that malware employs that is very effective is the use of obfuscation techniques.

Obfuscation is making use of a few techniques to make reading the code to identify its purpose challenging. With encryption the malware is composed of two parts, one part is the encrypted main body of code, and the other is the decryptor used to recover that code. In general a different encryption key is used for each iteration resulting in different encrypted outputs and hashes, confusing antivirus’ but the decryptor itself tends to remain on changed, providing a way to detect these infections.

The malware may also use encoding such as XOR or Base64 to transform its code and make it less readable by humans – this is especially true for malware that uses custom encoding. This means that even once an analyst unpacks to source code there is an extra layer of defense they must navigate. Defense in depth it seems is used by both sides in this computational war.

Packing is another way malware can try to obfuscate itself. Packing is used by many applications – malicious and legitimate. By packing the code we are compressing it for distribution. This can help malware avoid detection and analysis though most packing tools are common and detectable the analyst must still unpack the malware prior to analysing it.

Had your fill of obfuscation yet? Or are you still finding it incredibly interesting? Code obfuscation is a common source of frustration felt by analyst’s worldwide. The malware authors play with their code to make it as confusing as possible. They do this by re-ordering their code so it does not flow in a logical fashion, they might insert code that can lead analysts down a dead end (called Dead Code)  by having code with functions and calls that do not do anything. They might substitute common instructions for less known equivalents. In some cases they may change the assembly language jump instructions to further cause confusion.

Summary

For a first class, this was jam-packed with exciting information, especially around Code obfuscation, which if you thought was skimmed over, don’t worry I’ll be writing a dedicated post on this in the future! There should be one new blog post per week on malware analysis bookmark us so you never miss a chapter! If you cant wait that long to go further on your malware hunting journey I recommend Malware Unicorns Reverse Engineering 101 for a cool course on both static and dynamic analysis; https://securedorg.github.io/RE101/

It is complete with some amazing graphics that put my text heavy blog to shame. 🙂

Until next time.


[1] https://twitter.com/MalwareTechBlog

[2] https://www.malwaretech.com/2017/05/how-to-accidentally-stop-a-global-cyber-attacks.html