During the course of our careers we will come across artifacts left behind by malicious hackers the are easily decipherable. These may be scripts, text files or something else that’s easy to open and view. But more often we, as malware analysts come across files that are not easily identifiable as malicious. Take executable files for example. These may be malicious or benign so we tend to take extra precautions with them. To figure out if an executable is safe or not it tends to help if we understand the structure b of executable files – and in turn gain an understanding of how these files affect our systems.
How do we do this? With static analysis. We are going to focus on PE (Portable Executable) file structure for this class. PE files are executables usually found on windows. The reason we take the time to perform this analysis is to identify them as benign – and thus avoid the costly steps of an investigation; or to identify them as malicious to be marked for evidence. If it is malware we can then classify it if it is a new malware or variant strain of an existing malware. This gains us some uber street cred.
How do we get files to analyse?
If we think there is malware running on our system we can get a dump of our ram and then parse it with tools like Volatility which is a neat open source tool. This would give us some cool information on the processes running and files open, allowing is to gain an understanding of what is going on. It is particularly useful for attacks where the hard disk may not be accessible, such as during a ransomware attack. This information allows us to locate the suspcious file and from there we can upload it the VirusTotal or a similar site to see if we get any hits. If virus total gives us ambiguous results we get to put our sherlock holmes hat on and investigate further.
The first step in malware analysis after we have our suspect file is to carry out a static analysis. We discussed this in our last talk and what it allows us to do is gather information without actively running the malware(actively running = dynamic analysis). So lets take this step by step.
Opening the malware to read the clear text
We can do this in several ways, using notepad, a hex editor or an application like Strings. What it allows us to do is to read any cleartext in the executable. There will be alot of unrenderable garbage but buried beneath that we may begin to form an understanding of what the code is doing. We might be able to identify error messages, help pages, function calls or similar.
1. Take a hash of the file.
We first hash the file so we can easily identify it and check for changes later. There are lots of hashing algorithms we can use and it doesn’t really matter which one, MD5 or SHA are the most common. With the has we can check it with antivirus scanners quickly for matches, or use something like the NIST databases. This helps us identify if the malware has been investigated before and what the findings were. Some places we can compare are;
- NIST National Software Reference Library
- Team Cymru Hash Registry
Apart from helping us identify known malware these can also help us identify if the file is a legitimate software or operating system file. By doing this we reduce the number of files that require a review.
In addition to the traditional methods of hash comparison above there have been several new types of hashing that may be more useful. Similarity hashing, fuzzy hashing, piecewise(or block) hashing and similarity digests are all tools we can use and there are a number of algorithms that support them, such as SDHASH, SSDEEP and MINHash.
Similarity Hashes compare files for similarity(shocking i know). They calculate the hash of portions of the code, such as functions or blocks, to identify similarities such as the same functions. Because its calculated based on a block of the binary if one piece of the code changes the parts of the code that are still similar will compare with the original. This gives us a comparison score. THis helps us identify modified files, increases the difficulty of malicious actors obfuscating their code and it speeds up our analysis time by removing a lot of manual analysis. The NSRL has hashes and similarity hashes.
2. Check the suspicious file with VirusTotal
Before we start with a full blown analysis a good, quick check we can carry out is uploading the file to one of the many virus scanning sites like Virus Total, Meta-Defender, Virscan, Jottie and others. Generally you can upload the file itself or just check the hash. This can be great to identify if the file is a known piece of malware but as you will see if you try this yourself it can be hit or miss, especially if the file is a malware variant that hasn’t been identified and classified before. We discussed in lesson one about how malware variants are being released daily and even reputable antivirus companies are not picking them all up. If we are lucky we can collect some information on the file, what it does and what malware family it can belong to – which will aide us later.
3. Investigating the inside of the file!
Strings.exe is a pretty cool too and can be downloaded from here. It goes through the file and extracts the contents and tries to print out any ASCII or Unicode values it identifies into a file you specify(or in the cmd prompt if you didnt specify a file). For malware that is not heavily obfuscated or encrypted this can be a useful tool for identifying the nature or purpose of the file.
The information we get can tell us about network activity, file activity and registry activity – along with providing us the function call names the file uses. This can help us narrow down if the file is trying to copy our clipboard, set up a server, gain persistence and more. This makes it a great tool to use to start by looking for odd or unique strings.
If the file we are looking at is a PE(from lesson two) there are two good tools we are going to look at in detail; PEDump and PEView. PEDump can be used to extract detailed information from the header of a PE file; this information can be shown on the cmd prompt or stored to a file. PEView is a GUI(yuck) tool that allows us to view what makes up the PE’s headers and sections.
PE File Analysis
The are aseveral components to a PE file that we should be familiar with before going further;
MS-DOS Header is the ever static “This program cannot be run in DOS mode” that is kept around for legacy reasons to prevent you running the program in incompatible DOS operating systems. If this is omitted the OS would fail to attempt to load the file on legacy machines. This header occupies the first 64 bytes of the PE file.
For executable files on the windows systems the MS-dos header is also called IMAGE_DOS_HEADER.
The file signature(or Magic Number) consists of the letters MZ, or 0x5a4d in little-endian hexadecimal format (Little endian machine: Stores data little-end first. When looking at multiple bytes, the first byte is smallest.), and is always the first two bytes of a file. Wanna know what MZ stands for? Its the initials of the guy that designed it!
e_lfanew is the last 4-byte value that points to the location of the PE header. This is important as after the header is the stub program that is run by ms-dos when the executable is loaded. This checks the OS compatibility with the file.
Next comes the PE File Signature which should be identifiable by a value of 0xe8 or PE.
The PE File Header/IMAGE_FILE_HEADER is contained in the 20 bytes following the PE signature and includes several things we will find useful;
The can be very important for forensics investigating as it represents the time that the image was created by the linker. The value is represented as the number of seconds since the start of January 1, 1970 in Universal Coordinated Time. This gives us the time on the coders computer from when they compiled the executable, and may be a clue as to when the program was created. But as with all computer evidence; this could have been modified at some point.
Machine show the architecture the program is designed to run on, such as x86 or x64.
The Characteristics flag gives us some more clues as to what the files does. For example characteristics can include IMAGE_FILE_EXECUTABLE_IMAGE(i.e. that this is an executable), IMAGE_FILE_DLL (i.e. that it is a DLL) and IMAGE_FILE_SYSTEM(for system files). But there are many others.
IMAGE_OPTIONAL_HEADER we get the chance to get additional information from the file such as the magic number – information about the structure of the file to enable the OS to execute the file. Including;
- SizeOfCode: The total size of all code in the file.
- AddressOfEntryPoint: The address where the loader begins execution of the file.
- MinorOperatingSystemVersion: Minimum OS Version needed to run.
- Checksum: The hash.
After this we have the actual sections of the file itself, which is what we are interested in. The most common and interesting sections of the PE file are;
- .text: This section contains the instructions that the CPU executes. All other sections just store the data and supporting information. This should be the only section that can execute and the only section containing code.
- .rdata: This section contains the import and export information we can see in PEView. Sometimes this is split into .edata (export) and .idata (import).
- .data: The .data section contains global variables and data which is to be accessed from anywhere in the program. Local data is NOT stored here.
- .rsrc: Contains resources used like icons, images, menus and strings.
Section headers are called IMAGE_SECTION_HEADER and give us information about the section structures including size and characteristics of each section. Within these sections are IMAGE_DATA_DIRECTORY structures that act as a directory for import/export tables, resource directories etc.
These sections hold information on the code and data for the applications. Libraries, DLL’s, API’s and any systems calls made are stored here.
Programmers can link imports so they dont need to re-impliment certain functionality in their code and between multiple programs. As analysts we can gather a lot of information on what a program does based on the functions it imports. There are 3 types of linking;
Static: Used in linux/unix, all code is in the executable. Makes the executable big. It’s difficult to differentiate between statically linked code and the executable’s own code
Runtime: Executables connect to libraries only when that function is needed. The linked functions do not have to be declared in the executable file header. This means the program can access any function in any library on the system and we can only know if we execute the malware for dynamic analysis.
Dynamic: This is the most common method of linking. With dynamic linking the host OS searches for the linked libraries when the program is loaded. When the program calls the linked library function, that function executes within the library. The PE file header stores information about every library that will be loaded and every function that will be used by the program. We can make calculated assumptions about what the program does based on these functions.
Knowing how PE files are structured and built can aide us in identifying the purpose of the application. There are a variety of tools used to help with this, like PEDump (which lets us locate the import data directory and parse the structures to determine the DLLs and the functions the application uses.), but it is ultimately a very manual process. There are countless functions and system calls that a file can import. MSDN have a great reference facility to find out what each function does.