Basic Static Analysis

Last updated 1 month ago

Introduction

Basic static analysis means examining a malware sample without executing it, i.e. passive analysis.

Generating Hashes

MD5 and SHA256 hashes are generated for the malware samples so that we can check virus databases to see whether the sample has been seen in the wild before.

Don't arm the sample before generating hashes.

SHA256 command:
	sha256sum.exe <malware_file>
MD5 command:
	md5sum.exe <malware_file>

After generating the hashes, search for them on VirusTotal: https://www.virustotal.com/gui/home/upload
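Where sha256sum.exe and md5sum.exe are not available, the same digests can be computed with Python's standard hashlib module. The following is a minimal sketch; the function name file_hashes and the chunk size are illustrative choices, not part of the course tooling.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA256 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The file is only opened for reading, so the sample is never executed. Calling file_hashes on a sample path returns a dictionary with both digests, ready to paste into a VirusTotal search.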

Static String Analysis

A malware executable needs strings in order to reach a URL or other online resource. A computer does not work with string characters directly, so the malware author has to hard-code each string into the binary as an array of bytes, and those bytes can be recovered without running the sample.

FLOSS is a tool that extracts and decodes the strings in a malware sample.

Syntax:

	floss.exe <malware_name.exe>

The -n x parameter can also be used, where x is the minimum string length.

	floss.exe -n 6 <malware_name.exe>

Strings may be planted deliberately to mislead the analyst, so treat them with caution.
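To make the idea of static string extraction concrete, here is a minimal Python sketch that pulls printable ASCII and UTF-16LE runs out of raw bytes. It only approximates the simplest part of what FLOSS does (FLOSS also recovers obfuscated strings); the function name extract_strings is hypothetical, and the min_length parameter mirrors the role of -n.

```python
import re


def extract_strings(data, min_length=4):
    """Return printable ASCII and UTF-16LE strings found in raw bytes."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # "Wide" strings: each printable character followed by a null byte.
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found
```

Windows binaries often store strings as UTF-16LE, which is why the sketch scans for both encodings. A larger min_length filters out short random byte runs that happen to be printable.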

Analyzing Import Address Table

Import Address Table

The structure and content of the import address table are identical to those of the import lookup table until the file is bound. During binding, the entries in the import address table are overwritten with the 32-bit (for PE32) or 64-bit (for PE32+) addresses of the symbols that are being imported. These addresses are the actual memory addresses of the symbols, although technically they are still called "virtual addresses." The loader typically processes the binding.

Import Table

The Import Table is formally called the "Import Directory Table" and contains one entry for every DLL that the executable loads. Each entry contains, among other fields, the RVAs of the Import Lookup Table (ILT) and the Import Address Table (IAT).

Import Directory Table

The import information begins with the import directory table, which describes the remainder of the import information. The import directory table contains address information that is used to resolve fixup references to the entry points within a DLL image. The import directory table consists of an array of import directory entries, one entry for each DLL to which the image refers. The last directory entry is empty (filled with null values), which indicates the end of the directory table. Each import directory entry has the following format:
Offset    Size    Field
0         4       Import Lookup Table RVA
4         4       Time/Date Stamp
8         4       Forwarder Chain
12        4       Name RVA
16        4       Import Address Table RVA
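The 20-byte entry layout in the table above can be decoded with Python's struct module. This is a minimal sketch that assumes the raw bytes of the import directory table have already been located and read from the file; the names ImportDirectoryEntry and parse_import_directory are illustrative.

```python
import struct
from collections import namedtuple

ImportDirectoryEntry = namedtuple(
    "ImportDirectoryEntry",
    "import_lookup_table_rva time_date_stamp forwarder_chain "
    "name_rva import_address_table_rva",
)

ENTRY_FORMAT = "<5I"  # five little-endian 4-byte fields
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 20 bytes


def parse_import_directory(table):
    """Decode entries until the all-zero terminator entry is reached."""
    entries = []
    for offset in range(0, len(table) - ENTRY_SIZE + 1, ENTRY_SIZE):
        fields = struct.unpack_from(ENTRY_FORMAT, table, offset)
        entry = ImportDirectoryEntry(*fields)
        if not any(entry):
            break  # the empty entry marks the end of the table
        entries.append(entry)
    return entries
```

Each returned entry corresponds to one imported DLL. Turning name_rva into the DLL name requires mapping the RVA to a file offset through the section table, which this sketch leaves out.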

To analyze the import address table, PEview.exe is used. An executable is just a large array of bytes, which PEview displays in hexadecimal alongside the raw data. Below is the view of a sample malware file.

Dots (.) stand for bytes that have no printable representation. pFile is the file offset of the first byte in each row.

Files of a given type follow the same format, and the first bytes tell the operating system what kind of file it is.

Here, the MZ signature identifies the file as a Windows Portable Executable (PE) file.

Above is the DOS header of the current executable.

The highlighted field is the compilation timestamp of the malware. It is not always accurate.
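The same fields that PEview displays can be read programmatically. The sketch below checks the MZ signature, follows the pointer stored at offset 0x3C of the DOS header to the PE signature, and reads the TimeDateStamp field of the COFF header that follows it. The function name read_pe_timestamp is hypothetical, and the bytes are assumed to be the contents of a sample already read into memory.

```python
import struct
from datetime import datetime, timezone


def read_pe_timestamp(data):
    """Return the compile timestamp stored in a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature: not a PE file")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Signature (4) + Machine (2) + NumberOfSections (2) precede TimeDateStamp.
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

The value is a count of seconds since 1 January 1970 (UTC) written by the linker, so it can be altered or zeroed, which is one reason it should not be trusted on its own.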

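If a section's virtual size is roughly equal to its size of raw data, all of that section's data is already present in the file on disk, which suggests an unpacked binary. If the virtual size is much larger than the size of raw data, the binary is probably packed.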

Introduction to Windows API

API- Application Programming Interface

The Windows API is a set of functions that lets programmers interface with the Windows operating system at a low level.

To use these functions, an executable imports them from system DLLs through the Import Address Table.

For example:

ShellExecuteW is a Windows API function. Its documentation is shown below:

The URLDownloadToFileW API is commonly used by malware to download a second-stage executable.

A list of APIs commonly used by malicious files is available at https://malapi.io/

Packed Malware Analysis

What is packing?

Packing means distributing an executable in a compressed or obfuscated state, making it more difficult to detect, statically analyse, and reverse engineer.

In the context of malware, since the primary malicious payload is compressed or obfuscated in a packed sample, security products that perform automated static analysis may have issues flagging the binary as malicious, which is obviously a major advantage for malware developers.

We can loosely categorize packers into two groups:

Compressing packers’ primary purpose is to distribute the executable in a compressed format, primarily to reduce the size of the file being distributed.
Encrypting packers’ primary purpose is to encrypt or obfuscate the distributed executable to prevent end users from reverse engineering the application. Encrypting packers are sometimes called crypters.

Some packers do a combination of both. In the case of UPX, compression is the exclusive purpose, and no encryption is performed. It’s important to note that packing is not exclusive to the world of malware: many vendors pack their products either to reduce their size or to protect them from being reverse engineered and redistributed.

How does packing work?

This section describes the stub-payload packing architecture, which is one of the most common mechanisms used by packers, including UPX.

In a “stub-payload” architecture, a new executable is created that contains two primary components: the compressed/encrypted contents of the original executable, and a short piece of code responsible for decompressing/decrypting that original executable and executing it. This short piece of code is often referred to as a stub. In essence, the original executable is compressed/encrypted, then wrapped in a new executable which contains code to bring it back to its original state.

The stub will be the entry point of the new executable, and once it performs the necessary decompression or decryption, it will pass control flow to the original executable, which will then be in its original state. At this point the original executable carries on its execution as if it had never been packed to begin with.
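The stub-payload idea can be illustrated on plain bytes with a toy sketch. The code below is only an analogy: it uses zlib as a stand-in compression scheme, restores data rather than a runnable image, and the function names pack and stub_unpack are hypothetical. Real packers such as UPX use their own formats and perform the unpacking in native code inside the new executable.

```python
import zlib


def pack(original):
    """Toy 'packer': store the original bytes in compressed form (the payload)."""
    return zlib.compress(original, 9)


def stub_unpack(payload):
    """Toy 'stub': restore the original bytes from the packed payload."""
    return zlib.decompress(payload)


# Stand-in for the contents of an original executable.
original = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 50
payload = pack(original)
restored = stub_unpack(payload)
```

After this runs, payload is much smaller than original, and restored is byte-for-byte identical to original. A static scanner that sees only payload is looking at compressed data, not at the strings and code of the original.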

Indications of Packing:

The following indicators hint at a packed sample:

Lack of Imports in Import Address Table (IAT)

In order for an executable to interact in any meaningful way with the underlying operating system, it must import functions built into system libraries such as kernel32.dll and user32.dll. When looking at a fully unpacked sample, you’re often going to see a large number of imports, since malware usually interacts quite heavily with the operating system. However, since the stub of a packed sample doesn’t have much functionality outside of unpacking and executing the real payload, packed samples often have a suspiciously low number of imports compared to a standard executable.

Non-standard Section Names

In a traditional executable, you’re often going to see the same sections every time (.text, .data, .rsrc, etc.). However, many packers define their own custom sections, which indicates that the executable is non-standard and may be packed. For example, the UPX packer ships its final executable with the non-standard section names UPX0 and UPX1.

Sections with a small raw size but a large virtual size

When you see a section with a small raw size (sometimes 0), that indicates that the actual executable does not contain any raw data in that specific section. However, when the executable is loaded into memory, the raw size is no longer relevant, and instead the virtual size of each specific section is allocated in memory. If a section is being allocated a large amount of virtual space, yet contains no actual raw data, that indicates a potential cave in which unpacked code may eventually be written to, which is commonly done by unpacking algorithms.
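The comparison described above can be automated by decoding a 40-byte section header and comparing its VirtualSize and SizeOfRawData fields. This is a minimal sketch: the function names are hypothetical, and the ratio threshold of 4 is an arbitrary assumption chosen for illustration, not an established cutoff.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def section_sizes(header):
    """Return (name, virtual_size, raw_size) from one raw section header."""
    fields = SECTION_HEADER.unpack(header)
    name = fields[0].rstrip(b"\x00").decode("ascii", errors="replace")
    return name, fields[1], fields[3]


def looks_like_unpack_target(virtual_size, raw_size, ratio=4):
    """Heuristic: little or no data on disk but much more space in memory."""
    if raw_size == 0:
        return virtual_size > 0
    return virtual_size >= raw_size * ratio
```

A section such as UPX0, with a raw size of 0 and a large virtual size, is flagged, while a typical code section whose two sizes are close is not. The result is a hint to investigate further, not proof of packing.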

Sections with very high entropy

The word entropy refers to the variance and “randomness” of a piece of data. Things like the English language, assembly code, and other well-defined structures of communication usually have low entropy since language tends to follow predictable patterns. However, things like encrypted data and compressed data have no such sense of predictability, and hence have much higher entropy. If a section of data has high entropy, it’s likely the section contains either compressed or encrypted data that will eventually be unpacked.
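Entropy is commonly measured with the Shannon formula, which gives a value between 0 and 8 bits per byte. The sketch below computes it for a byte string; the function name shannon_entropy is illustrative, and any threshold for what counts as "very high" is left to the analyst.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A buffer made of one repeated byte scores 0, while a buffer in which all 256 byte values are equally frequent scores 8. Compressed or encrypted sections tend to sit close to the upper end of that range, whereas plain text and ordinary code score noticeably lower.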

Low number of discernible strings

In a fully unpacked executable, you should be able to notice a decent number of readable strings, since most applications (including malware) use protocols that implement human language (for example, HTTP uses POST, HEAD, GET, etc.). These sorts of strings, which should exist in a standard executable, will not exist in packed executables, since the strings will be encrypted or compressed. If you analyse the strings of a binary and cannot find any readable ones, you could be dealing with a packed sample.

Sections with RWX privileges

In standard executables, it is uncommon for a section to be marked as both writable and executable, since an application rarely wants to overwrite its own executable code. Additionally, a standard application rarely writes additional executable code dynamically. Because of this, there is rarely any reason for a section to be both writable and executable, except in the case of a packer, in which data is unpacked into a section (write) and then given control (execute).
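Section permissions are stored as bit flags in the Characteristics field of each section header. The sketch below decodes the read, write, and execute flags using the flag values defined by the PE format; the function names are hypothetical.

```python
# Section permission flags from the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def section_permissions(characteristics):
    """Render a section's permission flags as a string such as 'R-X'."""
    flags = "R" if characteristics & IMAGE_SCN_MEM_READ else "-"
    flags += "W" if characteristics & IMAGE_SCN_MEM_WRITE else "-"
    flags += "X" if characteristics & IMAGE_SCN_MEM_EXECUTE else "-"
    return flags


def is_writable_and_executable(characteristics):
    """True when a section can be both written to and executed."""
    wanted = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE
    return (characteristics & wanted) == wanted
```

A typical code section carries read and execute only, and a typical data section carries read and write only, so a section for which is_writable_and_executable returns True deserves a closer look.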

jmp or call Instructions to registers/strange memory addresses

For many packers, the address to the location of where data is being unpacked to is stored in a register (such as eax), and that memory address is often in an entirely different section. Very long jumps like this are relatively uncommon, since all the executable code in a binary is usually contained in a single section.

If we see a jmp/call to a memory address that:

isn’t in the current section
isn’t in the address space of a loaded library

it’s likely that the jump leads to unpacked code.
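The two conditions above amount to a pair of range checks. The following sketch expresses them in Python, assuming the jump target, the current section's address range, and the loaded libraries' address ranges have already been obtained from a disassembler or debugger. All names and the half-open (start, end) range convention are assumptions made for illustration.

```python
def in_any_range(address, ranges):
    """True if address falls inside any half-open (start, end) range."""
    return any(start <= address < end for start, end in ranges)


def jump_looks_suspicious(target, current_section, loaded_modules):
    """Apply both checks: outside the current section and outside all libraries."""
    if in_any_range(target, [current_section]):
        return False  # ordinary jump within the same section
    if in_any_range(target, loaded_modules):
        return False  # call into a loaded library
    return True
```

A True result only marks the jump as worth inspecting, because legitimate binaries can also jump between their own sections.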

Combining Analysis Methods

PEstudio combines all of the methods covered so far in one place. Drag and drop a file into PEstudio to start analysing.

It also includes built-in malware indicators, each with a threat level.

Identifying Malware Capabilities & Intro to MITRE ATT&CK

Our goal during basic static analysis is to triage correctly and as quickly as possible. Now that we've learned a bit about how to perform basic static analysis and how to correlate static indicators, let's deploy another tool that can assist in this phase and hopefully speed things up.

Capa is a program that detects malicious capabilities in suspicious programs by using a set of rules. These rules are meant to be as high-level and human readable as possible. For example, Capa will examine a binary, identify an API call or string of interest, and match this piece of information against a rule that is called "receive data" or "connect to a URL". It translates the technical information in a binary into a simple, human-readable piece of information.

Let's learn more about this tool by using it on the binary we've already performed static analysis on, Malware.Unknown.exe.malz.

On FLAREVM, run capa -h to see the usage menu:

	C:\Users\husky\Desktop
	λ capa -h
	usage: capa.exe [-h] [--version] [-v] [-vv] [-d] [-q] [--color {auto,always,never}]
	                [-f {auto,pe,sc32,sc64,freeze}] [-b {vivisect,smda}] [-r RULES] [-t TAG] [-j]
	                sample

	The FLARE team's open-source tool to identify capabilities in executable files.

	positional arguments:
	  sample                path to sample to analyze

	optional arguments:
	  -h, --help            show this help message and exit
	  --version             show program's version number and exit
	  -v, --verbose         enable verbose result document (no effect with --json)
	  -vv, --vverbose       enable very verbose result document (no effect with --json)
	  -d, --debug           enable debugging output on STDERR
	  -q, --quiet           disable all output but errors
	  --color {auto,always,never}
	                        enable ANSI color codes in results, default: only during interactive session
	  -f {auto,pe,sc32,sc64,freeze}, --format {auto,pe,sc32,sc64,freeze}
	                        select sample format, auto: (default) detect file type automatically,
	                        pe: Windows PE file, sc32: 32-bit shellcode, sc64: 64-bit shellcode,
	                        freeze: features previously frozen by capa
	  -b {vivisect,smda}, --backend {vivisect,smda}
	                        select the backend to use
	  -r RULES, --rules RULES
	                        path to rule file or directory, use embedded rules by default
	  -t TAG, --tag TAG     filter on rule meta field values
	  -j, --json            emit JSON instead of text

	To provide your own rule set, use the -r flag:
	  capa --rules /path/to/rules suspicious.exe
	  capa -r /path/to/rules suspicious.exe

	examples:
	  identify capabilities in a binary
	    capa suspicious.exe

	  identify capabilities in 32-bit shellcode, see -f for all supported formats
	    capa -f sc32 shellcode.bin

	  report match locations
	    capa -v suspicious.exe

	  report all feature match details
	    capa -vv suspicious.exe

	  filter rules by meta fields, e.g. rule name or namespace
	    capa -t "create TCP socket" suspicious.exe

Capa has lots of command line options, but let's run it against Malware.Unknown.exe.malz with no arguments to see what the program looks like by default.

Run capa [C:\path\to\Malware.Unknown.exe.malz] to execute the program. Because I am in the Desktop directory, the command is capa Malware.Unknown.exe.malz in my example below:

Let's examine the results. Immediately, we see some boiler-plate information about the binary, like its hashes. But then, we get some interesting high-level information about the program.

The first block in the output labeled "ATT&CK Tactic - ATT&CK Technique" is worth examining in depth.

What is ATT&CK?

MITRE Adversary Tactics, Techniques & Common Knowledge (ATT&CK)

The MITRE ATT&CK Framework is a standard knowledge base of adversary tactics, techniques, and procedures (TTPs). MITRE ATT&CK seeks to define and classify cyber adversary activity into groups based on what the activity seeks to accomplish and how the activity is carried out.

In my professional life, no other standard set of definitions has seen more use than MITRE ATT&CK. It is an industry standard just about everywhere you go.

For example, let's say you want high-level information about the types of tactics that adversaries use to gain initial access to a target network. The MITRE ATT&CK Framework has a grouped list of items classified under TA0001 - Initial Access, that you can view in list form:

Then, if you want more information about a specific initial access technique, like phishing, you can view the technique page for T1566 - Phishing:

And then, if you want an example of a more specific sub-technique of phishing, like spearphishing with an attachment, you can view the sub-technique T1566.001 - Spearphishing Attachment:

The pages in the ATT&CK matrix have information about the specific tactic/technique, tools that can deploy this technique, mitigations, and detections. For example, T1566.001 - Spearphishing Attachment lists the known adversary groups that use this technique (which, for Spearphishing, is probably most adversaries!):

I highly recommend perusing the MITRE ATT&CK matrix items. I can get lost in that website for hours learning about new tactics, techniques, and procedures. I also highly recommend becoming fluent in the ATT&CK framework for report writing as it can be an exceptionally useful way to frame findings and information in industry common terms.

Capa Output

Now, back to Capa! Capa has examined the binary, pulled out interesting information from the binary, matched it against its default rule set, and matched some suspected capabilities to items from the MITRE ATT&CK Framework. This time, we don't have much to go on. We get a match for the ATT&CK item "T1129 - Shared Modules".

If we examine the matrix item for Shared Modules, we don't get a lot of useful information:

This basically means that the malware is loading DLLs to perform malicious activity. That's not particularly revealing! Let's keep moving.

Instructor's Note: it seems that the Shared Modules technique wasn't too useful at all and the Capa developers have removed the default rule for it! If you run Capa against this sample, there's a chance you will not see this as a listed technique. It's not particularly useful for our analysis, so please feel free to move on to the next section.

If you want to see an example of how Capa can identify techniques, also feel free to run Capa against the WannaCry sample that we detonated earlier in the course.

Malware Behavioral Catalog (MBC)

The next output is the Malware Behavioral Catalog (MBC) Objectives and Behaviors. This is a similar classification system to MITRE ATT&CK but focuses on malware specifically.

MBC translates MITRE ATT&CK items into terms that focus on the malware analysis use case. So understandably, we do get some useful output from this section:

Here, Capa has identified items of interest in the binary, matched them to rules based on MBC items, and returned the results. We've accurately identified that the Malware.Unknown.exe.malz sample has the capability to

Send and receive data
Do so over HTTP
Create and terminate processes

For a preliminary round of triage, that's pretty good! But let's keep going; the best is yet to come.

Capa Rule Output

The final block identifies Capa rule matches against the default Capa rule set. This is the most specific of the three outputs and gives us the best information for triage:

Like in the MBC output, the Capa rule output identifies that the malware can connect to a URL, send and receive data, and manipulate processes. On the surface, there isn't much more information here than what we already have. But we do see the number of matches and the namespace for the rules in this output.

Is there more going on under the hood of Capa? Yes, yes there is!

Let's rerun Capa with the verbose flag. Run capa [C:\path\to\Malware.Unknown.exe.malz] -v and examine the output:

Finally, let's run Capa one more time with a double verbose output. Run capa [C:\path\to\Malware.Unknown.exe.malz] -vv and examine the output:

There is tons of incredible information here and we can clearly see how Capa is now triggering the rules for this binary. For example:

	download URL to file
	namespace  communication/http/client
	author     matthew.williams@fireeye.com
	scope      function
	mbc        Communication::HTTP Communication::Download URL [C0002.006]
	examples   F5C93AC768C8206E87544DDD76B3277C:0x100020F0, Practical Malware Analysis Lab 20-01.exe_:0x401040
	function @ 0x401080
	  or:
	    api: urlmon.URLDownloadToFile @ 0x4010D9

The output for the "download URL to file" rule indicates that this rule triggers when the urlmon.URLDownloadToFile API call is located in the binary. It has identified this API call, provides the location in the binary where it is called, and provides some examples of where this kind of malware behavior has been seen before.

Notice that for some rules, there are conditionals that can trigger the rule based on multiple criteria. For example:

	create process (2 matches)
	namespace  host-interaction/process/create
	author     moritz.raabe@fireeye.com
	scope      basic block
	mbc        Process::Create Process [C0017]
	examples   9324D1A8AE37A36AE560C37448C9705A:0x406DB0, Practical Malware Analysis Lab 01-04.exe_:0x4011FC
	basic block @ 0x4010E3
	  or:
	    api: shell32.ShellExecute @ 0x401128
	basic block @ 0x401142
	  or:
	    api: kernel32.CreateProcess @ 0x4011AD

This rule identifies process creation based on the existence of the ShellExecute API call located in shell32.dll or the CreateProcess API call located in kernel32.dll.
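The "or" condition in that rule can be pictured with a toy matcher. The sketch below is not Capa's rule format or matching engine. It is a simplified, hypothetical model in which a rule lists alternative API names and matches when at least one of them is among the APIs observed in the binary.

```python
# Hypothetical, simplified stand-in for a rule with an "or" condition.
CREATE_PROCESS_RULE = {
    "name": "create process",
    "or": ["shell32.ShellExecute", "kernel32.CreateProcess"],
}


def match_any(rule, observed_apis):
    """Return the rule's alternatives that were observed (empty list: no match)."""
    return [api for api in rule["or"] if api in observed_apis]


observed = {"urlmon.URLDownloadToFile", "shell32.ShellExecute"}
matches = match_any(CREATE_PROCESS_RULE, observed)
```

Here matches contains only shell32.ShellExecute, which is enough for the rule to trigger. In the real output above, both alternatives were found in different basic blocks, which is why Capa reports two matches.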

Summary

Now that we understand the specifics of basic static analysis, we can turn to a tool like Capa to do a lot of the heavy lifting for us during triage. Capa can give us high-level information about what may be going on in the sample of interest. It's rarely enough information to draw a definitive conclusion, but it's a start! More analysis is necessary to uncover the ground truth for any given sample.
