Swayam's Blog

Advanced Static Analysis


Last updated 1 month ago

Introduction


Advanced static malware analysis means breaking a malware sample down into its most basic (assembly) form using disassemblers and decompilers. We will reverse engineer the binary and reconstruct its code as close as possible to the original source code.

Assembly Basics


Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly code is converted into executable machine code by a utility program called an assembler, such as NASM or MASM.

For more details, refer to the assembly books in the Book-collection section.

Decompiling And Disassembling Malware


Tool used: Cutter

Open Cutter and select the executable.

Click Open. A new window pops up.

Nothing needs to be changed here, so just click OK. After the analysis finishes, we are greeted with a screen showing familiar details about the malware, such as its format, hashes and libraries.

The tabs along the bottom also give access to the views used earlier in basic static analysis.

Clicking on the Disassembly tab shows the list of functions in the binary on the left and the assembly code on the right.

Scrolling down the left sidebar, we come across the program's main function.

There is also a Graph tab at the bottom, which shows the program's control flow as a graph.

Looking closely at the graph, we can follow the program flow of the malware:

It creates a file and reaches out to http://ssl-6582datamanager.helpdeskbros.local/favicon.ico. The result of this download is returned in the EAX register and decides which branch runs next.

If the file is downloaded, it executes the branch below. There, the downloaded favicon.ico has been saved as the created exe file, which in turn starts a bind shell on port 5555.

If the URL is unreachable, it deletes itself and the created exe file.

The Decompiler tab takes all the assembly information and tries to reconstruct source code that is as close as possible to the original.

More on x86 cpu architecture


For a binary to execute on the x86 architecture, there are three things that should be taken into account:

  1. CPU instructions

  2. Registers

  3. Stack

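In Intel syntax (used by MASM and NASM), an x86 instruction is written with the mnemonic first, followed by the destination operand and then the source operand. For example, in:

	MOV EAX, EBX

MOV is the operation. It copies the value from EBX (the source) into EAX (the destination). This operand order has nothing to do with endianness. "Little endian" describes how multi-byte values are stored in memory: least significant byte first, which is how x86 stores data.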

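For jumps and branching, JMP performs an unconditional jump. Conditional jumps such as JZ/JE and JNZ/JNE either jump or fall through depending on the flags set by an earlier instruction such as CMP or TEST.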

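The stack is a region of memory that stores data in last-in, first-out (LIFO) order: the value pushed most recently is the first one to be popped. On x86 the stack grows downward, i.e. new data is added at lower addresses.

PUSH decrements the stack pointer (ESP) and writes a value to the top of the stack. POP reads the value at the top of the stack (the lowest used address) and then increments ESP.

The CALL instruction calls a subroutine. It pushes the return address (the address of the instruction after the CALL) onto the stack and jumps to the subroutine.

The RET instruction returns from the subroutine. It pops the return address off the stack and jumps back to it. A return value, if any, is conventionally left in EAX.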

Registers:

  • EAX is the accumulator register.

  • EDX is the data register.

  • EBX is the base register.

  • ESP is the extended stack pointer.

  • EBP is the extended base pointer.

  • EIP is the extended instruction pointer.

Below is a brief overview of the above concepts.

Memory Layout

Memory is simply an array of bytes, each byte having its own address. When a program is executed, the operating system allocates a chunk of memory to the program. That memory (called address space) is divided into different segments as shown below:

Memory layout of a process

  • The text section stores the program executable. When you compile a C program, the compiler converts your code to 0s and 1s, which represent instructions that the CPU will execute. Those 0s and 1s are going to be loaded into this text section when you run the program.

  • The data section stores initialized data (i.e. global variables that have been initialized).

  • The heap section is memory that you can reserve dynamically by calling malloc. The heap typically grows upwards, which means it grows toward larger memory addresses.

Keep this memory layout in mind. We will come back to it later when we start programming in assembly.

Registers

It turns out that modern computers require more than physical memory to operate. Inside the CPU, there exists a small piece of memory called registers. Registers are extremely fast because they can be accessed directly by the CPU. Modern x86-64 processors have 16 general-purpose 64-bit registers, whose names can be overwhelming to understand at first. So I’ll provide some historical context to help you understand them better. But first, the overall layout of the x86-64 registers:

Layout of x86-64 registers

Okay. Ignore all the descriptions on the right. In fact, ignore the entire image above. Don’t try to understand it right now. I put it purely for reference. We can come back to this later.

8-bit registers

  • The C register was the counter register, used to store counts just like the modern counter variables.

  • The D register stands for the data register. It was used to store the data of most I/O operations.

  • Among the flag bits, the sign bit (S) is set to 1 if the result of the previous operation is negative, and 0 if the result is non-negative.

  • The zero bit (Z) is set to 1 if the result of the previous operation is zero, and 0 if the result is non-zero, and so on.

And then we have some special registers, like the stack pointer (SP), and the instruction pointer (IP) or the program counter, which we’ll get back to in a moment.

16-bit registers

For example, if I have 0100 1101 1110 0010 stored in AX,

  • AH would store 0100 1101,

  • AL would store 1110 0010,

  • AX would represent the entire 0100 1101 1110 0010.

The same goes for the BX, CX, and DX registers. At this point, I should introduce two new terms that you are going to hear a lot more often from now on: a byte, which just means 8 bits, and a word, which means 16 bits.

The 8086 also contains a couple of new word registers and flag bits, among them:

  • the SI (Source Index) register, used as a pointer to a source in stream operations

  • the DI (Destination Index) register, used as a pointer to a destination in stream operations

  • the BP (Base Pointer) register, used as a pointer to the base of a stack frame (we’ll see examples of this)

  • the SP (Stack Pointer) register — okay, you have seen this one before, and it’s used as a pointer to the current position in the stack (we’ll also see examples of this soon)

Here is a picture summary of what we learned so far.

Don’t worry about the segment registers yet; we will probably never need to touch them.

If you’re feeling pretty overwhelmed at this point, stop reading. Go take a break. Don’t look at the diagram above, but look at the diagram below instead, because things are about to get interesting.

32-bit registers

Modern computers work on registers of at least 32 bits (long or dword, short for double word). The concept is the same as before: EAX (which stands for extended AX) refers to the entire 32-bit value. If you want to access the lower word of EAX, you can still use AX. Sadly, there is no separate name for the higher word of EAX this time.

To give an example, suppose EAX stores 1100 0100 1110 0010 0001 1010 0011 1011,

  • AL would be 0011 1011,

  • AH would be 0001 1010,

  • AX would be 0001 1010 0011 1011,

  • EAX would be 1100 0100 1110 0010 0001 1010 0011 1011.

It is important that you are familiar with all the registers you have learned so far, so study the diagram above!

64-bit registers

Here’s the fun part: if I show you the image I told you to skip in the beginning, it kinda starts to make sense.

x86-64 registers

Yes, that’s what 64-bit (qword, short for quad word) registers look like. You’ll notice that you can now access the entire 64-bit value with RAX (that R just stands for register; I guess they ran out of names…). EAX, AX, AH and AL are still there to maintain backward compatibility. The same goes for all the other registers.

You also have additional general-purpose registers, R8 to R15, which can be used to store anything you like. In fact, you can store anything you like in most general-purpose registers. For example, you don’t have to keep counter values in RCX. Just keep in mind that some instructions use certain registers implicitly (LOOP and REP, for instance, use RCX as their counter), and that RSP must always point to the top of the stack.

Assembly And Windows API


Going back to Cutter and the dropper malware sample:

Two arguments, argc and argv, are passed implicitly into the main function. argc holds the number of command-line arguments, and argv is the array of argument strings passed to the malware.

We see that some variables are declared at the start. As we don't know anything about them yet, we ignore them for now.

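The base pointer is pushed onto the stack as part of the function prologue (push ebp, followed by mov ebp, esp). This saves the caller's stack frame. Before returning, the function restores it (mov esp, ebp; pop ebp) so that the caller continues with its own frame intact, and RET then pops the return address that CALL pushed to get back to the caller.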

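Here a call to the API InternetOpenW is being made. Its first argument, lpszAgent, is the user-agent string, which here is Mozilla/5.0. The zeros are the remaining arguments passed to the API. This is consistent with the documentation, shown below for the ANSI version InternetOpenA; InternetOpenW takes the same parameters, with wide strings: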

HINTERNET InternetOpenA(
  [in] LPCSTR lpszAgent,
  [in] DWORD  dwAccessType,
  [in] LPCSTR lpszProxy,
  [in] LPCSTR lpszProxyBypass,
  [in] DWORD  dwFlags
);

After that, there is a call to a function.

Clicking on it shows the whole graph of that function.

Returning to the main function, there is another API call: URLDownloadToFile.

HRESULT URLDownloadToFile(
             LPUNKNOWN            pCaller,
             LPCTSTR              szURL,
             LPCTSTR              szFileName,
  _Reserved_ DWORD                dwReserved,
             LPBINDSTATUSCALLBACK lpfnCB
);

The parameters passed to it are also consistent with the official documentation. This concludes advanced static analysis.

Memory layout of a process

In addition to the text, data and heap sections described earlier, a process’s address space contains two more segments:

  • The bss section stores uninitialized data (i.e. global variables that have not been initialized). In case you’re curious, bss stands for Block Started by Symbol, named after an ancient assembler operator.

  • The stack section is the memory where your functions and the local variables inside your functions live. It is divided into stack frames, which grow downwards (i.e. toward lower memory addresses).

Backstory

Long, long ago, in the age of the Intel 8080, registers were only 8 bits in size. They were simply named A, B, C, D, E, H, and L.

The A register, in particular, was used as the primary accumulator. It accumulates (as you’ll see when we get to the actual assembly part) the result of most operations.

The B register stands for the base register, which historically was used to store the base address of something we want to reference in memory (think arrays in C, where the reference address would be the address of the first element).

Intel 8080 (8-bit) registers (picture from https://en.wikipedia.org/wiki/Intel_8080)

In addition to the general-purpose registers, there is a status register, or flag bits: a series of bits that represent the status of certain operations. Examples are the sign bit (S) and the zero bit (Z) described in the 8-bit registers section above.

Fast forward a few years, and the Intel 8086 came out. The architects of the 8086 added a slew of other registers and made them 16 bits in size. Among all the changes, the original A, B, C, and D registers had a slight modification to their names. Since the registers are 16 bits now, each of the original registers can be divided into two 8-bit registers; for example, AX is made up of AH and AL.

The AH and AL registers are called the “high byte” and the “low byte” respectively. Each of them is 8 bits (a byte) in size, and they are paired up to form the 16-bit AX register (X in this case just stands for pair).

Intel 8086 (16-bit) registers (picture from https://en.wikipedia.org/wiki/Intel_8086)

32-bit registers (picture from https://www.cs.virginia.edu/~evans/cs216/guides/x86.html)