Building an Evasive Loader: Integrating Havoc, Donut, and Early Bird Injection

Hey! If you're diving into this, you're probably fascinated by the world of defence evasion and techniques that seem like something out of a film but work in the real world. In this post, I break down in detail how to build a loader that takes a Havoc implant, passes it through Donut for shellcode, encrypts it with AES-256 so that scanners don't detect it, and injects it into an innocent process using syscalls. All with a focus on being stealthy, bypassing EDR hooks, and keeping the C2 alive without drama.

It's not the eighth wonder of the world, but it's a solid combo for red teaming or educational labs. As usual: use it wisely, no illegal nonsense, OK? Let's get down to business, otherwise I'll get carried away chatting.

Objective and Why it's Cool

The idea is to create an implant that connects to a C2 without being detected by AVs or EDRs. Havoc is a really cool open-source framework, like Cobalt Strike but free and with less crap (I chose Havoc for this documentation because of its syscall handling, which it does really well, but my heart still belongs to Sliver). We set up an HTTPS listener so that the traffic is encrypted and doesn't raise suspicion.

We generate an .exe, convert it into shellcode with Donut (position-independent, so it injects cleanly), encrypt it for obfuscation, and embed it in a C loader that uses Early Bird APC injection: it queues an asynchronous procedure call on the main thread of a process created in a suspended state, such as Notepad.exe. It also uses direct syscalls from ntdll.dll to avoid user-mode hooks; EDRs like CrowdStrike go haywire with that.

In the end, you have a binary that goes unnoticed by Windows Defender and other AVs, injects the code into something legitimate, and maintains the connection. Like this: you run the loader, Notepad wakes up with the implant inside, and you control it from the C2. Pure evasive magic!!!

Havoc Framework Configuration

We start with Havoc, which has a cool interface for customising everything. We set up an HTTPS listener on port 443; I didn't want to overcomplicate the listener at this stage.

For the payload: agent ‘Demon’, x64, format Windows Exe. Specific configuration:

  • Sleep: 45 seconds – interval for beacons, less noise.

  • Jitter: 35% – random variation in sleeps, so as not to be predictable.

  • Indirect Syscall: On – performs syscalls indirectly, jumping to the syscall instruction inside ntdll rather than calling the (possibly hooked) API stubs.

  • Sleep Jmp Gadget: None – no extras, just keep it simple.

  • Sleep Technique: Zilean – obscures sleeps with loops that look like normal code.

  • Proxy Loading: None (LdrLoadDll) – standard loading, no proxies.

  • Amsi/Etw Patch: Hardware breakpoints – patches AMSI and ETW via hardware breakpoints, stealthier than patching the bytes in memory directly.

  • Injection:

    • Alloc: Native/Syscall – memory via syscalls.

    • Execute: Native/Syscall – syscall-based execution.

    • Spawn64: C:\Windows\System32\RuntimeBroker.exe – for x64 spawning.

    • Spawn32: C:\Windows\SysWOW64\RuntimeBroker.exe – for x86.

Havoc generates the .exe with these tweaks, ready for evasion. But as a raw .exe it would get caught, so on to the next step.

Transformation to Shellcode with Donut

We take the .exe and run it through Donut, which converts it into PIC (position-independent code) shellcode. Donut prepends a stub that resolves APIs at runtime, applies relocations, and launches the entry point without depending on fixed addresses.

We run it with flags for raw binary, without compression. The result is a standalone .bin file, perfect for injection, but still detectable – time to encrypt.

I should add that Donut isn't the only tool that can do this. I chose it because it was the one I had most readily available.

AES-256 Encryption and Header Generation

Here we get into the technical details: we use a Python script to encrypt the shellcode with AES-256 in CBC (Cipher Block Chaining) mode. AES is a symmetric block cipher, and the 256-bit (32-byte) key gives a key space of 2^256, which puts brute force out of reach. CBC adds chaining: each plaintext block is XORed with the previous ciphertext block before encryption, so identical plaintext blocks produce different ciphertext blocks and repetitive patterns in the input don't show up in the output.

First, we generate a random 32-byte key from a cryptographically secure random source (not a predictable pseudo-random generator), plus a 16-byte Initialisation Vector (IV). The IV seeds the chaining for the first block, so encrypting the same plaintext with the same key still yields a different ciphertext each time.

We read the .bin file as raw bytes (the plaintext) and apply PKCS7 padding to bring the length up to a multiple of 16 bytes (the AES block size). PKCS7 appends N bytes, each with the value N, where N is the number of bytes needed: if 5 bytes are missing, we add five 0x05 bytes, and if the length is already a multiple of 16, a full block of sixteen 0x10 bytes is added. This lets the decryptor strip the padding unambiguously without corrupting the data.

We create an AES cipher instance with the key, CBC mode, and IV, and encrypt the padded plaintext. The ciphertext has the same length as the padded plaintext: a multiple of 16, and 1 to 16 bytes longer than the original. We print both lengths (original vs. encrypted) as a sanity check.

Then, we convert the key, IV, and ciphertext to C array format, a list of hex bytes such as {0xAB, 0xCD, ...}, using a function that iterates byte by byte and formats each one with ‘0x%02x’. The result is written to a .h header file guarded with #pragma once, defining unsigned char[] constants for the key, IV, and payload, plus a size_t for the length of the encrypted payload.

Technical advantages: encryption makes the shellcode look like random noise, evading static AV signatures based on byte patterns or hashes; the random key and IV make each build unique; embedding everything in the .h avoids runtime disk reads, reducing IOCs (Indicators of Compromise); and PKCS7 lets the loader remove the padding exactly after decryption (padding is not an integrity check, though).

Implementation of the Loader in C: Remote Injection via Syscalls

Now for the technical details: the C loader includes the .h header with the encrypted payload, and handles decryption and remote injection using direct syscalls to bypass user-mode hooks in EDRs, which typically intercept high-level APIs such as VirtualAllocEx.

First, we include Windows headers for types such as NTSTATUS, and define typedefs for the syscalls in ntdll.dll: NtAllocateVirtualMemory (allocates remote memory), NtWriteVirtualMemory (writes to remote memory), NtProtectVirtualMemory (changes protections), NtQueueApcThread (queues APC), NtResumeThread (resumes thread). These are resolved at runtime with GetModuleHandleA(‘ntdll.dll’) and GetProcAddress, avoiding static imports that give away the binary in PE analysis.

Decryption: we initialise an AES context with the key and IV (using a library such as aes.h for CBC decryption) and decrypt the payload in place or into a temporary buffer. CBC decryption runs each block through the AES decryption function and XORs the result with the previous ciphertext block (the IV for the first block); the PKCS7 padding is then stripped to recover the original shellcode.

We create the target process: CreateProcessA launches Notepad (fixed path C:\Windows\System32\notepad.exe, a Microsoft-signed binary that Defender trusts) with the CREATE_SUSPENDED flag, which pauses the main thread before it executes any user code. This returns handles to the process (pi.hProcess) and thread (pi.hThread), plus the PID for logging.

Step-by-step injection:

  1. Remote allocation: We call NtAllocateVirtualMemory with process handle, base address NULL (let the kernel choose), shellcode size (size_payload), flags MEM_COMMIT | MEM_RESERVE (reserve and commit pages), initial protections PAGE_READWRITE (RW for secure writing). We check NT_SUCCESS(status) to handle errors.

  2. Remote write: NtWriteVirtualMemory copies the decrypted shellcode from the local buffer to the allocated remote address, specifying the number of bytes to write and checking the number of bytes written.

  3. Protection change: NtProtectVirtualMemory sets the region to PAGE_EXECUTE_READ (RX: executable and readable, not writable), saving old_protect. This way the memory is never RWX, which triggers heuristics.

  4. APC queuing: NtQueueApcThread queues an Asynchronous Procedure Call in the suspended thread, with ApcRoutine pointing to the remote address of the shellcode, and NULL arguments. APCs run in alertable state, perfect for Early Bird: it runs before the thread resumes normal code.

  5. Resumption: NtResumeThread wakes up the thread, executing APC immediately, injecting the implant into legitimate context.

Cleanup: we close handles with CloseHandle so as not to leave leaks or traces in the process tree. If any syscall fails (NTSTATUS < 0), we terminate the child process with TerminateProcess to avoid zombie states or partial detections.

Technically, syscalls evade hooks because they bypass kernel32/win32u wrappers, going directly to the kernel via ntdll; Early Bird takes advantage of the Windows process startup sequence, where pre-resume queued APCs are prioritised; using Notepad reduces suspicion due to reputation scoring in AVs.

Compilation and Evasion Strategies

Compile with MinGW: -O2 optimises, -s strips symbols, -static produces a standalone binary, and -w silences warnings. Name it something like update so it goes unnoticed.

Evasion: encryption kills signatures, syscalls bypass hooks, injection into legitimate software deceives heuristics. In tests, 0 detections in Windows Defender AV.

How to Improve It for EDRs

Okay, if you want to take this to a professional level and stand up to tough EDRs like CrowdStrike or SentinelOne, the current loader with syscalls and AES is fine if we like mediocrity and confuse the concept of AVs and EDRs, but against memory scans and advanced telemetry, you need more layers. Tools like Freeze try to automate it, but do it manually for total control. Forget simple loaders and go for top-tier techniques. Let me explain:

  • Forget CreateThread/CreateRemoteThread: opt for Pool Party or Timer Queue Injection. CreateRemoteThread is a classic, but EDRs monitor it like crazy (kernel telemetry). Pool Party uses worker threads from the Windows thread pool (via TpAllocWork and TpPostWork) to execute code without creating new threads, so it looks like normal system activity. Timer Queue Injection queues timers (NtSetTimer) that run callbacks on existing threads, super stealthy. Both avoid obvious thread creation events, reducing alerts. Implement with syscalls like NtCreateTimer or TpAllocTimer for more stealth.

  • Implement Stack Spoofing: The King of Evasion. If your shellcode sleeps (as with Ekko, which obfuscates sleeps by rotating keys) and you don't spoof the stack, a CrowdStrike memory scan will catch you: it sees the call stack pointing to your malicious code. Stack spoofing fakes the stack frames, making it look like the thread is sleeping in legitimate ntdll or kernel32. This is what separates the children from the adults, because it requires manually manipulating the stack pointer (RSP) and frames. Integrate libs like Unwinder (to unwind the real stack and spoof it) or check out AceLdr (an open-source loader that does it with syscalls). The basic idea: save the real stack, create a fake one with returns to benign APIs, sleep, restore. Against EDRs with memory hunting, it's gold.

  • Custom Reflective Loaders (UDRL): Goodbye Generic Donut Stub. The Donut stub is cool, but generic, and signatures catch it. Write your own User-Defined Reflective Loader (UDRL): load DLLs into memory without touching disk, resolve imports dynamically, and then wipe the PE headers (Header Stomping: overwrite the PE header with zeros or noise). This way, memory scans don't see suspicious PE structures. Learn from sRDI or roll your own: parse the PE in memory, relocate, resolve the IAT with hashing (see below), execute the entry point, stomp the headers. More stealth, less static and dynamic detection.

Your setup already has syscalls (bypass user-mode hooks) and AES (obfuscate payload). To beat Freeze (which attempts stack spoofing and more), implement manually.

(I use Freeze as an example because it was very good at the time.)

A. API Hashing (Hide Imports). Now, in PEStudio or CFF Explorer, you see clear imports: kernel32 -> VirtualAlloc, etc., which is a red flag. Solution: do not use strings such as ‘NtAllocateVirtualMemory’ in GetProcAddress, which leave readable text. Calculate hash (e.g., djb2 or ROT13) of the name: hash = 0; for char in name: hash = ((hash << 5) + hash) + char. Then, iterate exports from ntdll in mem (parse PEB -> LDR -> modules), hash each export and compare. Find the function without strings: clean binary, no suspicious strings in .data/.rdata. Add salt to the hash for uniqueness.

B. Stack Spoofing (Detailed). As above, but in more detail: in sleeps, EDRs such as CrowdStrike use APIs such as StackWalk64 to trace calls. Spoof: 1) Use RtlCaptureContext to capture the current context. 2) Point RSP to a fake allocated stack, with frames pointing to ntdll!RtlUserThreadStart or similar (looks like a normal thread). 3) Sleep (NtDelayExecution). 4) Restore the real context. Libs such as Unwinder help with safe unwind/rewind, preventing crashes. Study AceLdr: use syscalls for everything, integrate into your loader for implant sleeps. Against ETW/AMSI, combine it with the hardware breakpoint patches already in Havoc.

Video with evidence

Please forgive me for having to upload it this way, but GitBook does not allow videos to be embedded normally.

Conclusions

And that's it, a stylish evasion loader, perfect for learning or simulating threats. If you improve it further, you'll beat the EDRs.

I hope you like it, and I WANT A PANDAAAAAA!
