Lately I have been dabbling a lot in memory corruption exploits, so I thought I would take this weekend off from wargames to write a quick post about Windows memory management before delving into challenges that require bypassing exploit mitigations such as NX/DEP, stack cookies/canaries and ASLR. We’ll save that for another post. 🙂
By default, a 32-bit application on Windows gets 4 GB of virtual address space: 2 GB is reserved for kernel space and the other 2 GB is available to the process as user space. The user space can grow to 3 GB if the executable's large-address-aware flag is set and the system is booted with 4-gigabyte tuning (4GT) enabled. A virtual address space is divided into fixed-size “chunks” called pages, which are mapped to equally sized “chunks” of physical memory called memory frames or page frames. Note that although the virtual pages are contiguous, the physical frames they are mapped to need not be.
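To get a feel for the numbers, here is a quick back-of-the-envelope sketch of how many pages each region holds. The 4 KB page size is my assumption (it is the usual small page size on 32-bit x86, but nothing above depends on it), and the `page_count` helper is just a name I made up for illustration.

```python
# Assumption: 4 KB pages, the usual small page size on 32-bit x86.
PAGE_SIZE = 4 * 1024
GB = 1024 ** 3


def page_count(num_bytes, page_size=PAGE_SIZE):
    """Number of whole pages needed to cover num_bytes of address space."""
    return num_bytes // page_size


total_va = 4 * GB      # full 32-bit virtual address space
default_user = 2 * GB  # default user-space share
tuned_user = 3 * GB    # large-address-aware image on a system booted with 4GT

layout = {
    "total": page_count(total_va),
    "user (default)": page_count(default_user),
    "kernel (default)": page_count(total_va - default_user),
    "user (4GT)": page_count(tuned_user),
    "kernel (4GT)": page_count(total_va - tuned_user),
}

for name, pages in layout.items():
    print(f"{name:>17}: {pages:>9,} pages")
```

With these assumptions the full address space works out to a little over a million pages, half of which belong to user space by default.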
Each virtual address is composed of a page number and an offset.
The most significant bits of the address specify the page number and the least significant bits specify the offset within that page. Similarly, each physical address is divided into a frame number and an offset. If we were to use a home address as an analogy, you could think of the page number as the street name and the offset as the house number. Pages exist in one of 4 states: reserved, shareable, committed or free. Committed and shareable pages are both mapped to memory frames; the difference is that shareable pages can be accessed by other processes, while committed pages are private and cannot be accessed by other processes. To commit pages, a program calls one of the VirtualAlloc functions (such as VirtualAlloc or VirtualAllocEx), which can reserve a range of virtual address space first and then commit pages within it later at run time.
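The page number / offset split described above is just bit shifting and masking. Here is a minimal sketch, again assuming 4 KB pages (so the low 12 bits are the offset); the helper names and the sample address are made up for illustration.

```python
# Assumption: 4 KB pages, so the offset occupies the low 12 bits.
PAGE_SIZE = 4096
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 for a 4 KB page
OFFSET_MASK = PAGE_SIZE - 1               # 0xFFF


def split_address(addr):
    """Split an address into (page or frame number, offset)."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


def join_address(number, offset):
    """Rebuild an address from a page or frame number and an offset."""
    return (number << OFFSET_BITS) | offset


page, offset = split_address(0x00403A7C)  # arbitrary example address
print(hex(page), hex(offset))
```

The same two helpers work for physical addresses, since a frame address uses the same layout with a frame number in place of the page number.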
OK, so we know what virtual pages and page frames are. But how are virtual addresses mapped to physical addresses? The answer is by using page tables. When we load a process into virtual memory, we need to make sure each access lands in the correct physical frame rather than at the address the process thinks it is using inside its virtual memory abstraction. We accomplish this with a page table, a data structure located in kernel memory in RAM that maps page numbers to frame numbers. When a virtual address needs translating, the page number is looked up in the table and replaced with the frame number stored in the matching entry. The offset is left unchanged. The final physical address consists of the frame number and the unchanged offset.
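The lookup described above can be sketched as a toy single-level page table. This is a simplification under stated assumptions: 4 KB pages, a plain dictionary standing in for the table, and invented mappings. Real x86 page tables are multi-level structures walked by hardware, and `PageFault` here is just a Python exception I defined for the sketch.

```python
# Assumption: 4 KB pages and a flat, single-level table (a simplification).
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS


class PageFault(Exception):
    """Raised when a page number has no entry in the toy page table."""


# Hypothetical mappings: contiguous pages, non-adjacent frames.
page_table = {0x403: 0x1A2B, 0x404: 0x0007, 0x405: 0x1F00}


def translate(vaddr, table=page_table):
    """Swap the page number for its frame number; keep the offset as is."""
    page = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    if page not in table:
        raise PageFault(f"no mapping for page {page:#x}")
    frame = table[page]
    return (frame << OFFSET_BITS) | offset


print(hex(translate(0x00403A7C)))  # page 0x403 -> frame 0x1A2B, same offset
```

Notice that pages 0x403, 0x404 and 0x405 sit next to each other in virtual memory while their frames are scattered, and that only the upper bits of the address change during translation.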
How is all of the above accomplished in Windows? The answer is through the Windows memory manager, which lives in the Windows executive inside the Windows NT operating system kernel image (ntoskrnl.exe).
The Windows memory manager (WMM) has 3 main tasks:
1. Translating/mapping a process’s virtual address space into physical memory whenever the process reads from or writes to its virtual address space.
2. Paging contents of physical memory out to disk when memory becomes overcommitted, that is, when processes require more memory than is available in RAM.
3. Protecting processes from corrupting the address space of other processes.
In addition, the Windows memory manager must be both efficient and reentrant. For the uninitiated, being reentrant here means that its code can safely run on several threads at the same time, which requires synchronizing access to shared resources with some sort of mechanism such as a spin lock. Some of these resources include kernel memory pools, memory lists, page tables and ASLR structures.
One of the ways address translation stays efficient is through a special CPU cache known as the translation look-aside buffer (TLB). One of the problems with page tables is that they slow down virtual-to-physical address translation. Because page tables are located in RAM, each memory access made by a process actually costs at least 2 physical memory accesses. The first is a page table access to translate the address and the second is the data access itself. The translation look-aside buffer addresses this issue by caching the most recently used address lookups. So instead of going straight to the page table, the CPU checks the TLB first to see if the translation is cached there. If it is, it is considered a TLB hit. If it is not, it is considered a TLB miss and the page table is consulted as usual. TLB hit rates are commonly quoted as being over 99% for typical workloads, which makes the TLB a very effective mechanism for speeding up address translation.
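To see why the hit rate gets so high, here is a toy TLB in front of a toy page table. Everything in it is an assumption for illustration: 4 KB pages, a 4-entry cache, least-recently-used eviction (real TLBs are hardware and their sizes and replacement policies vary), and the `TinyTLB` class itself is made up.

```python
from collections import OrderedDict

# Assumptions: 4 KB pages, a tiny 4-entry TLB, and LRU eviction.
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS


class TinyTLB:
    """Caches recent page -> frame lookups in front of a page table."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        page = vaddr >> OFFSET_BITS
        offset = vaddr & (PAGE_SIZE - 1)
        if page in self.entries:
            # TLB hit: no page table access needed.
            self.hits += 1
            self.entries.move_to_end(page)
            frame = self.entries[page]
        else:
            # TLB miss: fall back to the page table, then cache the result.
            self.misses += 1
            frame = self.page_table[page]
            self.entries[page] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return (frame << OFFSET_BITS) | offset

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical page table: page n maps to frame n + 0x100.
page_table = {page: page + 0x100 for page in range(8)}
tlb = TinyTLB(page_table)

# Walk two pages sequentially, touching one address every 64 bytes.
for vaddr in range(0, 2 * PAGE_SIZE, 64):
    tlb.translate(vaddr)

print(tlb.hits, tlb.misses, f"{tlb.hit_rate():.1%}")
```

Only the first touch of each page misses; every other access to the same page is served from the cache. Programs tend to touch the same few pages over and over, which is why even a small TLB pays off.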
The WMM must also protect processes from each other so that one process cannot read or write another process’s memory without the correct permissions. One case in which a process may have the correct permissions is when a parent process wants to alter the virtual memory of its child process. The WMM relies on several hardware-controlled memory protection options as well as ACLs to enforce access control to shared memory objects.