Protostar Heap3 Walkthrough

Up until now we have seen how we can leverage buffer overflow vulnerabilities to perform stack-based memory corruption exploits (hence the name “SmashTheStack”). But what about the heap? Can it be exploited too to achieve arbitrary code execution? Absolutely.

Admittedly, generating heap overflows is a lot more challenging than generating stack overflows, or at least it was for me when I did this exercise because you have to think about how to leverage doubly-linked lists in order to redirect the flow of execution instead of just trying to overwrite the return address (EIP) of a stack frame. Before proceeding, I highly recommend reading Vudo Malloc Tricks by MaXX (section 3.6) and Exploiting the Wilderness by Phantasmal Phantasmagoria as they present overviews of heap overflows that are extremely pertinent to this exercise.

The heap is used for dynamic allocation and every OS has its own heap allocator. So the way you exploit a heap allocator is dependent on the implementation of the heap allocator on the system or program. Protostar Heap3 specifically uses a general purpose memory allocator called Dlmalloc (Doug Lea’s Malloc) and the goal of the exercise it to essentially break it. For the uninitiated, I will present a brief overview of how Dlmalloc works as understanding it is critical to completing this exercise. If you would like a more comprehensive description of how it works I would recommend you peruse Dr. Lea’s original article.

Dlmalloc perceives the heap as a series of different chunks. The last chunk at the highest address of the heap is a special chunk called the wilderness chunk. The address at the very end of the heap, or the top of the wilderness chunk, is known as the program break. The heap can expand if necessary by calling brk() and sbrk() to increase the value of the program break. Each chunk can either be allocated or free. Allocated chunks look like the following (taken from MaXX’s article):

    chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             | prev_size: size of the previous chunk, in bytes (used   |
             | by dlmalloc only if this previous chunk is free)        |
             +---------------------------------------------------------+
             | size: size of the chunk (the number of bytes between    |
             | "chunk" and "nextchunk") and 2 bits status information  |
      mem -> +---------------------------------------------------------+
             | fd: not used by dlmalloc because "chunk" is allocated   |
             | (user data therefore starts here)                       |
             + - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
             | bk: not used by dlmalloc because "chunk" is allocated   |
             | (there may be user data here)                           |
             + - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
             |                                                         .
             .                                                         .
             . user data (may be 0 bytes long)                         .
             .                                                         .
             .                                                         |
nextchunk -> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
             | prev_size: not used by dlmalloc because "chunk" is      |
             | allocated (may hold user data, to decrease wastage)     |
             +---------------------------------------------------------+

And free chunks look like this:

    chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             | prev_size: may hold user data (indeed, since "chunk" is |
             | free, the previous chunk is necessarily allocated)      |
             +---------------------------------------------------------+
             | size: size of the chunk (the number of bytes between    |
             | "chunk" and "nextchunk") and 2 bits status information  |
      mem -> +---------------------------------------------------------+
             | fd: forward pointer to the next chunk in the circular   |
             | doubly-linked list (not to the next _physical_ chunk)   |
             +---------------------------------------------------------+
             | bk: back pointer to the previous chunk in the circular  |
             | doubly-linked list (not the previous _physical_ chunk)  |
             +---------------------------------------------------------+
             |                                                         .
             .                                                         .
             . unused space (may be 0 bytes long)                      .
             .                                                         .
             .                                                         |
nextchunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             | prev_size: size of "chunk", in bytes (used by dlmalloc  |
             | because this previous chunk is free)                    |
             +---------------------------------------------------------+

As we can see, each chunk regardless of if it is free or not, is composed of an area that can store user data and another area that stores four 4-byte fields of metadata information (prev_size, size, fd, bk). Mem is simply the address that’s returned to the user when a Malloc() function is called to dynamically allocate memory. If a chunk is free then Dlmalloc uses a process called binning to store its free chunks in doubly linked lists and so each free chunk’s fd and bk pointer point to the next and previous free chunk in its doubly linked list. Bins exist in memory as an array of pointers to a linked list. The fd and bk fields are only used if the chunk itself is free. If it is allocated, then user data can be used in place of the fd and bk pointers. Also the previous size field of a chunk is only used if the previous chunk is free. Otherwise the previous chunk can use it to store user data.

An important detail to remember is that the lowest bit of the size field specifies whether or not the previous chunk is allocated or free. If it is set to 1, then the previous chunk is allocated. If it is set to 0, then the previous chunk is free. Similarly, the next lowest bit of the size field specifies whether or not the chunk was allocated via mmap.

This paragraph is very important so pay attention! When a chunk is freed, if it was allocated via mmap, then it will call the munmap_chunk() function. Otherwise, it will call the chunk_free() function. Don’t worry, you only need to know how the latter works for this exercise. Within the chunk_free() function, another function, unlink(), is called if its previous chunk is free in order to take the previous chunk off its doubly linked list and coalesce it with the chunk being freed! Also note that if two free chunks are found to be contiguous to each other, the heap allocator will coalesce these two free chunks into one larger heap chunk to support defragmentation. This implies that you will never find two free contiguous chunks.

Take a look at the following important parts of the Chunk_free() and the Unlink() macro before proceeding.

Important parts of the Chunk_free() function:

  INTERNAL_SIZE_T hd = p->size;
   ...
   if (!hd & PREV_INUSE))     /* consolidate backward */    /* [A] */
   {
     prevsz = p->prev_size;
     p = chunk_at_offset(p, -(long)prevsz);                 /* [B] */
     sz += prevsz;

     if (p->fd == last_remainder(ar_ptr))
       islr = 1;
     else
       unlink(p, bck, fwd);
   }


Unlink() macro:

#define unlink( P, BK, FD ) {            \
    BK = P->bk;                          \
    FD = P->fd;                          \
    FD->bk = BK;                         \
    BK->fd = FD;                         \
}

Remember what the Unlink() macro does because it will be very important to understand when we craft our exploit!

Now let’s begin the actual exercise.

For Protostar Heap3 we are given a binary and from the exploit-exercises website we are given the following source code:

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>

void winner()
{
        printf("that wasn't too bad now, was it? @ %d\n", time(NULL));
}


int main(int argc, char **argv)
{
        char *a, *b, *c;

        a = malloc(32);
        b = malloc(32);
        c = malloc(32);

        strcpy(a, argv[1]);
        strcpy(b, argv[2]);
        strcpy(c, argv[3]);

        free(c);
        free(b);
        free(a);

        printf("dynamite failed?\n");
}

The program dynamically allocates 32 bytes of memory for each buffer, string copies 3 arguments into the buffers and subsequently frees each one before calling a print statement kindly reminding us the “dynamite failed”. Nice. Instead of calling this print statement, what we want to do is redirect our thread of execution to point to the winner function so we can print out that message. Ok so how do we go about doing that?

For this specific exercise, we will be exploiting the free() function in order to build our payload. We will craft a fake chunk inside buffer B so that when buffer B is freed, it mistakenly thinks that its previous chunk is free and calls the unlink() function on it to remove the fake chunk from the doubly linked list. Why does it remove the fake chunk from the doubly linked list? Because remember, when a chunk is freed and its contiguous chunk is also free, the two chunks are coalesced into one larger free chunk.

In order to get our buffer B to think its previous chunk is our fake chunk, we must first trick it into thinking it’s previous chunk is free. Otherwise, remember, it won’t process its prev_size field if the previous chunk is allocated! We trick it into thinking the previous chunk is free by setting the lowest bit of its 4-byte size field to 0. We do this by setting it to a negative number. For this exercise let’s set buffer B’s size to -4 (0xfffffffc) because not only is its least significant bit set to 0, but we also want it to think that the size field of its contiguous chunk is 4 bytes before it in buffer B’s prev_size field instead of 32 bytes before it in buffer A’s size field. Next, we must set the prev_size field to a negative value because if you look at the Chunk_free() function again, dlmalloc calculates p by calling chunk_at_offset(p, -(long)prevsz). Basically it will traverse the inverse value of the prev_size in order to get the location the previous chunk starts at. In our case, let’s set it to -8 so that it believes the previous chunk starts at an offset of 8 bytes in front of where chunk B starts. This will be the starting location of our fake chunk, or P.

Using the unlink() function we can set P to be equal to the beginning of our fake chunk, BK to be the address of our NOP sled, and FD to be the address of the puts() function minus 12 bytes. I’ll explain why we set FD to that address shortly. Please refer to the source code for the unlink() macro above if this isn’t clear! When we craft our exploit we have to remember that the BK pointer exists at a 12 byte offset from the beginning of a chunk and that the FD pointer exists at an 8 byte offset from the beginning of the chunk. So we will need to fill in 8 bytes of filler data to reach our FD pointer. In our case let’s just use the classic hex speak 0xdeadbeef twice.

As we’ve previously seen, when corrupting the stack, generally the goal is to take control of EIP and make it point to some address it isn’t supposed to point to. The Global Offset Table (GOT) is a table of function pointers in the process address space that point to where imported functions are located. At compile time, each entry in the GOT is set to 0. However, at run time each entry points to its function’s address in the libc library. Each process has its own GOT table! We can leverage this table in our exploit by overwriting an entry in the GOT table to make its pointer point to somewhere it’s not supposed to point. In our case, as you will see, we will make it point to an address in our NOP sled so that it will go down each NOP instruction until it hits our shell code. For now, let’s take a look at our program’s GOT table.

GOT

The function we want to overwrite is puts, NOT printf. Why? Don’t we call printf in our program? Yes, but when you don’t pass in any additional arguments that begin with the modulo sign then the compiler calls puts instead of printf because it is faster and more efficient. Also, we must remember to subtract 12 bytes from the address shown in the GOT table because the Unlink() function sets FD->bk = BK and the bk field is located at a 12 byte offset from the beginning of a chunk!

0x0804b128-0xc = 0x0804b11c

So the actual address we will be using is 0x0804b11c.

What about the address of our NOP sled? How do we get that? By simply setting a breakpoint and examining the registers.

Address of Buffer A

Looks like we’ll be directing our pointer to address 0x804c008.

The shell code we will use is generated from a simple assembly program:

push 0x08048864
ret

0x08048864 is the address of the winner() function and I extracted the opcodes the same way I did in the previous exercise – by assembling the program and running objdump on the object file to extract the opcodes. This is the resulting shellcode I come up with:

\x68\x64\x88\x04\x08\xc3

Here’s a quick diagram I drew that presents a general overview of our exploit:

heap3 Diagram
Buffer A = [NOP sled] + [shellcode] + [12 filler bytes] + [modified prev_size header of Buffer B (-8)] + [modified size header of Buffer B (-4)]
Buffer B = [8 filler bytes] + [address of GOT entry – 12 bytes] + [address of NOP sled]
Buffer C = [C]

So in Buffer A we will first place our 14 byte long NOP sled followed by our 6 byte long shellcode, 12 bytes of filler data to reach the end of our first chunk, and an extra 8 bytes of data to overflow our Buffer B’s header. These extra 8 bytes will corrupt the 2nd chunk’s metadata information by changing the prev_size to -8 (0xfffffff8) and the size to -4 (0xfffffffc). This tricks dlmalloc into processing our fake chunk with unlink() allowing us to redirect the program’s flow of execution to our NOP sled and therefore our shellcode, when the print statement is called.

Here’s the complete payload we will use:

`python -c 'print “\x90”*14 + "\x68\x64\x88\x04\x08\xc3" + “A”*12 + "\xf8\xff\xff\xff" + "\xfc\xff\xff\xff”'` `python -c 'print "\xde\xad\xbe\xef"*2+"\x1c\xb1\x04\x08"+"\x08\xc0\x04\x08”` C

And the result when we run it with the program:
My Heap3 Solution

Ta-da! Our first heap exploit!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s