Buffer Overflow Exploit

Introduction

Welcome to the Buffer Overflow Lab. In this assignment, you will explore the memory layout of a running process and learn how careless handling of user input can lead to severe security vulnerabilities. By exploiting a buffer overflow bug in a provided binary, you will learn how to hijack the control flow of a program.

Through this exercise, you will deepen your understanding of the RV32 architecture, stack frames, and the RISC-V calling convention. You will progress through three stages of exploitation:

Overwriting a return address to redirect execution.
Injecting and executing your own custom machine code.
Using Return-Oriented Programming (ROP) to bypass modern execution protections.

Getting the Starter Code

Download and extract bufexploit.tar:

cd cs224
wget --no-check-certificate https://cs224.cs.vassar.edu/labs/bufexploit.tar
tar xvf bufexploit.tar
cd bufexploit

In this directory you’ll find several files:

ctarget – a binary with a buffer overflow bug used for a code injection attack
rtarget – a binary with a buffer overflow bug used for an ROP attack
hex2ascii – a program for converting a sequence of hex bytes to the corresponding raw ASCII characters
run.sh – a shell script to run your exploit
debug.sh – a shell script to run your exploit in gdb
gdbinit – gdb commands you want to run at the start of every debugging session
getobjs.sh – a shell script to assemble a .S file into object code representation
inject.S – assembly code you want to convert to machine code for your code injection attack
level1.txt – text file containing the bytes of your target1 exploit
level2.txt – text file containing the bytes of your target2 exploit
level3.txt – text file containing the bytes of your target3 exploit

Overview of ctarget and rtarget

The ctarget (code injection target) binary has a buffer overflow bug in the get_string function, which acts like the vulnerable gets function discussed in lecture. The get_input function uses get_string to get a string input from the user, and thus is vulnerable to a buffer overflow exploit.

For the first part of this lab, you will craft an exploit string that overwrites the return address saved by get_input, so that get_input returns to target1 instead of main.

Then, you will figure out how to perform a code injection attack to call target2 with the input 224 (by writing code that places 224 in the a0 register). For this part of the lab, you can use the getobjs.sh script to convert your assembly instructions to the correct byte sequence.

Finally, for the last part of the lab, you will craft an exploit string for the rtarget (ROP target) binary that uses an ROP (Return-Oriented Programming) attack to return to target3 with the value 1337.

Both binaries share the same source code, but in the rtarget binary, the stack is non-executable, so code injection attacks will no longer work.

The relevant source code for ctarget and rtarget is shown below.

#include <stdio.h>
#include <stdlib.h>

// Checks whether value matches the expected target number for a given level.
// value should equal target for the check to pass. 
extern void validate(int level, int value, int target);

// Stores input from stdin in buf.
// Length of buf is not checked!!!!
// Warning!  This function should never be used!!!
void get_string(char *buf) {
    int c = getchar();
    while (c != EOF && c != '\n') {
        *buf = c;   // store a character in buffer
        buf++;      // move to next character in buffer
        c = getchar();
    }
    *buf = '\0';    // NULL terminate the string
}

// ctarget: Overwrite the return address in the call to get_input to
// return to this target.
void target1(void) {
    printf("Level1: target1 called.\n");
    validate(1, 0, 0);
    exit(1);
}

// ctarget: Overwrite the return address in the call to get_input to
// return to this target with the input of 224.
void target2(int value) {
    printf("Level2: target2 called.\n");
    validate(2, value, 224);
}

// rtarget: Must be called via a ROP chain that places 0x539
// (1337) into a0 using only the gadgets provided (gadget_1 and
// gadget_2).  The stack is non-executable in rtarget, so a code
// injection attack cannot be used to call this function.
void target3(int value) {
    printf("Level3: target3 called.\n");
    validate(3, value, 0x539);
}

void get_input(void) {
    char buf[20];
    get_string(buf);
}

int main(void) {
    printf("Type a string:\n");
    get_input();
    printf("Function returned normally. Exploit failed!\n");
    exit(-1);
}
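For contrast, here is a minimal sketch of what a bounds-checked reader could look like. It is not part of the lab binaries: get_string_bounded, its cap parameter, and the FILE * argument are assumptions made for illustration. The only change from get_string is that the caller passes the buffer's capacity and the loop refuses to store past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounds-checked variant of get_string (illustration only).
 * Stores at most cap - 1 characters from `in` in buf, always terminates
 * the string, and discards the remainder of an over-long line.
 * Returns the number of characters stored. */
size_t get_string_bounded(char *buf, size_t cap, FILE *in) {
    assert(buf != NULL && cap > 0);
    size_t n = 0;
    int c = fgetc(in);
    while (c != EOF && c != '\n') {
        if (n + 1 < cap) {          /* leave room for the terminator */
            buf[n++] = (char)c;
        }
        c = fgetc(in);              /* keep consuming until end of line */
    }
    buf[n] = '\0';
    return n;
}
```

With a reader like this, get_input would call get_string_bounded(buf, sizeof buf, stdin), and input longer than 19 characters would be truncated rather than written over the rest of the stack frame.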

Level 1: Overwriting the Return Address

As the source shows, get_string has a buffer overflow bug. You will exploit it through the call in get_input, so that get_input returns to target1 instead of returning normally to main. To do this, you will need the address of target1. The objdump command will disassemble a binary.

riscv32-none-elf-objdump -d ctarget > ctarget.s

That line will disassemble the ctarget binary and save the assembly in the file ctarget.s, which you can examine to find the address of target1.

Writing the Exploit

To write a successful exploit, you will need to examine the assembly code for get_input and understand the stack layout of this function. In particular, you will need to discover where the old value of the ra (return address) register is stored on the stack, and where the input buffer begins.

You will use this information to overflow the buffer, overwriting the saved ra value on the stack with the address of target1. For your exploit to succeed, the bytes you want to place on the stack must be supplied as the corresponding raw characters in your input. You could look up the character for each byte manually, but unfortunately many of those bytes are unprintable characters.
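The shape of such an input can be sketched in C as filler bytes followed by the new return address in little-endian order. This is only an illustration under assumptions: put_le32 and build_payload are hypothetical helpers, and pad_len and new_ra are values you have to determine yourself from the disassembly of get_input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 32-bit value least-significant byte first, the byte order
 * RV32 uses in memory. */
void put_le32(unsigned char *dst, uint32_t value) {
    dst[0] = (unsigned char)(value & 0xFF);
    dst[1] = (unsigned char)((value >> 8) & 0xFF);
    dst[2] = (unsigned char)((value >> 16) & 0xFF);
    dst[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Fills `out` with pad_len filler bytes followed by the 4 address bytes.
 * pad_len (the distance from the start of buf to the saved ra) and new_ra
 * are inputs; nothing here computes them for you.
 * Returns the payload length, or 0 if it does not fit in cap bytes. */
size_t build_payload(unsigned char *out, size_t cap,
                     size_t pad_len, uint32_t new_ra) {
    assert(out != NULL);
    if (pad_len > cap || cap - pad_len < 4) {
        return 0;
    }
    memset(out, 0x41, pad_len);     /* 0x41 ('A'): arbitrary non-newline filler */
    put_le32(out + pad_len, new_ra);
    return pad_len + 4;
}
```

In this lab you write the same layout by hand as hex bytes in a text file, so the sketch is mainly a way to check your mental model of the byte order.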

To help with your exploit, we have provided a tool, hex2ascii, which takes a sequence of hex bytes and converts them into a raw string suitable for input as an exploit. Let’s use it to make an exploit string that puts the following bytes onto the stack: 48 65 6C 6C 6F 20 57 6F 72 6C 64 21.

Create a file called exploit.txt that looks like this:

48 65 6C 6C // this is a comment
6F 20 57 6F   
72 6C 64 21 /* so is this */

You can turn these bytes into a raw input string by running the following:

./hex2ascii exploit.txt > exploit.raw

In this example, all the bytes are printable, so you can look at the exploit.raw file to see the output string, but in general, this will not be the case.

cat exploit.raw

What is the value of the input string?
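For intuition about what a converter such as hex2ascii does, here is a minimal C sketch. It is not the provided tool's source: hex_to_bytes and hex_digit are hypothetical names, and unlike the example file above this sketch does not handle comments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Converts whitespace-separated two-digit hex pairs in `text` to raw bytes.
 * Returns the number of bytes written, or -1 on malformed input or if
 * `out` (capacity cap) is too small. */
long hex_to_bytes(const char *text, unsigned char *out, size_t cap) {
    assert(text != NULL && out != NULL);
    size_t n = 0;
    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        int hi = hex_digit((unsigned char)text[0]);
        /* Only look at the second character if the first was valid, so a
         * trailing single digit is rejected without reading past the end. */
        int lo = (hi >= 0) ? hex_digit((unsigned char)text[1]) : -1;
        if (hi < 0 || lo < 0 || n >= cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```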

There are also two scripts to help you run and debug your code. The first script, run.sh, will perform the hex2ascii conversion on an input hex file and run the binary with the correct input.

./run.sh ctarget level1.txt

The second script, debug.sh, will run your exploit in the debugger. Note that this script takes care of running QEMU in the background and GDB in the foreground, all in one window.

./debug.sh ctarget level1.txt

This debug script also reads in a file called gdbinit. This file contains gdb commands that are run automatically when gdb is started. You can edit this file to automatically set breakpoints.

Level 2: Code Injection Exploits

Now that you have successfully redirected control flow, it is time to execute your own instructions. In a code injection attack, the attacker provides a malicious input string that actually contains executable machine code (often referred to as “shellcode”).

Because get_string does not check the bounds of buf, you can write your custom RV32 instructions directly into memory on the stack. You will then overflow the buffer just like in Level 1, but this time, you will overwrite the saved return address (ra) to point to the memory address of the beginning of your injected code on the stack. When get_input finishes and executes its ret instruction, the CPU will jump to your code and begin executing it.

For this level, your goal is to inject code that passes an argument to target2. Specifically, you must write assembly instructions that place the integer value 224 into the a0 register (following the RISC-V calling convention) and then jump to the address of target2.

For your code injection attack, you will need the raw bytes of the object code for the assembly instructions you want to inject. The easiest way to get them is with the getobjs.sh shell script. It takes a .S file, assembles it into a .o file, and extracts the raw bytes for you. Put your assembly instructions into inject.S and use the script to get the machine code bytes.
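To see where those bytes come from, the following C sketch hand-encodes one instruction, addi, in the base RV32I I-type format (a load-immediate of a small constant is typically assembled as an addi from the zero register). The helper encode_addi is hypothetical, and the sketch assumes the uncompressed 32-bit encoding; getobjs.sh remains the authoritative source for your actual bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes `addi rd, rs1, imm` in the RV32I I-type layout:
 *   imm[11:0] | rs1 | funct3 (000) | rd | opcode (0010011)
 * Register numbers are 0..31 (a0 is x10, zero is x0).
 * imm must fit in 12 signed bits. */
uint32_t encode_addi(unsigned rd, unsigned rs1, int32_t imm) {
    assert(rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);
    uint32_t imm12 = (uint32_t)imm & 0xFFFu;   /* keep the low 12 bits */
    return (imm12 << 20) | ((uint32_t)rs1 << 15) | ((uint32_t)rd << 7) | 0x13u;
}
```

The returned word is what objdump shows in its encoding column. In memory, and therefore in your exploit file, its four bytes appear least-significant byte first.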

When you have an exploit that successfully returns to target2 with the value of 224 in register a0, the output from your program will indicate success.

Level 3: Return-Oriented Programming (ROP)

The third exploit will use the program rtarget. It is mostly the same source code as ctarget, but the stack has been made non-executable, meaning code injection attacks will no longer work. For this level, you will have to chain the two provided gadgets, gadget_1 and gadget_2, together and call target3 with an input of 1337 (0x539).

./run.sh rtarget level3.txt

Background

In Levels 1 and 2, you overwrote a saved return address to redirect execution to an existing function or your own injected code. Level 2 also required that the target function receive the right argument, which you handled by writing instructions to place the value into a0.

Modern systems deploy W⊕X (Write XOR Execute, also called NX / “no-execute”) protection, which prevents the CPU from executing anything you wrote into a data region. This defeats the classic “inject shellcode into the buffer” technique you used in Level 2.

Return-Oriented Programming (ROP) is the attacker's answer to this protection. Instead of injecting new code, you chain together short snippets of code that already exist in the binary, called gadgets, each of which ends with a ret instruction. By carefully arranging addresses and data on the stack, you can steer these gadgets to perform arbitrary computation.

The Gadget Farm

Two gadgets live in the rtarget binary: gadget_1 and gadget_2.

000102f4 <gadget_1>:
    lw    a1,0(sp)
    lw    ra,4(sp)
    addi  sp,sp,16
    ret

00010304 <gadget_2>:
    mv    a0,a1
    lw    ra,0(sp)
    addi  sp,sp,16
    ret

target3 takes its argument in a0 per the RISC-V calling convention. You cannot place 0x539 into a0 directly; you must use a combination of these gadgets and carefully placed data on the stack to achieve this before jumping to target3.
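One way to check a candidate stack layout before trying it on rtarget is to model the two gadgets in C and step through them by hand. The sketch below mirrors the gadget listings above instruction by instruction. The Regs struct, load_word, and the flat array of words standing in for the stack are assumptions made for illustration. Here sp is a byte offset into that array, where offset 0 is wherever sp points when the first gadget starts running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the registers the two gadgets touch. sp is a byte offset into the
 * modeled stack, not a real address. */
typedef struct {
    uint32_t a0, a1, ra;
    size_t sp;
} Regs;

/* Loads the 32-bit word at byte offset off from the modeled stack. */
static uint32_t load_word(const uint32_t *stack, size_t nwords, size_t off) {
    assert(off % 4 == 0 && off / 4 < nwords);
    return stack[off / 4];
}

/* gadget_1: lw a1,0(sp); lw ra,4(sp); addi sp,sp,16; ret */
void gadget_1(Regs *r, const uint32_t *stack, size_t nwords) {
    r->a1 = load_word(stack, nwords, r->sp + 0);
    r->ra = load_word(stack, nwords, r->sp + 4);
    r->sp += 16;
    /* ret: execution would continue at r->ra */
}

/* gadget_2: mv a0,a1; lw ra,0(sp); addi sp,sp,16; ret */
void gadget_2(Regs *r, const uint32_t *stack, size_t nwords) {
    r->a0 = r->a1;
    r->ra = load_word(stack, nwords, r->sp + 0);
    r->sp += 16;
    /* ret: execution would continue at r->ra */
}
```

Fill the array with the words you plan to place above the overwritten return address, call the gadgets in the order your chain would run them, and inspect a0 and ra after each step.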

Hints

Draw out the stack on a sheet of paper. Figure out where the input buffer is being placed on the stack and where the return address is being stored.
In the exploit file you create (e.g., level1.txt), memory addresses increase as you go down the file, whereas in a stack diagram addresses increase as you move up the stack.
Do not put the byte 0A in your input! This is the ASCII code for \n, the newline character. This byte will terminate your input when it is being read by get_string. The program hex2ascii will warn you if you have a newline in your input. Fortunately, you do not need this byte in any of your exploits.
RV32 instructions are stored in memory as 4-byte integers in little-endian format.
A helpful gdb command to examine the stack is x/12xw $sp. If you are in the get_input function, this will show the entire 48-byte stack frame.
For your code injection attack, it is tricky to generate the proper bytes for a call (jal) instruction because it requires a label, and its target is encoded relative to the address of the instruction itself. You have a few options. You can place the address of the target into the ra register and then ret. You can also use the jalr (jump and link register) instruction. Finally, since you don’t plan on returning to this function, you can simply use the jr (jump register) instruction.
For the ROP exploit, track where the stack pointer (sp) goes after each gadget executes addi sp, sp, 16.
By the time the gadgets execute, get_input's stack frame has already been popped (sp has the value it had before main called get_input), so your exploit will have to overwrite the stack frame for main. This is okay, because we never intend to return to main.
Go slow. Take each step one at a time and verify each step works (e.g., successfully jumping to gadget_1).
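The little-endian hint and the x/12xw hint interact: gdb groups memory into 4-byte words, so the bytes you wrote in file order appear reversed within each displayed word. A small C sketch of that regrouping, using a hypothetical helper read_le32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassembles the 32-bit word that starts at p, treating p[0] as the
 * least-significant byte (little-endian). */
uint32_t read_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the four bytes 78 56 34 12 written in that order in an exploit file are read back as the single word 0x12345678.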

Submitting Your Code

Upload your level1.txt, level2.txt, and level3.txt to Gradescope. Only upload those three text files.

Good luck!