What do the WannaCry ransomware (2017), the Conficker worm (2008), and the Blaster worm (2003) have in common? They all exploited buffer overflow vulnerabilities in Microsoft network services and protocols! Those vulnerabilities are described in the following Microsoft security bulletins:
- MS17-010 – Security Update for Microsoft Windows SMB Server.
- MS08-067 – Vulnerability in Server Service Could Allow Remote Code Execution.
- MS03-026 – Buffer Overrun in RPC Interface Could Allow Code Execution.
Buffer overflow is considered one of the most severe vulnerabilities that can exist in an OS, process, network service, or any other executable program, especially one written in a language without automatic bounds checking such as C. The reason is that this vulnerability can lead to arbitrary code execution, which is the ability of the malicious hacker to run commands of their choice on the vulnerable system, typically by gaining shell access.
The anatomy of buffer overflow attacks can be quickly sketched as follows:
- The program – a network service – reads an input from the user, such as a username, password, etc.
- A malicious hacker supplies an oversized input, one that is larger than the variable size specified by the developer.
- The oversized input overflows the memory segment allocated for the input variable; the extra portion spills over into adjacent memory cells, overwriting critical information.
- The program crashes at this point.
In order to gain a remote shell, the hacker needs to craft a special input. This specially crafted input will overwrite adjacent memory cells with data that will later be executed by the CPU, tricking it into executing commands inserted by the malicious hacker.
The Instruction Pointer
The Instruction Pointer (IP) is a CPU register in the Intel Architecture (IA), and its main role is to point to the next instruction – in memory – to be executed. It was introduced in the 16-bit CPU series where each register was 16-bit in length. When the 32-bit architecture was invented, the IP became 32-bit in length, and was renamed Extended Instruction Pointer (EIP); later on, when the 64-bit architecture was introduced, it became 64-bit in length and was renamed to RIP.
Whether it is called IP, EIP, or RIP, the role of this register is the same: it contains the memory address of the next instruction to be executed. Thus, when the current instruction finishes execution, the CPU checks the IP register, goes to the memory address specified in that register, fetches the instruction there, advances the IP so that it points to the following instruction (on Intel CPUs the step equals the length of the fetched instruction, which varies), and executes the recently fetched instruction. And the process repeats.
Typically, the IP simply advances to the next instruction in memory; this means that the instructions are stored in consecutive memory cells with sequential addresses. However, there are times when the execution path needs to jump to another memory segment with a set of instructions to be executed, such as in the case of a function call. And at the end of executing that memory segment, the execution path needs to return to the address following the jump instruction. The following diagram illustrates this:
The execution path in the diagram above is like this:
1 => 2 => 3 => 11 => 12 => 13 => 4 => 5
Initially, memory cells 1, 2, and 3 are executed sequentially. However, since the instruction at cell 3 is a jump to memory cell 11, the execution will jump to cell 11 and execute the next instructions. At memory cell 13, there is a return instruction, and as such, the execution path will return to the location directly after the JUMP instruction and continue from there.
The question now is really this: when the moment of RETURN comes, how does the CPU know which address to go to? And the answer is: it is stored in the Stack!
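To make the jump-and-return mechanics concrete, here is a minimal sketch of a toy machine that walks the same cells as in the execution path above. Everything in it (the cell types, the function name run_toy_program, the halt at cell 5) is a hypothetical simplification for illustration, not how a real CPU is built; the point is only that the address to return to is pushed on a stack at the jump and popped at the return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for the toy machine. */
enum op { OP_NORMAL, OP_CALL, OP_RETURN, OP_HALT };

struct cell {
    enum op op;
    int target;              /* used by OP_CALL only: the cell to jump to */
};

#define MEM_CELLS   16
#define STACK_DEPTH 8

/*
 * Runs the toy program from cell 1 and records each executed cell in
 * trace[]. Returns the number of cells executed.
 */
size_t run_toy_program(int *trace, size_t trace_cap)
{
    struct cell mem[MEM_CELLS] = {0};   /* every cell starts as OP_NORMAL */
    int stack[STACK_DEPTH];             /* saved return addresses (LIFO) */
    size_t sp = 0, n = 0;
    int ip = 1;                         /* the toy instruction pointer */

    assert(trace != NULL);

    mem[3].op = OP_CALL;                /* cell 3 jumps to cell 11 */
    mem[3].target = 11;
    mem[13].op = OP_RETURN;             /* cell 13 returns to the caller */
    mem[5].op = OP_HALT;                /* assumption: stop after cell 5 */

    while (n < trace_cap && ip > 0 && ip < MEM_CELLS) {
        struct cell c = mem[ip];
        trace[n++] = ip;

        if (c.op == OP_HALT) {
            break;
        } else if (c.op == OP_CALL) {
            if (sp == STACK_DEPTH)
                break;
            stack[sp++] = ip + 1;       /* push the address after the jump */
            ip = c.target;
        } else if (c.op == OP_RETURN) {
            if (sp == 0)
                break;
            ip = stack[--sp];           /* pop the saved return address */
        } else {
            ip = ip + 1;                /* sequential execution */
        }
    }
    return n;
}
```

Calling run_toy_program() with a 16-element trace array fills it with the cells in the order they were executed. If the value popped at cell 13 were changed while it sat on the stack, execution would continue somewhere other than cell 4, and that is exactly what a buffer overflow aims for.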
The Buffer and the Stack
A buffer is simply a sequence of consecutive memory cells reserved for a certain variable. And it is generally part of a greater area called the Stack. But to grasp that, we need to get the full picture; every process or thread – in fact, every function of an executable program – is given a portion of memory allocated by the OS. This portion is divided into two major sections:
- The Code Section: it contains the actual instructions of the function or process.
- The Data Section: it contains all types of variables – static, dynamic, initialized, or uninitialized – along with other needed information.
The following diagram sketches the memory layout of a single function:
Part of the data section is the Stack, which is a LIFO (Last-In-First-Out) container. In particular, the stack contains the return address, which is the address of the instruction to be executed directly after the current function or process ends. This is the address directly after the instruction that made the jump or call to our current function. Aside from the return address, the stack contains the function's local variables that have a fixed size, i.e., a size known at compile time.
If one of those variables is read from the user, the program needs to make sure that the user-defined input is not greater in size than the allocated buffer.
What will happen if the input is greater and the program does nothing about it? It will overflow its buffer, overwrite adjacent buffers, and possibly overwrite the Return Address!
Let’s have a look at the image above again, and let’s assume that variable 2 was a username which the user enters at run-time. Let’s also assume that the programmer has allocated 50 characters (bytes) to be the size of variable 2. Thus, the OS reserves 50 bytes for variable 2 buffer. A C code snippet for this operation can look like this:
... char username[50]; gets( username ); ...
This C code calls gets() to read the user input and place it in the username buffer. The code does not check the size of the input. It places it directly into the buffer as it is. This lack of input validation is a major factor in allowing a malicious hacker to exploit the code. To understand this, let’s look at the various scenarios that may happen:
- The user enters an input that is shorter than 50 characters. In this case, the whole input fits into the buffer, together with the terminating null byte that gets() appends. If we zoom into the stack now, this is roughly how it will look if the user enters the input ABCDEFGHIJKLMNOPQRSTUVWXYZ:
- The user enters an input that is 50 characters or longer. In this case, the extra bytes will overwrite the adjacent cells. Revisiting our previous example, the input will corrupt the buffer of variable 1; and if the input is large enough, it will even overwrite the Return Address. The following two diagrams show how the memory will look if the user enters 52 and 78 characters, respectively:
What happens when the Return Address gets corrupted? Well, when the current function finishes execution, the CPU will fetch whatever value is available in the Return Address cells, place that value in the EIP register, and then go and fetch the instruction that is located at that address. However, since the address is now corrupted (it is just a piece of the user's input, not a valid code address), it is most likely going to point to a part of memory whose contents are meaningless to the CPU or not accessible at all. And this will cause the program to crash!
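The three cases above (a short input, 52 characters, and 78 characters) can be modelled safely without actually smashing a real stack. The following is a minimal sketch that treats a stack frame as an ordinary byte array. The layout is an assumption made for illustration: a 50-byte username buffer, 24 bytes for variable 1, and a 4-byte return address; the real sizes depend on the program and the compiler. The name simulate_unchecked_copy is hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed frame layout; only the 50-byte buffer comes from the text. */
enum {
    VAR2_SIZE  = 50,   /* the username buffer */
    VAR1_SIZE  = 24,   /* neighbouring local variable (assumed size) */
    RET_SIZE   = 4,    /* saved return address on a 32-bit system */
    FRAME_SIZE = VAR2_SIZE + VAR1_SIZE + RET_SIZE
};

struct overflow_report {
    int var1_corrupted;
    int return_address_corrupted;
};

/*
 * Models what an unchecked gets() would do with an input of `len`
 * characters: write the characters plus a null byte starting at the
 * beginning of the username buffer. The write is clamped to the
 * simulated frame, so no real memory is ever overflowed.
 */
struct overflow_report simulate_unchecked_copy(size_t len)
{
    unsigned char frame[FRAME_SIZE];
    unsigned char pristine[FRAME_SIZE];
    struct overflow_report report;
    size_t i, total;

    /* Fill the frame with a known pattern so that changes show up. */
    memset(frame, 0xAA, sizeof frame);
    memcpy(pristine, frame, sizeof frame);

    /* Characters plus the terminator, limited to the simulated frame. */
    total = (len < (size_t)FRAME_SIZE) ? len + 1 : (size_t)FRAME_SIZE;
    for (i = 0; i < total; i++)
        frame[i] = (i < len) ? 'A' : '\0';

    report.var1_corrupted =
        memcmp(frame + VAR2_SIZE, pristine + VAR2_SIZE, VAR1_SIZE) != 0;
    report.return_address_corrupted =
        memcmp(frame + VAR2_SIZE + VAR1_SIZE,
               pristine + VAR2_SIZE + VAR1_SIZE, RET_SIZE) != 0;
    return report;
}
```

Under these assumptions, a 26-character input leaves both neighbours intact, a 52-character input corrupts variable 1 only, and a 78-character input reaches the return address as well.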
Crashing the program – or the service – amounts to a denial of service (DoS) attack. However, this is not enough! We want to take advantage of this weakness and gain something more than just a DoS, and that is the art of exploitation.
But before we explain the process of exploitation, we need to know that the actual vulnerability exists in the first place due to a lack of input validation. That is, had the developer made an explicit check on the input size and refused to store more than the buffer can hold, the vulnerability would not have existed. A secure version of the code would be something like this:
... enum { MAXLEN = 50 }; char username[MAXLEN]; fgets( username, sizeof username, stdin ); ...
The above example uses the function fgets(), which is safer than the function gets(). The latter does not restrict how many characters to read from the user (it was considered dangerous enough to be removed from the C11 standard), while the former takes the size of the destination buffer. fgets() stores at most one character fewer than that size and then appends the terminating null byte; so with a 50-byte buffer, if the user enters more than 49 characters, only the first 49 are stored and the rest remain in the input stream. In C, there are some functions considered insecure in the sense that they don’t inherently provide input validation; thus, if the developer doesn’t explicitly validate the input, the program becomes vulnerable. It is always recommended to replace those insecure functions with alternative secure ones – those that have inherent input size validation. The following tables show some of the insecure functions and their alternative secure versions:
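In practice, a bounded read usually also needs to deal with the newline that fgets() keeps and with whatever is left of an over-long line. The following is a minimal sketch of one way to do that; the function name read_line_bounded and its return codes are our own choices for illustration, not a standard API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Reads one line from `in` into buf, whose capacity is `size` bytes
 * including the null terminator. Returns 0 if the whole line fit,
 * 1 if it was truncated, and -1 on end-of-file or read error.
 */
int read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t len;
    int ch;

    assert(in != NULL && buf != NULL && size > 1);

    /* fgets() never stores more than size - 1 characters. */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line: drop the newline */
        return 0;
    }

    /* No newline was stored: check what is left of this line. */
    ch = fgetc(in);
    if (ch == EOF || ch == '\n')
        return 0;                   /* only the line ending was left */

    while (ch != EOF && ch != '\n')
        ch = fgetc(in);             /* discard the rest of the long line */
    return 1;
}
```

With a 50-byte username buffer, a 60-character line yields the first 49 characters and a return value of 1, so the caller can reject the input instead of silently accepting a truncated username. Discarding the remainder also keeps the leftover characters from being mistaken for the next line.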
So far, we have seen that sending a large enough input can cause a denial of service. But how can we get a remote shell – command execution – on the target system? We get it by crafting a special input that achieves a certain purpose; that is, we need to trick the CPU into executing part of our input. The variable that holds our input is the only thing we have control over. We cannot control any portion or section of memory except the cells allocated for that variable; and since there is no boundary check, our input can extend beyond the allocated cells to adjacent memory cells.
For proper exploitation, our input needs to accomplish two goals:
- It must overwrite the Return Address (the saved EIP of the caller) with a new address of our choice.
- It must contain some executable instructions that should eventually be executed by the CPU.
The classical way to accomplish the above goals is to make the new Return Address point at the beginning of the very buffer that holds our input; and of course, since our input now contains machine instructions, they will be executed. The following image shows how the stack will look when we overflow the buffer with our crafted input – assuming that our buffer is at address 0x11223344:
What exactly is going to happen when our input is injected into the stack? After the stack is messed up by our crafted input, the function will continue executing as per the instructions in the Code Section. Whenever there is a reference to one of the variables that have just been overwritten, irrelevant contents will be retrieved. But the real action takes place at the end of the function when the return instruction is executed!
When the time comes for the function to return, the CPU automatically retrieves the Return Address from the stack; but now, instead of pointing to the calling function, the Return Address will tell the CPU to fetch the instruction at the start of the buffer. Our injected instructions will be fetched one by one and executed, thus performing the actions we wanted.
The most common payload that hackers have been using since the discovery of buffer overflow vulnerabilities is a short sequence of instructions that executes a shell process. In a Windows environment, that would be executing C:\Windows\System32\cmd.exe, while in UNIX/Linux systems, that would be executing a shell such as /bin/bash.
So, the crafted input will look – from a higher perspective – as follows:
exec("/bin/bash") : Padding (filler characters or NOPs) : Address of the Buffer
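The layout above can be sketched as a byte buffer to show where each part lands. This is only an illustration of the offsets: the distance of 74 bytes from the start of the buffer to the Return Address is an assumption (the 50-byte buffer plus 24 bytes of other locals), the address is assumed to be 4 bytes stored in little-endian order as on 32-bit Intel CPUs, and the caller passes placeholder bytes where real instructions would go. The name build_payload_layout is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    OFFSET_TO_RET = 74,   /* assumed distance from buffer start to the return address */
    ADDR_SIZE     = 4,    /* 32-bit address */
    PAYLOAD_SIZE  = OFFSET_TO_RET + ADDR_SIZE
};

/*
 * Fills out[0 .. PAYLOAD_SIZE) with three parts, in order:
 *   1. the code bytes supplied by the caller,
 *   2. padding up to the return address (0x90 is the x86 NOP opcode),
 *   3. the buffer address, least significant byte first.
 * Returns 0 on success, or -1 if the code does not fit in front of
 * the return address.
 */
int build_payload_layout(unsigned char out[PAYLOAD_SIZE],
                         const unsigned char *code, size_t code_len,
                         uint32_t buffer_addr)
{
    size_t i;

    assert(out != NULL && code != NULL);

    if (code_len > (size_t)OFFSET_TO_RET)
        return -1;

    memcpy(out, code, code_len);
    memset(out + code_len, 0x90, (size_t)OFFSET_TO_RET - code_len);

    for (i = 0; i < ADDR_SIZE; i++)
        out[OFFSET_TO_RET + i] = (unsigned char)(buffer_addr >> (8 * i));

    return 0;
}
```

For the buffer address 0x11223344 used in our example, the last four bytes of the result are 0x44, 0x33, 0x22, 0x11. The byte order matters: if the address is written the other way around, the function returns to a different, most likely invalid, location.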
In order to prevent buffer overflow attacks, OS and CPU vendors invented different software and hardware mechanisms to make exploitation impractical. We have to emphasize here that the first and most important line of defense does not lie in those techniques, but rather in a proper validation of user inputs. In other words, it is the duty of the developer to develop secure programs from the beginning. The techniques we are talking about here act as a safety net in case something was missed in the code itself. We should also note that those techniques make exploitation harder but not impossible. Each mitigation technique invented has at some point been circumvented, although bypassing the defenses now requires harder work on the part of the hacker. The following are two popular mitigation techniques:
1. Data Execution Prevention (DEP)
This technique involves marking data regions of memory, such as the stack, as Non-Executable (NX). Thus, the stack can now hold data only. Any attempt to execute code residing in the stack will fail. And given that our crafted input – which includes instructions – will ultimately reside in the stack, our instructions will not execute. When the function returns and our injected Return Address is loaded into the EIP register, the CPU is supposed to fetch and execute the instruction pointed to by that EIP (our value); however, since the EIP now points to the non-executable stack, an exception will be raised.
The DEP measure can be circumvented using what is called Return-to-libc. Return-to-libc relies on the fact that when a program executes, it is almost certainly linked, statically or dynamically, against external libraries. For example, the function gets() is part of the C standard library (libc). This means that the library's code, including functions that the program never calls itself, is typically loaded in memory and can be reached at run-time.
Instead of overwriting the Return Address with the address of the buffer (inside the stack), we can overwrite it with the address of a library function we want to execute. For example, we can call the function system(), and give it the parameter “/bin/bash”.
2. Address Space Layout Randomization (ASLR)
The ASLR measure depends on randomizing where the main memory regions of a process, such as the stack, the heap, and the loaded libraries, are placed every time the program runs. In short, this means that each time the vulnerable program runs, the buffer gets a new address. This makes it difficult to hardcode the address that will overwrite the Return Address.
One factor that used to make buffer overflow exploitation easy was the fact that each time the program ran in the same environment, the memory cells containing the Code and Data always got the same addresses. A hacker who reverse engineers the program can know exactly what the address of the buffer is, and then use it to overwrite the Return Address. However, with ASLR, this becomes far harder since the address of the buffer cannot be known beforehand.
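A simple way to observe ASLR is to record where a few things live in memory and compare the values between two runs of the same program. The following is a minimal sketch; the structure and function names are hypothetical, and whether the code address changes as well depends on how the program was built (for example, as a position-independent executable) and on the OS settings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct address_sample {
    uintptr_t stack_addr;   /* address of a local buffer */
    uintptr_t heap_addr;    /* address returned by malloc(), 0 on failure */
    uintptr_t code_addr;    /* address of this function */
};

/* Records where a stack buffer, a heap block, and a function are located. */
void sample_addresses(struct address_sample *out)
{
    char local_buffer[50];
    void *heap_block = malloc(16);

    assert(out != NULL);

    local_buffer[0] = '\0';
    out->stack_addr = (uintptr_t)local_buffer;
    out->heap_addr  = (uintptr_t)heap_block;
    out->code_addr  = (uintptr_t)&sample_addresses;

    free(heap_block);
}
```

Printing the three values from main() and running the program several times should show different stack addresses on a system with ASLR enabled, and the same addresses on a system where it is turned off. The latter is the situation that allows a hardcoded buffer address to work.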