What is Format String Vulnerability? Ways to Exploit, Examples and Impact
Memory corruption vulnerabilities have long been the cornerstone of low-level exploitation. Among these, the format string vulnerability stands out as a classic yet devastating class of bug that occurs when an application passes unvalidated user input as the format argument of functions like printf(). In this guide, we will explore the mechanics of format string attacks, how they allow attackers to leak data or gain control over a system, and how to prevent them.
What are Format Strings?
To understand the vulnerability, we must first understand the tool. In C and C-based languages, format strings are templates used by the printf family of functions to produce formatted output. These functions are variadic, meaning they can accept a variable number of arguments.
Common format string functions include:
- printf(): Prints to the standard output (stdout).
- fprintf(): Prints to a file stream.
- sprintf(): Prints to a string buffer.
- snprintf(): Prints to a string buffer with a size limit (safer).
- syslog(): Used for system logging.
These functions use "format specifiers" to tell the function, at run time, how to interpret each argument being passed. Common specifiers include:
- %d or %i: Signed decimal integer.
- %u: Unsigned decimal integer.
- %x: Hexadecimal representation.
- %s: String (reads from a memory address).
- %p: Pointer (displays a memory address).
- %n: A unique specifier that writes the number of characters printed so far to a variable.
In a standard, secure call, the developer defines the format string:
printf("The score is: %d\n", score);
Here, the program knows exactly what to expect: one integer.
The Root Cause of Format String Vulnerabilities
A format string vulnerability occurs when a developer passes user-controlled input directly as the format string argument itself, rather than as data to be formatted.
Consider this vulnerable code snippet:
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc > 1) {
// VULNERABLE: user input is passed directly as the format string
printf(argv[1]);
}
return 0;
}
In this scenario, if a user provides the input "Hello World", the program prints Hello World. However, if the user provides %x %x %x %x, the printf function looks for four additional arguments on the stack. Since the developer didn't provide any, the function continues to read whatever happens to be on the stack at that moment, leaking raw memory contents to the user.
How the Vulnerability Works Under the Hood
When a function like printf is called on 32-bit x86, the arguments are pushed onto the stack; on x86-64, the first few arguments are passed in registers and the rest on the stack. The function maintains an internal pointer to the current argument it is processing. Every time it encounters a conversion specifier in the format string (anything other than the literal %%), it reads a value from that location and advances the pointer to the next expected argument.
If the format string is "Value: %d, Address: %p", printf expects two values following the string. If the attacker provides a string containing more specifiers than there are actual arguments, printf blindly marches down the stack, interpreting stack frames, return addresses, and local variables as data to be printed. This is the essence of the vulnerability: the function trusts the format string to define the number of arguments, but the attacker controls the format string.
Exploitation Techniques
Exploiting format string bugs generally falls into three categories: leaking stack contents, reading arbitrary memory, and writing to arbitrary memory. Any of these can also crash the application when a specifier dereferences an invalid address.
1. Information Leakage (Reading the Stack)
An attacker can use the %p or %x specifiers to dump data from the stack. This is often the first step in a more complex attack, such as bypassing Address Space Layout Randomization (ASLR).
Example Payload:
./vulnerable_app "%p.%p.%p.%p.%p.%p.%p.%p"
This would output a series of hex addresses. Some of these might be pointers to the heap, others might be return addresses (pointing to the code segment), and some might be sensitive data like local variables or "canaries" used for buffer overflow protection.
2. Reading Arbitrary Memory
The %s specifier is particularly dangerous. It tells printf to treat the value on the stack as a memory address and print the string located at that address. If an attacker can place a specific address on the stack (often by including it in the format string itself) and then use a %s specifier to point to it, they can read any memory location the process has access to.
Example Scenario:
If an attacker knows the address of a global variable containing a password or a secret key, they can craft a payload that places that address on the stack and uses %s to print the secret.
3. Writing to Arbitrary Memory (The %n Specifier)
The most powerful aspect of format string vulnerabilities is the ability to write to memory using %n. Unlike other specifiers that read data, %n takes a pointer to an integer and writes the number of characters printed before it into that address.
int count;
printf("12345%n", &count);
// count now equals 5
By controlling the number of characters printed (using width modifiers like %100c), an attacker can write an arbitrary value to an arbitrary address.
The Attack Path:
- Find a target address to overwrite (e.g., a function pointer in the Global Offset Table, or a return address on the stack).
- Calculate the value needed to redirect execution (e.g., the address of a system("/bin/sh") call or a shellcode buffer).
- Craft a format string that prints exactly that many characters and uses %n to write the value to the target address.
Advanced Exploitation: Positional Arguments
Most Unix-like C libraries support positional arguments (a POSIX extension), which make exploitation much easier. Instead of typing %p fifty times to reach a specific stack offset, an attacker can use %N$p, where N is the 1-based index of the argument. This N is unrelated to the %n specifier described above.
For example, %7$p will print the 7th argument on the stack directly. This allows for precise targeting of memory locations without cluttering the payload with unnecessary characters.
Real-World Impact
The impact of a format string vulnerability ranges from minor to catastrophic:
- Denial of Service (DoS): An attacker can provide an invalid address to a %s specifier, causing the program to access unmapped memory and crash (Segmentation Fault).
- Data Exfiltration: Leaking sensitive environment variables, cryptographic keys, or database credentials stored in memory.
- Privilege Escalation: By overwriting a UID variable or a boolean is_admin flag in memory, an attacker can gain elevated permissions.
- Remote Code Execution (RCE): By overwriting a function pointer (like those in the Global Offset Table) or a return address, an attacker can redirect the program flow to execute malicious shellcode.
Mitigation and Prevention
Preventing format string vulnerabilities is straightforward but requires disciplined coding practices.
1. Always Use a Static Format String
Never pass user input as the first argument to a formatting function.
Wrong: printf(user_input);
Right: printf("%s", user_input);
By explicitly defining the format string as "%s", the compiler and the runtime treat the user input strictly as data, and any % symbols within the input will be printed literally rather than interpreted as specifiers.
2. Compiler Warnings and Security Flags
Modern compilers can detect many format string bugs at compile time. Use the following flags with gcc or clang:
- -Wformat: Checks calls to printf, scanf, etc., to ensure that the arguments provided have types appropriate to the format string and that the conversions in the format string make sense.
- -Wformat-security: Used together with -Wformat, warns about calls to format functions where the format string is not a string literal and there are no format arguments. This specifically catches the printf(argv[1]) pattern.
- -D_FORTIFY_SOURCE=2: Defines a macro that enables various compile-time and run-time checks when optimization is turned on, including checks for dangerous format string usage.
3. Use Safer Alternatives
In many cases, you don't need the full power of printf. If you are just printing a string, use puts() or fputs(). If you are building a string, use snprintf() which requires you to specify a buffer size, preventing related buffer overflow issues.
4. Static and Dynamic Analysis
Regularly scan your codebase with Static Application Security Testing (SAST) tools. These tools are highly effective at flagging instances where variables are used as format strings. Additionally, dynamic analysis and fuzzing can help identify crashes that result from malformed format strings during runtime.
Conclusion
Format string vulnerabilities are a potent reminder that trusting user input can lead to total system compromise. While they are less common in modern high-level languages like Python or JavaScript, they remain a critical concern for C/C++ applications, embedded systems, and legacy software. By adhering to the simple rule of never using user-controlled data as a format string, developers can entirely eliminate this class of vulnerability.