What is Length Extension Attack? Ways to Exploit, Examples and Impact

In cryptography, hash functions are often treated as a simple way to guarantee data integrity. However, many developers unknowingly use hashing in a way that leaves their applications open to a well-understood but often overlooked technique called a Length Extension Attack. If you have ever used a simple secret-prefix construction like Hash(secret + message) to authenticate data, you might be at risk. This article explains the mechanics of length extension attacks, shows why certain algorithms are vulnerable, and provides practical examples of how they are exploited in the real world.

Understanding the Foundation: How Hash Functions Work

To understand why a length extension attack is possible, we must first look at how popular hash functions like MD5, SHA-1, and the SHA-2 family (SHA-256, SHA-512) are built. All of these algorithms use the Merkle-Damgård construction. This design processes input data in fixed-size blocks (512 bits for MD5, SHA-1, and SHA-256; 1024 bits for SHA-512).

The Merkle-Damgård Construction

The Merkle-Damgård construction follows a specific workflow to turn a variable-length message into a fixed-length hash:

  1. Padding: Since messages rarely fit perfectly into fixed-size blocks, the algorithm adds padding. This usually involves appending a '1' bit, followed by a series of '0' bits, and finally the length of the original message.
  2. Initialization Vector (IV): The process starts with a predefined set of constants known as the Initialization Vector.
  3. Compression Function: Each block of the message is processed sequentially. The output of the compression function for each block becomes the input (or the "state") for the next block.
  4. Final Output: Once the last block is processed, the resulting state is the final hash value.

Crucially, the hash value you see is simply the internal state of the algorithm after the final block has been processed. This characteristic is the "Achilles' heel" that makes length extension attacks possible.
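The workflow above can be sketched with a toy Merkle-Damgård hash. This is an illustration only: the 16-byte block size, the 8-byte state, the all-zero IV, and the compress function (built here from truncated SHA-256) are assumptions chosen for brevity, not part of any real algorithm. The point is that the returned digest is the final state, so hashing can be resumed from it.

```python
import hashlib
import struct

BLOCK = 16          # toy block size in bytes (MD5 and SHA-1 use 64)
IV = b"\x00" * 8    # toy initialization vector (assumption for this sketch)

def pad(total_len: int) -> bytes:
    """0x80, zero bytes, then the length in bits as a 4-byte big-endian integer."""
    zeros = (BLOCK - (total_len + 1 + 4) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">I", total_len * 8)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: mixes the 8-byte state with one block."""
    return hashlib.sha256(state + block).digest()[:8]

def toy_hash(message: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Hash `message`, optionally resuming from `state` after `prior_len` bytes."""
    data = message + pad(prior_len + len(message))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # the digest is simply the final internal state

original = b"hello world"
digest = toy_hash(original)
glue = pad(len(original))   # the padding that was hashed after `original`
extra = b"!!"

# Resume from the published digest instead of the IV.
resumed = toy_hash(extra, state=digest, prior_len=len(original) + len(glue))
assert resumed == toy_hash(original + glue + extra)
```

Nothing in the resumed call needs the original input, only its digest and its length.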

The Core Vulnerability: Secret Prefix Construction

A common mistake developers make when trying to create a Message Authentication Code (MAC) is using a "Secret Prefix" construction. The logic seems sound: if we want to ensure a message hasn't been tampered with, we can concatenate a secret key with the data and hash the result.

Token = Hash(SecretKey + Message)

The theory is that since an attacker doesn't know the SecretKey, they cannot generate a valid Token for a modified Message. However, because of the Merkle-Damgård construction, an attacker who knows Hash(SecretKey + Message), the Message itself, and the length of the SecretKey can calculate Hash(SecretKey + Message + Padding + ExtraData) without ever knowing the SecretKey.
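As a minimal sketch, the vulnerable pattern might look like this in Python. The key value and the sign and verify helper names are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"SECRET"  # hypothetical server-side key

def sign(message: bytes) -> str:
    """Vulnerable secret-prefix MAC: Hash(SecretKey + Message)."""
    return hashlib.sha1(SECRET_KEY + message).hexdigest()

def verify(message: bytes, token: str) -> bool:
    # A constant-time comparison is good practice, but it does not
    # repair the construction itself.
    return hmac.compare_digest(sign(message), token)

token = sign(b"user=guest")
print(verify(b"user=guest", token))   # True
print(verify(b"user=admin", token))   # False
```

Simple tampering is rejected, which is why the scheme looks safe at first glance. The weakness only appears when data is appended after the original padding.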

Why Hash(secret + message) is Insecure

When the application calculates Hash(secret + message), the algorithm finishes by processing the last block (which includes the padding) and outputs the internal state. An attacker can take that output, treat it as the starting state for a new hashing session, and continue the hashing process with new data. Effectively, the attacker "extends" the length of the message.

How a Length Extension Attack Works (Step-by-Step)

Let’s break down the mechanics of the attack. Suppose an application uses SHA-1 and the secret-prefix method to sign API requests.

The Setup:

  • Secret Key: SECRET (6 characters)
  • Original Message: user=guest (10 characters)
  • Original Hash: ae4f... (The result of SHA1("SECRETuser=guest"))

An attacker wants to turn the signed message user=guest into one that also carries &role=admin. To do this, they must perform the following steps:

Step 1: Guessing the Secret Length

The attacker doesn't need the secret key, but they do need to know its length to calculate the correct padding. If the length is unknown, they can simply iterate (brute force) through possible lengths (e.g., 1 to 64 bytes). For each length, they generate a payload and test it against the application. If the application accepts the request, the attacker has found the correct length.
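The search over key lengths can be sketched as a simple loop. Both callables here are hypothetical stand-ins: forge would build a payload and signature for a guessed key length, and is_accepted would send them to the target and report whether the request succeeded.

```python
from typing import Callable, Optional, Tuple

def find_key_length(
    forge: Callable[[int], Tuple[bytes, str]],
    is_accepted: Callable[[bytes, str], bool],
    max_len: int = 64,
) -> Optional[int]:
    """Try key lengths 1..max_len and return the first one the target accepts."""
    for key_len in range(1, max_len + 1):
        message, signature = forge(key_len)
        if is_accepted(message, signature):
            return key_len
    return None  # no guess in the range produced a valid signature
```

At most max_len requests are needed, so an unknown key length adds very little cost for the attacker.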

Step 2: Reconstructing the Original Padding

Hash functions require padding to align the message to block boundaries. For SHA-1, the padding for the message SECRETuser=guest (16 bytes total) would look something like this in hex:

80 00 00 00 ... 00 00 00 00 00 00 00 80
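  • 0x80 (first byte): The mandatory '1' bit followed by seven '0' bits, written as a single byte.
  • 0x00: Null bytes that fill the block up to the length field.
  • 0x80 (last byte): The end of the 64-bit big-endian length field, which stores the length of the original message in bits (16 bytes * 8 = 128 bits, which is 0x80 in hex).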
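The padding for this example can be reproduced with a few lines of Python. The sha1_padding helper is a name introduced here for illustration; it follows the SHA-1 padding rule of a 0x80 byte, zero bytes, and a 64-bit big-endian bit length.

```python
import struct

def sha1_padding(message_len: int) -> bytes:
    """Padding SHA-1 appends to a message of `message_len` bytes."""
    zeros = (56 - (message_len + 1) % 64) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

padding = sha1_padding(len(b"SECRETuser=guest"))
print(len(padding))    # 48, so 16 + 48 fills exactly one 64-byte block
print(padding.hex())
```

The attacker only needs the total length of secret plus message to compute this, never the secret's content.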

Step 3: Extending the Hash

The attacker takes the original hash value (ae4f...) and uses it to initialize the state of their own SHA-1 engine. They then "feed" their malicious data (&role=admin) into the engine. The engine continues from where the previous hash left off. The resulting new hash is a valid signature for the following combined message:

SECRET + user=guest + [Padding] + &role=admin

The application, when it receives the request, will take the SecretKey, append the attacker's provided message (user=guest + [Padding] + &role=admin), and calculate the hash. Because the attacker correctly included the original padding in the middle of the string, the application's calculation will match the attacker's forged hash.
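The three steps can be put together in a short proof of concept. Python's hashlib does not allow setting the internal state, so this sketch includes a small pure-Python SHA-1 compression function; the helper names (sha1_padding, sha1_compress, extend) are introduced here for illustration, and the key length is assumed to have been guessed correctly.

```python
import hashlib
import struct

def _rol(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_padding(message_len: int) -> bytes:
    zeros = (56 - (message_len + 1) % 64) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

def sha1_compress(state, block: bytes):
    """Process one 64-byte block and return the updated five-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(_rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rol(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rol(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def extend(original_digest: str, original_len: int, extra: bytes) -> str:
    """Continue SHA-1 from a digest that covered `original_len` bytes plus padding."""
    state = struct.unpack(">5I", bytes.fromhex(original_digest))
    processed = original_len + len(sha1_padding(original_len))
    data = extra + sha1_padding(processed + len(extra))
    for i in range(0, len(data), 64):
        state = sha1_compress(state, data[i:i + 64])
    return struct.pack(">5I", *state).hex()

# Server side: the attacker only ever sees `message` and `token`.
secret, message, extra = b"SECRET", b"user=guest", b"&role=admin"
token = hashlib.sha1(secret + message).hexdigest()

# Attacker side: uses the key length, never the key itself.
key_len = 6
forged_message = message + sha1_padding(key_len + len(message)) + extra
forged_token = extend(token, key_len + len(message), extra)

# The server's own computation agrees with the forgery.
assert forged_token == hashlib.sha1(secret + forged_message).hexdigest()
```

The final assertion uses the secret only to play the server's role in checking the result.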

Practical Exploitation Example

Let's look at a concrete scenario involving a web application that uses MD5 to sign file download links.

Scenario: Forging an API Request

The application generates links like this:
https://api.example.com/download?file=report.pdf&signature=5d41402abc4b2a76b9719d911017c592

The signature is generated via MD5("APP_SECRET" + "file=report.pdf").

An attacker wants to download system.config. Having spotted this signing pattern while mapping the application's endpoints (for example with a tool like Jsmon), they know the original hash and guess that APP_SECRET is 10 characters long.

Using HashPump for Automation

Manual calculation of padding is tedious. Most security professionals use a tool called hashpump. Here is how an attacker would generate the forged signature and the new payload:

# hashpump -s <original_hash> -d <original_data> -a <data_to_add> -k <key_length>
hashpump -s 5d41402abc4b2a76b9719d911017c592 -d "file=report.pdf" -a "&file=system.config" -k 10

The tool outputs:

  1. New Hash: a1b2c3d4...
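  2. New Message: file=report.pdf\x80\x00\x00...\xc8\x00\x00\x00\x00\x00\x00\x00&file=system.config

The \xc8 byte begins MD5's 64-bit little-endian length field: 10 key bytes plus 15 data bytes is 25 bytes, or 200 bits (0xc8). The attacker then sends the request:
https://api.example.com/download?file=report.pdf%80%00...%C8%00%00%00%00%00%00%00%26file%3Dsystem.config&signature=a1b2c3d4...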

Because of how many web servers parse parameters, the second file parameter might overwrite the first, or the application might process the concatenated string in a way that allows access to the sensitive file.
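For readers who want to see where the bytes in the forged message come from, the following sketch rebuilds the message part of the request (not the new hash) for the assumed 10-byte key. The md5_padding helper is a name introduced here; MD5 pads like SHA-1 but stores the bit length in little-endian order.

```python
import struct
from urllib.parse import quote

def md5_padding(message_len: int) -> bytes:
    """Padding MD5 appends to a message of `message_len` bytes."""
    zeros = (56 - (message_len + 1) % 64) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack("<Q", message_len * 8)

key_len = 10                          # the attacker's guess
original = b"file=report.pdf"
extra = b"&file=system.config"

glue = md5_padding(key_len + len(original))
# Percent-encode the raw padding bytes and the appended data for the URL.
query = original.decode() + quote(glue + extra, safe="")
print(query)
```

The length field encodes 25 bytes (key plus data), which is why the key length must be guessed correctly for the forgery to verify.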

Real-World Impact and Case Studies

Length extension attacks are not just theoretical. One of the most famous examples occurred in 2009 when security researchers discovered that the Flickr API was vulnerable. Flickr signed API calls by hashing a shared secret concatenated with the request arguments, a secret-prefix construction. This allowed anyone holding a signed request to append their own arguments, effectively gaining unauthorized access to user data and performing actions on behalf of users.

Similarly, various implementations of the Adobe RTMP protocol and certain older implementations of Amazon S3 signatures have faced scrutiny regarding how they handled message authentication, leading to more robust designs like AWS Signature Version 4.

The impact of these attacks includes:

  • Authentication Bypass: Gaining access to restricted accounts or administrative panels.
  • Data Tampering: Modifying transaction amounts, user roles, or file paths in signed requests.
  • Privilege Escalation: Elevating a standard user token to an administrator token.

How to Prevent Length Extension Attacks

Preventing this vulnerability is straightforward if you use the correct cryptographic primitives.

Switch to HMAC (Hash-based Message Authentication Code)

The industry standard for signing messages is HMAC. Unlike the simple secret-prefix method, HMAC uses a nested hashing approach:

HMAC(key, message) = Hash((key ^ opad) + Hash((key ^ ipad) + message))

By hashing the message twice with the key in a specific way, HMAC breaks the Merkle-Damgård property that allows for length extension. Even if you know the final HMAC value, you cannot extend it because the "outer" hash wraps the "inner" hash, hiding the internal state of the message processing.
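In Python, the standard library's hmac module implements this construction. The sketch below also spells out the nested formula by hand for HMAC-SHA256 (block size 64 bytes, ipad 0x36, opad 0x5c) and checks that both agree. The manual_hmac_sha256 helper is for illustration only; in practice the library call is the one to use.

```python
import hashlib
import hmac

def manual_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Hash((key ^ opad) + Hash((key ^ ipad) + message)) written out by hand."""
    block_size = 64
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

key, message = b"SECRET", b"user=guest"
tag = hmac.new(key, message, hashlib.sha256).digest()

# Compare tags in constant time when verifying.
assert hmac.compare_digest(tag, manual_hmac_sha256(key, message))
```

Swapping the secret-prefix sign function for hmac.new is usually a small change on the server side.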

Use Modern Hashing Algorithms

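Not all hash functions are vulnerable to length extension. Algorithms whose output does not reveal the full internal state resist the attack, either because they avoid the plain Merkle-Damgård construction or because they truncate its output.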

  • SHA-3 (Keccak): Uses a "Sponge construction." It is immune to length extension attacks by design because part of the internal state (the capacity) is never included in the output hash.
  • BLAKE2: Fast and secure, BLAKE2 marks the final block with a finalization flag, so a published digest cannot be used as a state to resume hashing.
  • SHA-512/256: This is a truncated version of SHA-512. Because only 256 bits of the 512-bit internal state are output, an attacker would have to guess the hidden 256 bits to reconstruct the state needed to extend the hash.
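As a brief sketch, both SHA-3 and BLAKE2 are available in Python's hashlib. The key and message values are illustrative. BLAKE2 offers a dedicated keyed mode through its key parameter, which is generally preferable to concatenating the secret yourself; HMAC remains the conservative default when in doubt.

```python
import hashlib

secret, message = b"SECRET", b"user=guest"

# SHA-3: a secret-prefix tag is not extendable, thanks to the sponge design.
sha3_tag = hashlib.sha3_256(secret + message).hexdigest()

# BLAKE2: built-in keyed hashing (key may be up to 64 bytes for blake2b).
blake2_tag = hashlib.blake2b(message, key=secret, digest_size=32).hexdigest()

print(sha3_tag)
print(blake2_tag)
```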

Conclusion

The length extension attack serves as a vital reminder that "rolling your own crypto" or using simple constructions can lead to disastrous security flaws. Even if a hash function like SHA-256 is mathematically strong, using it incorrectly in a Hash(secret + message) pattern renders it vulnerable to manipulation. By understanding the underlying Merkle-Damgård construction and switching to secure alternatives like HMAC or SHA-3, developers can ensure the integrity and authenticity of their data.

Regularly auditing your infrastructure for these types of cryptographic weaknesses is essential. To proactively monitor your organization's external attack surface and catch exposures before attackers do, try Jsmon.