What is Regular Expression Denial of Service (ReDoS)? Ways to Exploit, Examples and Impact

Discover how ReDoS attacks exploit catastrophic backtracking in regex engines. Learn to identify, exploit, and prevent ReDoS with practical examples.

What is Regular Expression Denial of Service (ReDoS)? Ways to Exploit, Examples and Impact

Regular expressions (regex) are the unsung heroes of modern software development, powering everything from simple input validation to complex log analysis. However, when poorly constructed, these powerful patterns can become a significant security liability known as a Regular Expression Denial of Service (ReDoS). This algorithmic complexity attack targets the regex engine itself, forcing it into a state of "catastrophic backtracking" that consumes 100% of a server's CPU resources, effectively bringing applications to a grinding halt.

Understanding the Mechanics of Regex Engines

To understand why ReDoS occurs, we must first look at how regular expression engines process strings. There are two primary types of engines: Deterministic Finite Automaton (DFA) and Nondeterministic Finite Automaton (NFA).

Most modern programming languages—including JavaScript, Python, Ruby, Java, and PHP—utilize NFA engines. NFAs are popular because they support advanced features like backreferences and lookarounds. However, the NFA engine is "backtracking-based." When an NFA engine encounters a potential match that fails later in the string, it "backtracks" to a previous decision point and tries a different path. While this works well for simple patterns, certain complex patterns can lead to an exponential number of paths for the engine to explore when faced with a non-matching string.

What is Catastrophic Backtracking?

Catastrophic backtracking is the root cause of ReDoS. It occurs when a regular expression contains nested quantifiers or overlapping groups that allow for an astronomical number of ways to match a specific input. As the input string grows linearly, the time required for the engine to determine that the string does not match grows exponentially.

Consider the following "evil" regex pattern:

(a+)+$

This pattern looks simple: it matches one or more 'a' characters, grouped together, repeated one or more times, followed by the end of the string. Now, consider the input aaaaaaaaaaaaaaaaaaaaaaaaaaaaab (29 'a's followed by a 'b').

The engine starts matching the 'a's. When it reaches the 'b', it realizes it cannot match the $ (end of string). Because of the nested quantifiers (a+)+, the engine begins to backtrack. It tries to group the 29 'a's in every possible permutation:

  • One group of 29
  • Two groups (1 and 28, 2 and 27, etc.)
  • Three groups...

For an input of length n, the number of combinations is 2^n. With just 30 characters, the engine might need to perform over a billion calculations. On a modern CPU, this could take seconds; with 50 characters, it could take years. During this time, the CPU thread handling the request is locked at 100% utilization.

Common Vulnerable Regex Patterns

Identifying ReDoS vulnerabilities requires recognizing patterns that lead to ambiguity. In the world of Jsmon and infrastructure reconnaissance, spotting these patterns in client-side code or API endpoints is a critical step in assessing attack surfaces. Here are the most common culprits:

1. Nested Quantifiers

As seen in the previous example, patterns like (a+)*, (a*)*, or (a|b+)* are extremely dangerous. When a quantifier (like *, +, or {n,}) is placed inside another quantifier, the complexity explodes.

2. Overlapping Disjunctions

When a regex offers multiple paths that can match the same substring, the engine becomes confused during backtracking.

Example:

(a|a)+$

If the input is aaaaa!, the engine can match each a using either the left or right side of the pipe (|). This creates 2^n paths to explore before the engine finally gives up at the !.

3. Quantified Repetitions with Overlapping Suffixes

Patterns where a repeated group is followed by something that could also have been matched by that group are risky.

Example:

^([a-zA-Z0-9]+\s?)*$

This pattern is intended to match multiple words separated by optional spaces. However, because \s? is optional and [a-zA-Z0-9]+ matches alphanumeric characters, a long string of alphanumeric characters without spaces will cause the engine to try every possible way to break the string into "words."

Exploiting ReDoS: Practical Examples

Attackers exploit ReDoS by identifying endpoints that process user-supplied input through a regular expression. This often includes search bars, login fields (email validation), and header parsing logic.

Example: The Email Validator

Many developers use complex regex for email validation. Consider this simplified vulnerable pattern:

^([a-zA-Z0-9._-]+)+@([a-zA-Z0-9.-]+)+$

An attacker could send a specially crafted payload to a registration endpoint:

Payload: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

Since the payload lacks the @ symbol, the engine will attempt to match the first part of the regex ([a-zA-Z0-9._-]+)+. It will exhaustively backtrack through all permutations of the 'a' characters before failing. If the server processes this on the main event loop (as in Node.js), the entire application becomes unresponsive to other users.

Example: WAF Bypass and ReDoS

Web Application Firewalls (WAFs) often use regular expressions to detect malicious payloads. Ironically, this makes the WAF itself a target for ReDoS. If an attacker identifies the regex used by a WAF for SQL injection detection, they can craft a payload that doesn't actually contain a SQL injection but triggers catastrophic backtracking in the WAF's inspection engine, effectively bypassing the security layer or crashing the WAF node.

Real-World Impact: The Cloudflare Outage

Perhaps the most famous instance of ReDoS occurred in July 2019, when Cloudflare experienced a global outage. The cause was a single misconfigured regular expression in a WAF rule designed to detect cross-site scripting (XSS).

The regex was:

(?:(?:\"|'|]|\}|\\\\|\\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\.\w+)|[0-9]+|\s+|[\{\}\[\]\(\)\,\.\:\;]|\b(?:back|case|catch|const|continue|debugger|default|delete|do|else|export|extends|finally|for|function|if|import|in|instanceof|new|return|switch|this|throw|try|typeof|var|void|while|with|yield)\b|.*(?:.*=.*))

The culprit was the final part: .*(?:.*=.*). The nested .* caused CPU spikes to 100% across Cloudflare's global network, demonstrating that even the most sophisticated infrastructure can be brought down by a few characters of poorly written regex.

How to Identify ReDoS Vulnerabilities

Security professionals use several methods to find ReDoS bugs:

  1. Static Analysis: Tools like rxxr2 or eslint-plugin-security can scan source code for potentially dangerous patterns.
  2. Dynamic Testing (Fuzzing): Sending long, repetitive strings to input fields and monitoring the server's response time. If a 20-character string takes 10ms and a 25-character string takes 500ms, a ReDoS vulnerability is likely present.
  3. Manual Code Review: Looking for nested quantifiers and overlapping groups in configuration files, WAF rules, and input validation logic.

When using Jsmon for infrastructure reconnaissance, discovering exposed JavaScript files can reveal the regular expressions used for client-side validation, which are often mirrored on the server-side, providing a roadmap for potential ReDoS attacks.

Mitigation and Prevention Strategies

Preventing ReDoS requires a combination of better coding practices and architectural safeguards.

1. Use Non-Backtracking Engines

If your environment allows, use a regex engine that guarantees linear time complexity, such as Google's RE2. RE2 uses an automaton-based approach instead of backtracking, making it immune to ReDoS by design.

2. Set Execution Timeouts

Most modern regex libraries allow you to set a timeout. If the engine takes longer than a few milliseconds to find a match, it should abort the operation.

In .NET, this is straightforward:

var regex = new Regex(@"(a+)+$", RegexOptions.None, TimeSpan.FromMilliseconds(100));

3. Avoid Nested Quantifiers and Overlap

Refactor your regular expressions to be more specific. Instead of (a+)+, use a+. Instead of .*, use more restrictive character classes like [a-zA-Z0-9]+ that do not overlap with other parts of your pattern.

4. Input Length Validation

ReDoS attacks require long strings to trigger exponential growth. By enforcing a strict maximum length on user input before passing it to a regex engine, you can limit the maximum time the engine can spend on a match.

5. Use Atomic Grouping

Atomic grouping (?>...) prevents the engine from backtracking into the group once a match has been found. This effectively "locks in" the match and eliminates the possibility of catastrophic backtracking within that group.

Conclusion

Regular Expression Denial of Service is a subtle but devastating vulnerability. It transforms a tool meant for efficiency into a weapon for resource exhaustion. As applications become more complex and rely heavily on pattern matching for security and data processing, understanding the mechanics of ReDoS is essential for every developer and security researcher.

By prioritizing simpler regex patterns, implementing timeouts, and utilizing non-backtracking engines, organizations can defend against these attacks. To proactively monitor your organization's external attack surface and catch exposures before attackers do, try Jsmon.