Fuzzing confused dependencies with DepFuzzer

Written by Pierre Martin, Kévin Schouteeten - 25/09/2024 - in Tools - Download

In the landscape of software development, leveraging open-source libraries and packages through registries like NPM, PyPI, Go modules, and Crates for Rust has become standard practice. This approach facilitates the rapid integration of diverse functionalities into applications, driving both innovation and efficiency across the development community. While the benefits of using these resources are clear, the management of external dependencies introduces a set of considerations regarding security and maintainability.

Inspired by Alex Birsan's blog post on dependency confusion, we introduce DepFuzzer, a tool that helps identify dependencies that can be taken over across multiple projects.

INTRODUCTION

Today, developers predominantly work with languages relying on extensive libraries, such as Python, Node.js, Rust, and Golang. To simplify their installation process, language maintainers have established registries consolidating third-party libraries and allowing developers to share their work with the broader community.

However, this system inherently presents several design weaknesses that can expose users, developers, and companies to security compromises, a risk highlighted by multiple incidents throughout 2023 and 2024.

This article explores package registries, the CLI tools used to interact with them, and their underlying mechanisms. We will then introduce DepFuzzer, a tool designed to automate the detection of dependency confusion vulnerabilities in package files.

PACKAGE REGISTRIES

How do they work?

Package registries primarily depend on the contribution of third parties to provide packages. Each package is linked to a specific individual through their account on the registry. Once a package name is registered, it becomes unique to that individual, and no third party can publish another package with the same name. Below is a table summarizing the package registries for various programming languages:

Language   Registry
Node.js    https://registry.npmjs.org
Python     https://pypi.org/
Golang     https://proxy.golang.org
Rust       https://crates.io

It is important to note that developers can use local or remote packages from sources other than these official registries.

Within these package registries, maintainers can perform several key actions, such as publishing, updating, unpublishing, or deleting a package.

Security

Package registries are designed to prevent the publication of packages containing malicious code. To safeguard developers and users, packages are often analyzed in sandboxes to ensure that no harmful code is present, providing a layer of protection.

However, there is a significant security gap in package management. If a maintainer decides to completely remove their package or unpublish all versions (thus giving up ownership), anyone can subsequently claim this package and publish it with any code they choose. This creates a potential vulnerability, as the new package could contain malicious content.

PACKAGING FILE FORMATS

Node.js

Node.js uses a package manager called NPM (Node Package Manager), which relies on a JSON file named package.json. Below is an example; a fully detailed description of this file can be found in the official npm documentation.

{
  "name": "test",
  "description": "A test package",
  "version": "1.0.0",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/fakegithubusername/test.git"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "bugs": {
    "url": "https://github.com/fakegithubusername/test/issues"
  },
  "homepage": "https://github.com/fakegithubusername/test",
  "dependencies":{},
  "devDependencies":{}
}

In this file, two keys are of particular interest: dependencies and devDependencies. These specify the dependencies that NPM will fetch, either remotely or locally, to run the project. A dependency can be declared in several ways:

  • mylibrary: ^1.0.0: fetch the mylibrary dependency and retrieve the most recent version between 1.0.0 and 2.0.0 (not included).
  • mylibrary: 1.0.0: fetch the mylibrary dependency and retrieve version 1.0.0.
  • mylibrary: >=1.2.0 <2.0.0: fetch the mylibrary dependency and retrieve the most recent version between 1.2.0 and 2.0.0 (not included).
  • mylibrary: git+https://github.com/fakeusername/mylibrary.git: fetch the mylibrary dependency from a specific GitHub repository.
  • mylibrary: file:../packages/mylibrary/: fetch the mylibrary dependency locally from a specific path.
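
The declaration forms listed above can be told apart from the value alone. The following is a minimal sketch, not NPM's actual parser: the function name classify_npm_spec and the category labels are our own, and less common forms (aliases, bare user/repo shorthands) are deliberately ignored.

```python
def classify_npm_spec(spec: str) -> str:
    """Return a coarse source category for a package.json dependency value."""
    spec = spec.strip()
    if spec.startswith("file:"):
        return "local"
    if spec.startswith(("git+", "git://", "github:")):
        return "git"
    if spec.startswith(("http://", "https://")):
        return "tarball-url"
    # Everything else (exact versions, ^ ranges, comparators, tags) is
    # resolved by package name against the configured registry.
    return "registry"
```

Only the entries classified as "registry" are looked up by name, which is why they are the ones that matter for dependency confusion.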

Here, the dependency can be retrieved remotely or locally. In fact, there is another key in the JSON file that specifies how NPM should act in these cases:

{
  "workspaces": {
    "packages": [
      "packages/*"
    ]
  }
}

The workspaces key instructs NPM to first check the packages folder to see if the dependency exists locally. If it is found there, NPM will use it, otherwise it will check if the dependency is declared in the remote package registry.

Python

In Python, dependency management is often handled by two popular tools: Pip and Poetry. Both tools provide effective solutions for managing libraries and packages, but utilize different approaches and files.

Pip is the standard package installer for Python, primarily managing dependencies through a file called requirements.txt. This file lists all dependencies along with their specific version or version constraints. Here is a simple example of what this file might look like:

flask==1.1.2
requests>=2.24.0
numpy<=1.19.5
pandas

In this example:

  • flask==1.1.2 ensures that version 1.1.2 of flask is installed.
  • requests>=2.24.0 allows any version of requests that is 2.24.0 or newer.
  • numpy<=1.19.5 restricts the installation to versions of numpy that are 1.19.5 or older.
  • pandas installs the latest version available of pandas without specifying a version constraint.
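
To audit such a file, the package names first have to be separated from their version constraints. Here is a small sketch under simplifying assumptions: the helper name requirement_names is hypothetical, pip options such as -r or -e are skipped, and extras, environment markers and direct URLs are not handled.

```python
import re

# Package name at the start of a requirement line (letters, digits, ".", "_", "-").
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text: str) -> list[str]:
    """Return the package names declared in a requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1).lower())
    return names
```

Applied to the example file above, this yields the four names flask, requests, numpy and pandas, which can then be checked one by one against a public index.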

In addition to the standard requirements.txt file, some Python projects use a separate file called requirements-dev.txt to manage development-specific dependencies. This approach helps keep production dependencies separate from those required only during development, such as testing frameworks, linters or documentation tools.

On the other hand, Poetry offers a more integrated approach to package management and virtual environments. It uses a file called pyproject.toml, which not only handles dependency management but also project configuration and packaging. This file allows for a more detailed specification of dependencies, including version constraints and Python compatibility, ensuring that projects remain stable and predictable across installations. pyproject.toml simplifies project setup and maintenance by centralizing all configuration settings, thus eliminating the need for multiple configuration files. Here is an example:

[tool.poetry]
name = "example_project"
version = "0.1.0"
description = "An example Python project"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
Flask = "^1.1.2"
requests = ">=2.24.0"
numpy = "<=1.19.5"
pandas = "*"

[tool.poetry.dev-dependencies]
pytest = "^6.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

In this pyproject.toml file:

  • The [tool.poetry] section specifies basic project metadata.
  • The [tool.poetry.dependencies] section lists production dependencies, with version constraints similar to those in the requirements.txt example.
  • The [tool.poetry.dev-dependencies] section lists dependencies needed for development, like pytest for running tests.
  • The [build-system] specifies the tools and versions required to build the project using Poetry.
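
The same extraction can be sketched for a Poetry project. This example assumes Python 3.11 or later for the standard tomllib module (with the third-party tomli backport as a fallback on older interpreters) and the table layout shown above; the function name is illustrative.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # assumption: the "tomli" backport is installed
    import tomli as tomllib


def poetry_dependency_names(pyproject_text: str) -> set[str]:
    """Collect production and development dependency names from pyproject.toml."""
    data = tomllib.loads(pyproject_text)
    poetry = data.get("tool", {}).get("poetry", {})
    names = set(poetry.get("dependencies", {}))
    names |= set(poetry.get("dev-dependencies", {}))
    names.discard("python")  # interpreter constraint, not a package
    return names
```

The python entry is dropped because it constrains the interpreter version rather than naming a package to download.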

Rust

Cargo is the Rust package manager and build system, and utilizes a registry to host and distribute Rust packages, known as crates. The Cargo registry is a centralized repository where Rust packages (crates) are published and shared and the default public registry is crates.io. When you publish a crate, it becomes available to all Rust developers using Cargo.

To use a crate from the registry, similarly to other package managers, you need to specify the dependency (or dependencies) in a file within your project called Cargo.toml. There are several ways to specify a dependency in this file:

[package]
name = "super_project"
version = "0.1.0"
authors = ["Synacktiv <developer@localhost>"]
edition = "2024"

[dependencies]
# The tokio entries below are alternatives; a real Cargo.toml declares each key only once.
# Latest version 1.x.x
tokio = "1"

# Exact version
tokio = "=0.8.3"

# Version range
tokio = ">=1.0, <2.0"

# Pre-release version
tokio = "1.0.0-beta.2"

# Git dependency
asuperlib = { git = "https://github.com/superdev/asuperlib", branch = "master" }

# Local dependency
mylocalib = { path = "../mylocalib" }

Golang

Golang manages its package system differently from other languages: there is no central registry to which packages are published (proxy.golang.org only mirrors modules), and most packages are retrieved from public repositories such as GitHub. Golang uses a file named go.mod, where developers can specify dependencies to be fetched from various sources:

module synacktiv.com/superproject

go 1.18

require (
    github.com/superlib/lib v1.7.4
)

replace (
    github.com/superlib/lib v1.7.4 => github.com/superlib/lib v1.7.3
    synacktiv.com/oldmodule => synacktiv.com/newmodule v1.0.0
    synacktiv.com/localmodule => ../localmodule
)

exclude synacktiv.com/oldlibrary/library v1.7.0

Packages in Golang can be retrieved from various sources such as github.com, golang.org, and others. It is mandatory to specify a version to be pulled for each package. The go.mod file also includes a specific syntax, such as the replace directive, which allows you to upgrade or downgrade versions defined earlier, or to specify libraries that can be found locally. Additionally, the exclude directive prevents a specific module version from being used.
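
As an illustration, the require directives of a go.mod file can be listed with a few lines of Python. This is a rough sketch rather than a full go.mod parser: the function name is ours, and replace and exclude directives are simply ignored.

```python
def go_mod_requirements(text: str) -> list[tuple[str, str]]:
    """Return (module path, version) pairs found in require directives."""
    requirements = []
    in_block = False
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if in_block:
            if line == ")":
                in_block = False
            else:
                parts = line.split()
                if len(parts) >= 2:
                    requirements.append((parts[0], parts[1]))
        elif line.startswith("require") and line.endswith("("):
            in_block = True  # start of a "require ( ... )" block
        elif line.startswith("require "):
            parts = line.split()  # single-line form: require <path> <version>
            if len(parts) >= 3:
                requirements.append((parts[1], parts[2]))
    return requirements
```

For the go.mod shown above, the result is a single pair: github.com/superlib/lib at v1.7.4.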

In Golang, it is also possible to directly install binaries by compiling projects from their URLs using the command line, but this will not be covered in this article.

Exploring the Node.js dependency installation process

When a Node.js project installs its dependencies, the process is driven by the content of the package.json file. This file acts as the blueprint, outlining all the dependencies and scripts necessary for the project to function. Let's take a closer look at how NPM handles this task using the following example:

{
  "name": "super-project",
  "version": "1.0.0",
  "description": "A super project from Synacktiv",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.17.1",
    "mongoose": "^5.11.15",
    "local-package": "file:../local-package",
    "http-package": "http://synacktiv.com/path/to/package.tgz",
    "github-package": "github:synacktiv/package",
    "github-package-with-branch": "github:synacktiv/package#master",
    "github-package-with-tag": "github:synacktiv/package#production",
    "github-package-with-commit": "github:synacktiv/package#239ea651b7ce7c6de79e1b55c54be58aa6818380"
  },
  "devDependencies": {
    "nodemon": "^2.0.7"
  }
}

Parsing the package.json file

The installation process begins with NPM parsing the package.json file to gather information about the project. It identifies the key components such as the project’s name, version, main entry point, scripts, and most importantly, the list of dependencies. The latter are divided into two main categories: dependencies for the core packages required in production, and devDependencies for tools needed during development, such as nodemon in this case.

Leveraging the package-lock.json file

Once the dependencies are identified, NPM looks for a package-lock.json file. This file is crucial because it ensures consistency by locking the exact versions of all dependencies and their nested dependencies. Its presence helps prevent issues that might arise from different environments using slightly different versions of the same packages. Moreover, it adds a layer of security by verifying the integrity of the packages against previously stored checksums, which helps mitigate risks associated with potentially compromised packages.
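
The stored checksums typically look like sha512- followed by a base64 string. The sketch below shows how such a value can be recomputed and compared; it assumes a single sha512 entry per package and hypothetical function names, and is not NPM's own verification code.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Build an integrity string of the form sha512-<base64 digest>."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(tarball: bytes, expected: str) -> bool:
    """Compare a downloaded tarball against the value stored in the lock file."""
    return sri_sha512(tarball) == expected
```

A tarball whose content changed after the lock file was written produces a different string, so the comparison fails.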

Utilizing the local cache

To make the installation process more efficient, NPM checks whether the required dependencies are already available in the local cache. If one is found in the cache, NPM uses it directly, bypassing the need to download it again. This approach not only accelerates the process but also enhances security by reducing the possibility of introducing a malicious package from an external source.

Resolving dependencies from various sources

NPM then proceeds to resolve and install each dependency according to its source, following a specific hierarchy. First, it handles local dependencies, such as local-package, which is referenced via a file path. Next, NPM retrieves dependencies hosted on external servers via HTTP, like http-package. After that, it resolves packages from GitHub repositories, which can be specified with a branch, tag, or commit, as seen with the various github-package entries. Finally, packages are fetched from the NPM registry, which is the most common scenario for packages like express and mongoose.

Handling missing packages

When NPM is unable to locate a specified package in the local file system, via HTTP, or on GitHub, it automatically falls back to searching the NPM registry. If the package exists in the registry, NPM retrieves and installs it to ensure all dependencies are satisfied.

However, this fallback mechanism can introduce significant security risks. During our engagements, we have frequently encountered situations where simple typos in package names or the accidental deletion of the .npmrc file (often used to configure the mapping between packages and internal or external registries) have led to serious vulnerabilities. The .npmrc file is critical for directing NPM to the correct registry, and its absence or misconfiguration can cause NPM to pull packages from unintended, potentially untrusted sources.

This scenario opens the door to "dependency confusion" attacks, where attackers publish malicious packages with names similar to internal ones, hoping they will be inadvertently installed by systems that fall back to the public registry.

This is precisely the kind of vulnerability that DepFuzzer was designed to address. DepFuzzer scans projects to check whether the specified dependencies are present on public repositories, helping to ensure dependencies are being sourced from the correct locations and are not exposed to potential supply chain attacks.

DepFuzzer: An Automatic Analyzer of Packaging Files, Dependencies, and Sub-Dependencies

How DepFuzzer works

The operation of DepFuzzer is quite straightforward: the tool lists the dependencies of the projects found in the various dependency files presented earlier.

Next, for each dependency, the tool checks its presence on the deps.dev website. The latter is a comprehensive public database maintained by Google that provides detailed information on various software dependencies across multiple ecosystems. It supports a wide range of programming languages and package managers, including:

  • NPM (JavaScript/Node.js)
  • PyPI (Python)
  • Maven (Java)
  • Go Modules (Golang)
  • Cargo (Rust)
  • NuGet (C#/.NET)

The platform aggregates and analyzes metadata from these package managers to offer insights into dependency usage, version histories, security vulnerabilities, and licensing information. By checking dependencies against deps.dev, DepFuzzer can determine whether a dependency is publicly available or not.
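
Such a presence check can be sketched as follows. This assumes the public deps.dev v3 REST API and its systems/<system>/packages/<name> route; the source does not state which endpoint DepFuzzer actually calls, and the function names are ours.

```python
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://api.deps.dev/v3"  # assumed endpoint, not taken from DepFuzzer


def package_url(system: str, name: str) -> str:
    """Build the lookup URL; the name is one path segment, so "/" and "@" are escaped."""
    return f"{API_ROOT}/systems/{system}/packages/{urllib.parse.quote(name, safe='')}"


def is_publicly_known(system: str, name: str, timeout: float = 10.0) -> bool:
    """Return True if the service knows the package, False if it answers 404."""
    try:
        with urllib.request.urlopen(package_url(system, name), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # other errors (rate limiting, outage) are not a verdict
```

A dependency for which is_publicly_known returns False is a candidate for takeover, since nobody has registered that name publicly yet.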

Example of using DepFuzzer

Suppose you have the following package.json file for a Node.js project:

{
  "name": "dependency-confusion-demo",
  "version": "1.0.0",
  "description": "A demo to illustrate dependency confusion",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "@int-synacktiv/private-package": "^1.0.0",
    "lodash": "^4.17.21"
  },
  "author": "Scouty",
  "license": "ISC"
}

This file includes two dependencies: @int-synacktiv/private-package, which appears to be a private package, and lodash, a well-known public package.

In addition to the package.json file, there is a file called .npmrc, which tells NPM where to locate the company's internal registry. For instance, the .npmrc file might include the following line:

@int-synacktiv:registry=http://localhost:4873

This configuration directs NPM to fetch any packages scoped under @int-synacktiv from the internal registry located at http://localhost:4873.
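
To make the role of this mapping concrete, here is a small sketch that lists the scoped dependencies for which no registry is configured. It only understands @scope:registry=URL lines, and the function names are illustrative.

```python
def scoped_registries(npmrc_text: str) -> dict[str, str]:
    """Map each @scope to the registry URL declared in an .npmrc file."""
    registries = {}
    for line in npmrc_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):  # blank lines and comments
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("@") and key.endswith(":registry"):
            registries[key[: -len(":registry")]] = value.strip()
    return registries


def unmapped_scoped_dependencies(dependencies: dict[str, str], npmrc_text: str) -> list[str]:
    """Return scoped dependencies whose scope has no registry mapping."""
    mapped = scoped_registries(npmrc_text)
    return [
        name
        for name in dependencies
        if name.startswith("@") and name.split("/", 1)[0] not in mapped
    ]
```

With the .npmrc line above, the list is empty; with an empty or missing .npmrc, @int-synacktiv/private-package is reported, meaning it would be looked up on the default registry.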

When the npm install command is executed, NPM will use this configuration to retrieve @int-synacktiv/private-package from the internal registry as specified in the .npmrc file:

$ npm install
added 2 packages, and audited 3 packages in 2s
found 0 vulnerabilities

If the .npmrc file were deleted (whether due to a misconfiguration, an environment migration, or any other reason), the behavior of the npm install command would change. Without this file, NPM would no longer know to fetch the package from the internal registry. Instead, it would attempt to retrieve the package from the public registry (npmjs.org).

In such a case, if the package @int-synacktiv/private-package does not exist on the public registry, the npm install command would result in an error like the following:

$ npm install
npm error code E404
npm error 404 Not Found - GET https://registry.npmjs.org/@int-synacktiv%2fprivate-package - Not found
npm error 404
npm error 404 '@int-synacktiv/private-package@^1.0.0' is not in this registry.
npm error 404
npm error 404 Note that you can also install from a
npm error 404 tarball, folder, http url, or git url.
npm error A complete log of this run can be found in: /home/user/.npm/_logs/2024-07-04T11_59_45_530Z-debug-0.log

This error indicates that NPM tried to retrieve @int-synacktiv/private-package from the public registry (npmjs.org) and failed because the package is not available there. If we run DepFuzzer on the project at this point, we see the following output:

$ python3 main.py --provider npm --path ~/dependency-confusion-demo/
[+] Processing repositories for npm
[+] Found 2 npm dependencies
[+] Starting analysis for npm...
[+] @int-synacktiv/private-package@^1.0.0 might be taken over!

The attacker's goal is to exploit the misconfiguration by publishing a malicious package with the same name and organization as the original, which has not been created on the public registry. This can be achieved with the following command:

$ npm publish --access public
npm notice
npm notice @int-synacktiv/private-package@1.0.0
[...]
npm notice integrity: sha512-lBrWveA9TtBjj[...]BttKY9k4LfuSw==
npm notice total files: 2
npm notice
npm notice Publishing to https://registry.npmjs.org/ with tag latest and public access
+ @int-synacktiv/private-package@1.0.0

This command publishes a malicious package under the same name, @int-synacktiv/private-package, to the public registry. Since the package did not previously exist on the public registry, the attacker can successfully register it.

When the victim, unaware of the attack, installs their project on a new environment with the misconfiguration (e.g., missing or incorrect .npmrc file), NPM will no longer produce warnings or errors. Instead, it will retrieve and install the attacker's malicious package from the public registry, believing it to be the legitimate package:

$ npm install
added 2 packages, and audited 2 packages in 2s
found 0 vulnerabilities

The installation completes without issue, and no red flags are raised. However, the package now contains malicious code controlled by the attacker, rather than the legitimate code expected by the victim. This could lead to a range of harmful outcomes, depending on the attacker's intentions, such as data theft, remote code execution, or further compromise of the victim's systems.

Email takeover

Another technique for taking control of a package involves examining the email addresses of the project's owners. When a package is published on platforms like NPM or PyPI, the owner's email address is included and publicly accessible. This creates an opportunity for attackers to retrieve these email addresses and determine whether they are still active.

An attacker could potentially take ownership of a package if any of the following conditions are met:

  • The email address uses a temporary email service (e.g., YOPmail).
  • The email address has been deleted on a known provider (e.g., Gmail).
  • The domain associated with the email address is no longer registered or has expired.
  • The password for the email account has been leaked and is accessible to the attacker.
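
The first condition is the easiest to check automatically. The sketch below flags maintainer addresses hosted on a disposable email service; the domain list is illustrative and deliberately incomplete, the names are hypothetical, and this is not DepFuzzer's implementation. Checking deleted mailboxes or expired domains requires external lookups that are not shown here.

```python
# Illustrative only: a real list of disposable providers would be much longer.
DISPOSABLE_DOMAINS = {"yopmail.com"}


def email_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    return address.rsplit("@", 1)[1].strip().lower()


def uses_disposable_domain(address: str) -> bool:
    """True if the address belongs to a known temporary email service."""
    return email_domain(address) in DISPOSABLE_DOMAINS
```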

This method of taking control can be particularly insidious, as the attacker could subtly insert malicious code into a new version of the package, disguising it as a legitimate enhancement or update. This type of attack is especially dangerous because it can go unnoticed, allowing the malicious code to be distributed widely before anyone realizes the package has been compromised.

DepFuzzer is also equipped to check for this specific scenario. By analyzing the email addresses associated with package ownership, it can help identify potential vulnerabilities related to removed email accounts.

Mitigations

To counter these attacks, several mitigations can be implemented. One effective strategy is to enforce the use of an internal registry that mirrors the public registry (e.g., npmjs.org). By doing so, you can ensure that all dependencies are sourced from a controlled environment, which significantly reduces the risk of dependency confusion attacks. This approach ensures that even if a similarly named malicious package is published on the public registry, it will not be mistakenly pulled into your project.

Additionally, the maintainers of public registries like NPM and PyPI are actively working to contain such malicious packages. They employ automated systems that analyze the code within each published package to detect potential threats. If a package is found to be malicious, the registry typically responds by removing both the package and the associated account, thus mitigating the spread of the threat.

Conclusion

In conclusion, dependency confusion attacks are becoming increasingly common, especially as the use of external dependencies has become essential in modern software development.

DepFuzzer can serve as an initial safeguard, helping to identify potential dependency confusion issues within a project. However, no tool can offer a 100% guarantee of security, and additional vigilance is always required.

On the attacker side, it is crucial to understand the risks associated with exploiting this type of vulnerability. Publishing a malicious package can have unintended consequences, such as breaking build environments or causing malicious code to run on servers that were not originally targeted. This can lead to widespread damage and unintended victims, so extreme caution and ethical considerations must be exercised when dealing with such vulnerabilities.

Looking ahead, future improvements to DepFuzzer will include support for additional package managers such as NuGet (C#/.NET) and Maven (Java). Additionally, work will be done to reduce the number of false positives, ensuring that the tool provides more accurate and reliable results.

The project can be found on GitHub at https://github.com/synacktiv/DepFuzzer.