While BUILDING.txt includes plenty of hints about the various options for building Apache Hadoop, turning those directions into something that can actually be deployed is overwhelming. In fact, many regular contributors to the project don’t know how the Apache Software Foundation builds a release!
Inside the dev-support/bin directory, Apache Hadoop features a helper utility called create-release that removes the guesswork. Built to help project release managers simplify their tasks, it can make your release process easier, too.
Default Behavior
By default, just running create-release from any directory in the source repository will do a few things:
- Verify the repository passes ASF license requirements
- Build the Java components without any native (read: C/C++ compiled) parts
- Build the website and all the documentation, including generating the release notes from the ASF JIRA system
- Provide MD5 checksums to verify that transfers match
After finishing (which will take a while!), the built artifacts will be in the target/artifacts directory.
$ ls -1 hadoop/target/artifacts
CHANGES.md
CHANGES.md.md5
RELEASENOTES.md
RELEASENOTES.md.md5
hadoop-3.0.0-alpha2-SNAPSHOT-rat.txt
hadoop-3.0.0-alpha2-SNAPSHOT-rat.txt.md5
hadoop-3.0.0-alpha2-SNAPSHOT-site.tar.gz
hadoop-3.0.0-alpha2-SNAPSHOT-site.tar.gz.md5
hadoop-3.0.0-alpha2-SNAPSHOT-src.tar.gz
hadoop-3.0.0-alpha2-SNAPSHOT-src.tar.gz.md5
hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz
hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz.md5
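Those .md5 files make it easy to confirm an artifact survived the trip. A minimal sketch, assuming GNU coreutils; the filenames are taken from the listing above, but the exact layout of the .md5 file may vary between releases, so compare the hex digests rather than relying on md5sum -c:

```shell
# Recompute the MD5 of a downloaded artifact:
md5sum hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz
# Compare against the published checksum:
cat hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz.md5
# The two hex digests should be identical.
```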
The artifacts should work everywhere Java works. create-release, however, can do fancier builds with more features and some optimizations. Using the --help flag, we see the following:
$ ./create-release --help
--artifactsdir=[path] Path to use to store release bits
--asfrelease Make an ASF release
--docker Use Hadoop's Dockerfile for guaranteed environment
--dockercache Use a Docker-private maven cache
--logdir=[path] Path to store logs
--mvncache=[path] Path to the maven cache to use
--native Also build the native components
--rc-label=[label] Add this label to the builds
--sign Use .gnupg dir to sign the artifacts and jars
--version=[version] Use an alternative version string
If something goes wrong, check the logs stored in the patchprocess directory:
$ ls -1 patchprocess/*log
patchprocess/mvn_apache_rat.log
patchprocess/mvn_clean.log
patchprocess/mvn_install.log
patchprocess/mvn_site.log
Building the Native Components
Adding the OS-specific code to your release is usually the first priority. There are several components, but the two big ones are libhadoop.so (libhadoop.dylib on Mac OS X) and container-executor. The former is a JNI library that enables a lot of extra functionality as well as faster versions of some features that are also implemented in Java. The latter enables the LinuxContainerExecutor functionality that, despite the name, works on most Unix operating systems to provide significantly better security.
For a successful compile of the native components, you’ll need to make sure your build environment has all the necessary prerequisites. (See BUILDING.txt for more information.) If you are building on Linux, see the Docker support described later on, which makes this much easier.
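With the prerequisites installed, adding the native bits is a single flag. A hypothetical invocation from the source root, using the flag shown in the --help output above:

```shell
# Build release artifacts including libhadoop.so and container-executor;
# assumes cmake, a C/C++ toolchain, and the other BUILDING.txt
# prerequisites are already installed.
./dev-support/bin/create-release --native
```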
Changing Locations of Things
You can tell create-release to use different directories for certain operations. The --logdir and --artifactsdir options are fairly self-explanatory. But what is the --mvncache option?
Apache Hadoop uses Maven as its build tool. Maven downloads Java dependencies as it compiles and stores them in a local cache. This local cache has a huge gotcha, however: there is no locking, so multiple Maven executions that share it may collide. The --mvncache option tells create-release to use a different directory for its cache, which makes the tool safe to use for concurrent Maven runs.
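As a sketch, two release builds from separate working copies can then run side by side, each with its own cache (all paths here are hypothetical):

```shell
# Each build gets a private maven cache, so neither can corrupt
# the other's downloads mid-run.
(cd ~/src/hadoop-a && ./dev-support/bin/create-release --mvncache=/tmp/mvn-a) &
(cd ~/src/hadoop-b && ./dev-support/bin/create-release --mvncache=/tmp/mvn-b) &
wait   # block until both background builds finish
```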
Signing Your Build
If gpg and gpg-agent are available, the --sign option will also sign the jars and artifacts. This is especially useful if you follow up the build with mvn deploy to upload your build into something like Artifactory, Nexus, etc.
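Before deploying, it is worth sanity-checking a signature. A sketch that assumes the detached signatures land next to the artifacts with an .asc extension (check your artifacts directory for the actual naming):

```shell
# Verify a detached signature against the artifact it covers.
gpg --verify hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz.asc \
    hadoop-3.0.0-alpha2-SNAPSHOT.tar.gz
```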
Changing Versions
Out of the box, Apache Hadoop typically encodes the version as x.y.z-SNAPSHOT, where x.y.z is the release currently under development in that particular branch. Under many circumstances, that is not ideal for a variety of reasons. Passing the --version flag allows you to override that string with something else. That version string is also compiled into the source, so hadoop version will report it. The --rc-label option changes the names of the tarballs so that they also carry that label. Putting these together, create-release --version=3.0.0-EM --rc-label=-RC1 will create Hadoop v3.0.0-EM stored in a tarball called hadoop-3.0.0-EM-RC1.tar.gz.
Docker Support
Getting all the prerequisites for building Hadoop’s OS-native features is a time-consuming process. Luckily, for over a year now Apache Hadoop has shipped with a Dockerfile that is up to date with everything you need to build those features. create-release can take advantage of that file via the --docker option, which runs create-release in a Docker container built from that Dockerfile. Any options that reference other directory paths, e.g., --logdir, will get mounted inside the container so that they work as expected.
With the help of Docker, create-release can also guarantee a “fresh” cache, which is useful for verifying that all dependencies are downloadable. The --dockercache option forces the use of a fresh Maven cache directory.
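Putting the Docker-related flags together with the earlier ones, a hypothetical all-in-one invocation might look like this:

```shell
# Build inside the project's Docker image, with a throwaway maven
# cache, native components, and logs kept in a known place
# (the log path is just an example).
./dev-support/bin/create-release --docker --dockercache --native \
    --logdir=/tmp/create-release-logs
```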
Apache Hadoop Release Management
One of the key reasons create-release exists is to help Apache Hadoop Release Managers at the ASF offer a consistent build environment over time. This includes setting the proper flags so that create-release is itself consistent. The --asfrelease flag means ASF RMs won’t have to research or memorize exactly how to make the release. It also includes some extra capabilities, such as verifying that a signed release has a valid public key in the ASF master repositories. So while using the --asfrelease flag might be tempting, unless you are a committer for the Apache Hadoop project this flag probably won’t work for you. 🙂
Summary
create-release is an easy way to build Apache Hadoop. It provides tooling to guarantee a consistent environment and build parameters. While built for ASF usage, you can also use it to get the most out of your local installation.
I’m guessing that the docker option only works if you are using an Intel x86 compatible CPU, so users of ARM and Power CPUs are on their own. Am I right?
Using the built-in Dockerfile, at least as of today, it only works on x86. But one could replace the Dockerfile prior to running create-release and it should work just fine. (It’s unfortunate that it isn’t really possible to make a multi-platform Dockerfile due to limitations in the file format.)