While BUILDING.txt includes plenty of hints about the various options for building Apache Hadoop, turning those directions into something that can actually be deployed is overwhelming. In fact, many regular contributors to the project don’t even know how the Apache Software Foundation builds a release!
Inside the dev-support/bin directory, Apache Hadoop ships a helper utility called `create-release` that removes the guesswork. It was built to simplify the tasks of project release managers, but you can take advantage of it to make your own release process easier, too.
By default, just running `create-release` from any directory in the source repository will do a few things:
- Verify the repository passes ASF license requirements
- Build the Java components without any native (read: C/C++ compiled) parts
- Build the website and all the documentation, including generating the release notes from the ASF JIRA system
- Provide MD5 checksums to verify that transfers match
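Checking an artifact against its MD5 file looks like the following sketch. The tarball name here is hypothetical, and the checksum file is fabricated for illustration; use whatever files `create-release` actually produced.

```shell
# Work in a scratch directory with a hypothetical artifact.
cd "$(mktemp -d)"
echo "fake release bits" > hadoop-3.4.0.tar.gz
# Emulate the checksum file that ships alongside each artifact:
md5sum hadoop-3.4.0.tar.gz > hadoop-3.4.0.tar.gz.md5
# On the receiving end, confirm the transfer matched:
md5sum -c hadoop-3.4.0.tar.gz.md5
# → hadoop-3.4.0.tar.gz: OK
```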
After finishing (which will take a while!), the built artifacts will be in the hadoop/target/artifacts directory:

```shell
$ ls -1 hadoop/target/artifacts
```
The artifacts should work everywhere Java works.
`create-release`, however, can do fancier builds with more features and some optimizations. Using the `--help` flag, we see the following:
```shell
$ ./create-release --help
--artifactsdir=[path]   Path to use to store release bits
--asfrelease            Make an ASF release
--docker                Use Hadoop's Dockerfile for guaranteed environment
--dockercache           Use a Docker-private maven cache
--logdir=[path]         Path to store logs
--mvncache=[path]       Path to the maven cache to use
--native                Also build the native components
--rc-label=[label]      Add this label to the builds
--sign                  Use .gnupg dir to sign the artifacts and jars
--version=[version]     Use an alternative version string
```
If something goes wrong, check the logs stored in the patchprocess directory:

```shell
$ ls -1 patchprocess/*log
```
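Each build step writes its own log file, so a quick grep usually pinpoints which step failed and why. The log name and contents below are fabricated for illustration:

```shell
# Set up a sample patchprocess directory with a fake maven log.
cd "$(mktemp -d)"
mkdir patchprocess
printf '%s\n' '[INFO] Building Apache Hadoop' \
              '[ERROR] Failed to execute goal' > patchprocess/mvn_install.log
grep -l ERROR patchprocess/*log   # which step failed
grep -h ERROR patchprocess/*log   # and why
```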
Building the Native Components
Adding the OS-specific code to your release is usually the first priority. There are several components, but the two big ones are the `libhadoop` JNI library (`libhadoop.so` on Linux, `libhadoop.dylib` on Mac OS X) and the `container-executor`. The former enables a lot of extra functionality as well as faster versions of some features that are also implemented in Java. The latter enables the `LinuxContainerExecutor` functionality that, despite the name, actually works on most Unix operating systems to provide significantly better security.
For a successful compile of the native components, you’ll need to make sure your build environment has all the necessary prerequisites (see BUILDING.txt for more information). If you are building on Linux, however, Docker can make that much easier, as described later on.
Changing Locations of Things
You can tell `create-release` to use different directories for certain operations. The `--logdir` and `--artifactsdir` options are fairly self-explanatory. But what about `--mvncache`?

Apache Hadoop uses maven as its build tool. Maven downloads Java dependencies as it compiles and stores them in a local cache. This local cache, however, has a huge gotcha: there is no locking, so multiple maven executions may collide while sharing it. The `--mvncache` option tells `create-release` to use a different directory for its cache, which makes the tool safe for concurrent maven runs.
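One way to picture this is giving each concurrent run its own throwaway cache directory. This is only a sketch of the idea; `create-release` handles the equivalent internally when `--mvncache` is set:

```shell
# One cache per run keeps the lock-free maven caches from colliding.
CACHE1=$(mktemp -d)
CACHE2=$(mktemp -d)
# Hypothetical concurrent invocations, each with an isolated cache:
#   ./create-release --mvncache="$CACHE1" &
#   ./create-release --mvncache="$CACHE2" &
[ "$CACHE1" != "$CACHE2" ] && echo "caches are isolated"
```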
Signing Your Build
If gpg and gpg-agent are available, the `--sign` option will also sign the jars and artifacts. This is especially useful if you follow up the build with a `mvn deploy` to upload it into something like Artifactory or Nexus.
Versions and Labels

Out of the box, Apache Hadoop typically encodes the version as x.y.z-SNAPSHOT, where x.y.z is the release currently under development in that particular branch. Under many circumstances, that is not ideal for a variety of reasons. Passing the `--version` flag allows you to override that string with something else. That version string is also compiled into the source, so that `hadoop version` will report it. The `--rc-label` option changes the names of the tarballs so that they carry that label as a suffix. Putting these together, `create-release --version=3.0.0-EM --rc-label=-RC1` will create Hadoop v3.0.0-EM stored in a tarball called hadoop-3.0.0-EM-RC1.tar.gz.
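The naming convention from that example can be sketched as simple string assembly; the real logic lives inside `create-release`:

```shell
# Note the rc-label value includes its own leading dash.
VERSION="3.0.0-EM"
RC_LABEL="-RC1"
TARBALL="hadoop-${VERSION}${RC_LABEL}.tar.gz"
echo "$TARBALL"   # → hadoop-3.0.0-EM-RC1.tar.gz
```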
Building with Docker

Getting all the prerequisites for building Hadoop’s OS-native features is a time-consuming process. Luckily, Apache Hadoop has shipped with a Dockerfile for over a year now, and it is kept up-to-date with everything you need to build those features.
`create-release` can take advantage of that file via the `--docker` option. This runs `create-release` in a Docker container built from that Dockerfile. Any options that reference other directory paths, e.g., `--logdir`, will get mounted inside the container so that they work as expected.
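A rough sketch of what that mounting amounts to, assuming the host path is bind-mounted at the same path inside the container so the option value stays valid (the real docker invocation is assembled by `create-release` itself):

```shell
# Hypothetical host log directory passed via --logdir.
LOGDIR="/home/user/hadoop-logs"
# Same-path bind mount keeps --logdir meaningful inside the container:
DOCKER_ARGS="-v ${LOGDIR}:${LOGDIR}"
echo "docker run ${DOCKER_ARGS} ... ./create-release --logdir=${LOGDIR}"
```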
With the help of Docker, `create-release` can also guarantee a “fresh” cache; this is a useful exercise to verify that all dependencies are downloadable. The `--dockercache` option forces the use of a fresh maven cache directory.
Apache Hadoop Release Management
One of the key reasons `create-release` exists is to help ASF Apache Hadoop Release Managers offer a consistent build environment over time. This includes setting the proper flags so that `create-release` itself is consistent. The `--asfrelease` flag means ASF RMs don’t have to research or memorize exactly how to make a release. It also adds some extra capabilities, such as verifying that a signed release has a valid public key in the ASF master repositories. So while using the `--asfrelease` flag might be tempting, unless you are a committer for the Apache Hadoop project it probably won’t work for you. 🙂
create-release is an easy way to build Apache Hadoop. It provides tooling to guarantee a consistent environment and build parameters. While built for ASF usage, you can also use it to get the most out of your local installation.