Using Apache Yetus with Jenkins and GitHub: Part 1

Apache Yetus is a toolbox for building and releasing software. One of the tools is a generalized framework for performing patch and full build testing for continuous integration systems. It supports a wide variety of build tools and features. As a result, it can be overwhelming to get started using it. In this blog post,… Read More »

Applying Patches, Smartly, Using Apache Yetus

Many open source projects use Bugzilla, JIRA, or some other workflow where contributions are submitted via a patch file attached to a bug report or even an email. This workflow means working with a raw patch file. Working with these files can be a bit of a pain as one tries to figure out what… Read More »

Fixing Apache Hadoop CVE-2016-6811: argv[0] vs. Security

Let’s discuss CVE-2016-6811 now that it has been published. Freddie Rice (in the midst of reporting another hole) discovered that Apache Hadoop suffered from a security anti-pattern: trusting argv[0]. The fundamental problem with argv[0] is that it’s possible for a caller to modify its contents. This situation means that argv[0] can contain anything and everything.… Read More »

Powerful _USERs in Apache Hadoop 3.0.0-alpha4

(Image: "Super Heroes" (CC BY-SA 2.0) by Olaf Gradin.) A lot of work has been done to clarify and enhance the environment variables in the Apache Hadoop shell script code. One of those places was the usage of the various _USER environment variables. Prior to 3.0.0-alpha4, the supported variables were (name, description): HADOOP_SECURE_DN_USER, User to… Read More »

Workaround: Secure DataNode Crashes

A recent Linux kernel update to work around CVE-2017-1000366 is causing Apache Hadoop’s secure DataNode (and NFS manager) to crash on startup. (Related discussion from Red Hat and Ubuntu.) If your systems are running a variant of Apache Hadoop 3.x, you can take advantage of user functions to work around the Java Invocation API issue that causes… Read More »

Docker Security in Framework Managed, Multi-user Environments

A while back, Jessie Frazelle published an informative blog post on the differences between containers, zones, and jails. Since it touched on security, the post reminded me of a conversation from last year, when a contributor to the Apache Yetus project asked about this blog post about one of the… Read More »

Unofficial History of the HDFS Audit Log

There’s been a lot of chatter in the Apache Hadoop universe lately over the role of the HDFS audit log. This includes a lot of supposition on the purpose of the original design. This is my recollection of the events that led to its creation. Unfortunately, humans do not come with ECC RAM so apologies… Read More »

Building Your Own Apache Hadoop Distribution

While BUILDING.txt includes a lot of hints about the various options for building Apache Hadoop, it can be overwhelming to turn those directions into something that can actually be deployed. In fact, many regular contributors to the project don’t even know how the Apache Software Foundation builds a release! Inside the dev-support/bin directory,… Read More »

Taking Control of Daemons in Apache Hadoop

One of the features in Apache Hadoop 3.x is the ability to replace how the shell scripts work without having to support a code fork. This feature makes adding support for site or OS specific features, such as resource controls around daemons, really easy. To keep things simple, let’s add Linux cgexec support for non-secure… Read More »

Adding to Apache Hadoop’s Classpath

One of the big pain points of administering Apache Hadoop is safely and efficiently adding to the classpath. The original design in Hadoop gave users a single way to add jars: the HADOOP_CLASSPATH environment variable. This is a bit of a problem for end users, admins, and any 3rd party applications may… Read More »