Taking Control of Daemons in Apache Hadoop

2016-08-10

One of the features in Apache Hadoop 3.x is the ability to replace how the shell scripts work without having to support a code fork. This feature makes adding support for site- or OS-specific features, such as resource controls around daemons, really easy.

To keep things simple, let’s add Linux cgexec support for non-secure daemons. Doing this work for other environments/commands (Linux numactl, FreeBSD jails, etc.) should require only relatively minor edits to what is presented here. Secure daemons work similarly but have a bit more complexity for a variety of reasons. Maybe I’ll cover that in the future…

Let’s get started!

Code Location

To make things easy, let’s work with the NameNode. We know that to start the NameNode we use the hdfs --daemon start namenode command. Time to dig into the hdfs code. Nothing looks particularly interesting until the end:
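Paraphrasing from a 3.x checkout (the exact code in your copy of the hdfs script may differ slightly), the tail end looks roughly like this:

if [[ "${HADOOP_SUBCMD_SUPPORTDAEMONIZATION}" = true ]]; then
  if [[ "${HADOOP_SUBCMD_SECURESERVICE}" = true ]]; then
    # secure daemon setup elided
    hadoop_secure_daemon_handler \
      "${HADOOP_DAEMON_MODE}" "${HADOOP_SUBCMD}" "${HADOOP_CLASSNAME}" \
      "${daemon_pidfile}" "${daemon_outfile}" \
      "${priv_pidfile}" "${priv_outfile}" "${priv_errfile}" "$@"
  else
    hadoop_daemon_handler "${HADOOP_DAEMON_MODE}" "${HADOOP_SUBCMD}" \
      "${HADOOP_CLASSNAME}" "${daemon_pidfile}" "${daemon_outfile}" "$@"
  fi
  exit $?
else
  hadoop_java_exec "${HADOOP_SUBCMD}" "${HADOOP_CLASSNAME}" "$@"
fi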

So there are two functions here to look at since we’re concentrating on daemons (i.e., HADOOP_SUBCMD_SUPPORTDAEMONIZATION will be set to true).
* hadoop_secure_daemon_handler
* hadoop_daemon_handler

We know from the Unix Shell Guide that these are found in Hadoop’s function library.

Looking at the function names and the accompanying documentation to confirm, we know that hadoop_secure_daemon_handler is for secure daemons, so we’ll ignore that one. That leaves us with hadoop_daemon_handler. Looking at that code, it does a bunch of setup stuff and then calls two other functions:
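* hadoop_start_daemon_wrapper
* hadoop_start_daemon

In sketch form (paraphrased, with the setup elided), the start/default dispatch looks something like this:

case ${daemonmode} in
  # status and stop cases elided
  start|default)
    # pid file verification, log dir setup, etc. elided
    if [[ ${daemonmode} = "default" ]]; then
      hadoop_start_daemon "${daemonname}" "${class}" "${daemon_pidfile}" "$@"
    else
      hadoop_start_daemon_wrapper "${daemonname}" \
        "${class}" "${daemon_pidfile}" "${daemon_outfile}" "$@"
    fi
  ;;
esac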

hadoop_start_daemon_wrapper is just what its name implies: a wrapper around hadoop_start_daemon… which means the function we want to target is hadoop_start_daemon.

Analyzing hadoop_start_daemon

Here’s what the bundled function looks like, after stripping out the comments:
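Paraphrased from a 3.x hadoop-functions.sh (your copy may differ slightly):

function hadoop_start_daemon
{
  local command=$1
  local class=$2
  local pidfile=$3
  shift 3

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
  hadoop_debug "Final HADOOP_OPTS: ${HADOOP_OPTS}"
  hadoop_debug "Final JAVA_HOME: ${JAVA_HOME}"
  hadoop_debug "java: ${JAVA}"
  hadoop_debug "Class name: ${class}"
  hadoop_debug "Command line options: $*"

  echo $$ > "${pidfile}" 2>/dev/null
  if [[ $? -gt 0 ]]; then
    hadoop_error "ERROR:  Cannot write ${command} pid ${pidfile}."
  fi

  export CLASSPATH
  #shellcheck disable=SC2086
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
}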

We have a bunch of debug statements so that using the --debug flag prints useful information before daemon launch. We’ve got the writing of the pid file, since doing that in Java is painful. The CLASSPATH environment variable is exported so that the JVM will know where to find stuff. Finally, we have the exec of Java itself. That last line is the one we’re going to want to change.

Preliminary Work: Getting Ready to Replace

Let’s copy this function without changes to make sure we can replace it. Create a file in HADOOP_CONF_DIR called hadoop-user-functions.sh. Give it permissions 0755. Inside, we need to put the proper bang path incantation. Let’s also create a fake hadoop_start_daemon to verify we can replace:
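A minimal version might look like this:

#!/usr/bin/env bash

function hadoop_start_daemon
{
  echo "The power of the elephant compels you!"
}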

Running hdfs namenode shows that it works!


$ hdfs namenode
The power of the elephant compels you!

Hooray! Instead of firing off Java, it printed out our message. While that worked, it’s not very useful, I suppose. Let’s get our hands dirty.

cgexec Setup

In order to use cgexec, we need to have a cgroup configured. Let’s configure a simple one that we can use for HDFS. One thing we can do is prevent those processes from swapping:
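Here’s one way to do that, assuming cgroups v1 with the libcgroup tools (cgcreate, cgexec) installed; paths and tooling vary by distribution:

# create a cgroup named 'hdfs' in the memory controller
$ sudo cgcreate -g memory:hdfs

# a swappiness of 0 tells the kernel to avoid swapping these processes
$ echo 0 | sudo tee /sys/fs/cgroup/memory/hdfs/memory.swappiness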

Now that we have an hdfs cgroup, we have something to use later.

Temporary Replacement

OK, now that we know we can replace successfully and have a cgroup to use, let’s change the code in hadoop-user-functions.sh to match what ships with Hadoop, since it is a good starting point for our changes. In other words, replace the fake function with a copy of the real hadoop_start_daemon we looked at above. I stripped out the comments to make that snippet smaller; you’ll want to keep them and add to them as we go along. Right?

For basic cgexec support, we need to replace that exec line. Let’s do something simple for now so that we know it works:
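In our copied function, the bottom becomes (hard-coding the hdfs cgroup for the moment):

export CLASSPATH
#shellcheck disable=SC2086
exec cgexec -g memory:hdfs "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"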

Running hdfs --daemon start namenode should fire up the NameNode but in our new cgroup. Let’s verify it. We can use jps to figure out the NameNode’s pid. Using that pid, we can then get the cgroup information from /proc.
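It should look something like this (pids and hierarchy numbers will differ):

$ jps
12345 NameNode
$ grep memory /proc/12345/cgroup
6:memory:/hdfs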

Success! From here we can see that, yes, our NameNode started in the hdfs cgroup!

Real World Replacement

That’s great, but hard-coding the cgroup isn’t particularly interesting. Let’s make this configurable so that we can control cgexec per daemon. Going back to hadoop_start_daemon, we can see that one of the parameters it takes is the command. That’s incredibly useful because we can use it to target specific daemons.

First, let’s add a line that takes the command variable and builds up a new variable for us to use.
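Right after the function’s local declarations:

local cgvar
cgvar="HADOOP_${command}_CGEXEC_OPTS"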

Now that we have a variable we can use, let’s see if it’s defined and if so, call cgexec with the parameters that are inside it:
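The bottom of the function then becomes:

export CLASSPATH
if [[ -n "${!cgvar}" ]]; then
  #shellcheck disable=SC2086
  exec cgexec ${!cgvar} "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
else
  #shellcheck disable=SC2086
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
fi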

Some explanation might be required around the if test. The ${!cgvar} expression is an indirect reference: cgvar holds the name of the variable we actually want to read. Ultimately, we’re checking whether HADOOP_command_CGEXEC_OPTS is defined. If it is, then we call our cgexec version of the exec java command. If it isn’t, then we call the regular exec java like normal.

Also note that in the cgexec call, ${!cgvar} isn’t quoted. Leaving it unquoted allows any spaces in its contents to be expanded into separate parameters. You’ll note that HADOOP_OPTS is handled the same way.

Now let’s test this out. If we run the NameNode again, it shouldn’t be in a cgroup:
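Since HADOOP_namenode_CGEXEC_OPTS isn’t defined anywhere yet, we should see something like this (illustrative pids):

$ hdfs --daemon stop namenode
$ hdfs --daemon start namenode
$ jps
12345 NameNode
$ grep memory /proc/12345/cgroup
6:memory:/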

Let’s put it back in our cgroup. In hadoop-env.sh, add this line:

HADOOP_namenode_CGEXEC_OPTS="-g memory:hdfs"

Now restart the namenode and see what happened:
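Again illustrative, but we should now see the hdfs cgroup:

$ hdfs --daemon stop namenode
$ hdfs --daemon start namenode
$ jps
23456 NameNode
$ grep memory /proc/23456/cgroup
6:memory:/hdfs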

Awesome! Since there aren’t other HADOOP_command_CGEXEC_OPTS variables defined, that means other daemons won’t be changed. We can verify this by starting up another daemon:
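For example, the YARN NodeManager (illustrative output; pids will differ):

$ yarn --daemon start nodemanager
$ jps
34567 NodeManager
$ grep memory /proc/34567/cgroup
6:memory:/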

As we can see, the memory entry in the cgroup file points at the root cgroup rather than hdfs. Adding a HADOOP_nodemanager_CGEXEC_OPTS with appropriate settings for YARN would work as expected: the NodeManager would get run with cgexec, with the contents of that variable as the parameters.
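For example, assuming a yarn cgroup had been created the same way as the hdfs one above:

HADOOP_nodemanager_CGEXEC_OPTS="-g memory:yarn"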

Conclusion

It’s easy to see how this functionality can be used to gain a greater degree of control over how daemons in the Apache Hadoop environment run.
