One of the features in Apache Hadoop 3.x is the ability to replace how the shell scripts work without having to support a code fork. This feature makes adding support for site- or OS-specific features, such as resource controls around daemons, really easy.
To keep things simple, let’s add Linux cgexec support for non-secure daemons. Doing this work for other environments/commands (Linux numactl, FreeBSD jails, etc.) should require only relatively minor edits to what is presented here (there is a brief numactl sketch at the end of this post). Secure daemons work similarly but have a bit more complexity for a variety of reasons. Maybe I’ll cover that in the future….
Let’s get started!
Code Location
To make things easy, let’s work with the NameNode. We know that to start the NameNode we use the hdfs --daemon start namenode command. Time to dig into the hdfs code. Nothing looks particularly interesting until the end:
if [[ "${HADOOP_SUBCMD_SUPPORTDAEMONIZATION}" = true ]]; then
  if [[ "${HADOOP_SUBCMD_SECURESERVICE}" = true ]]; then
    hadoop_secure_daemon_handler \
      "${HADOOP_DAEMON_MODE}" \
      "${HADOOP_SUBCMD}" \
      "${HADOOP_CLASSNAME}" \
      "${daemon_pidfile}" \
      "${daemon_outfile}" \
      "${priv_pidfile}" \
      "${priv_outfile}" \
      "${priv_errfile}" \
      "${HADOOP_SUBCMD_ARGS[@]}"
  else
    hadoop_daemon_handler \
      "${HADOOP_DAEMON_MODE}" \
      "${HADOOP_SUBCMD}" \
      "${HADOOP_CLASSNAME}" \
      "${daemon_pidfile}" \
      "${daemon_outfile}" \
      "${HADOOP_SUBCMD_ARGS[@]}"
  fi
  exit $?
else
  # shellcheck disable=SC2086
  hadoop_java_exec "${HADOOP_SUBCMD}" "${HADOOP_CLASSNAME}" "${HADOOP_SUBCMD_ARGS[@]}"
fi
So there are two functions here to look at since we’re concentrating on daemons (i.e., HADOOP_SUBCMD_SUPPORTDAEMONIZATION will be set to true).
* hadoop_secure_daemon_handler
* hadoop_daemon_handler
We know from the Unix Shell Guide that these are found in Hadoop’s function library.
Looking at the function names and the accompanying documentation to confirm, we know that hadoop_secure_daemon_handler is for secure daemons, so we’ll ignore that one. That leaves us with hadoop_daemon_handler. Looking at that code, it does a bunch of setup work and then calls one of two other functions:
if [[ "$daemonmode" = "default" ]]; then
  hadoop_start_daemon "${daemonname}" "${class}" "${daemon_pidfile}" "$@"
else
  hadoop_start_daemon_wrapper "${daemonname}" \
    "${class}" "${daemon_pidfile}" "${daemon_outfile}" "$@"
fi
hadoop_start_daemon_wrapper is just as its name implies: a wrapper around hadoop_start_daemon… which means the function we want to target is hadoop_start_daemon.
Analyzing hadoop_start_daemon
Here’s what the bundled function looks like, after stripping out the comments:
function hadoop_start_daemon
{
  local command=$1
  local class=$2
  local pidfile=$3
  shift 3

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
  hadoop_debug "Final HADOOP_OPTS: ${HADOOP_OPTS}"
  hadoop_debug "Final JAVA_HOME: ${JAVA_HOME}"
  hadoop_debug "java: ${JAVA}"
  hadoop_debug "Class name: ${class}"
  hadoop_debug "Command line options: $*"

  echo $$ > "${pidfile}" 2>/dev/null
  if [[ $? -gt 0 ]]; then
    hadoop_error "ERROR: Cannot write ${command} pid ${pidfile}."
  fi

  export CLASSPATH
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
}
We have a bunch of debug statements so that using the --debug flag prints useful information before the daemon launches. We’ve got the writing of the pid file, since doing that in Java is painful. The CLASSPATH environment variable is exported so that the JVM knows where to find everything. Finally, we have the exec of Java itself. That last line is the one we want to change.
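As an aside, those hadoop_debug lines are exactly what the --debug flag mentioned above turns on, so while experimenting you can watch what the script is doing with something like:

hdfs --debug --daemon start namenode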
Preliminary Work: Getting Ready to Replace
Let’s copy this function without changes to make sure we can replace it. Create a file in HADOOP_CONF_DIR called hadoop-user-functions.sh and give it permissions 0755. Inside, we need to put the proper bang path incantation. Let’s also create a fake hadoop_start_daemon to verify that the replacement works:
#!/usr/bin/env bash
#
#

function hadoop_start_daemon
{
  echo "The power of the elephant compels you!"
  exit 1
}
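If HADOOP_CONF_DIR is set in your environment, setting the mode mentioned above is simply:

chmod 0755 "${HADOOP_CONF_DIR}/hadoop-user-functions.sh"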
Running hdfs namenode shows that it works!
$ hdfs namenode
The power of the elephant compels you!
Hooray! Instead of firing off Java, it printed our message. That worked, but it isn’t very useful yet. Let’s get our hands dirty.
cgexec Setup
In order to use cgexec, we need to have a cgroup configured. Let’s configure a simple one that we can use for HDFS. One thing we can do is prevent those processes from swapping:
cgcreate -t hdfs:hdfs -a hdfs:hdfs -g memory:hdfs
echo 0 > /sys/fs/cgroup/memory/hdfs/memory.swappiness
Now that we have an hdfs cgroup, we have something to use later.
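As a quick sanity check that the cgroup exists and the setting stuck, re-reading the file we just wrote should print 0:

cat /sys/fs/cgroup/memory/hdfs/memory.swappiness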
Temporary Replacement
OK, now that we know we can replace successfully and have a cgroup to use, let’s change the code in hadoop-user-functions.sh to match what ships with Hadoop, since it is a good starting point for our changes. I’m going to strip out the comments to make this snippet smaller. You’ll want to keep them and add to them as we go along. Right?
#!/usr/bin/env bash
#
#

function hadoop_start_daemon
{
  local command=$1
  local class=$2
  local pidfile=$3
  shift 3

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
  hadoop_debug "Final HADOOP_OPTS: ${HADOOP_OPTS}"
  hadoop_debug "Final JAVA_HOME: ${JAVA_HOME}"
  hadoop_debug "java: ${JAVA}"
  hadoop_debug "Class name: ${class}"
  hadoop_debug "Command line options: $*"

  echo $$ > "${pidfile}" 2>/dev/null
  if [[ $? -gt 0 ]]; then
    hadoop_error "ERROR: Cannot write ${command} pid ${pidfile}."
  fi

  export CLASSPATH
  exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
}
For basic cgexec support, we need to replace that exec line. Let’s do something simple for now so that we know it works:
exec cgexec -g memory:hdfs "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
Running hdfs --daemon start namenode should fire up the NameNode, but in our new cgroup. Let’s verify it. We can use jps to figure out the NameNode’s pid. Using that pid, we can then get the cgroup information from /proc.
$ jps -l | grep -i namenode
16351 org.apache.hadoop.hdfs.server.namenode.NameNode
$ cat /proc/16351/cgroup
12:hugetlb:/
11:net_prio:/
10:perf_event:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/hdfs
5:blkio:/
4:cpuacct:/
3:cpu:/
2:cpuset:/
1:name=systemd:/user/1000.user/7.session
Success! From here we can see that, yes, our NameNode started in the hdfs cgroup!
Real World Replacement
That’s great, but hard-coding the cgroup isn’t particularly interesting. Let’s make this configurable so that we can control cgexec per daemon. Going back to hadoop_start_daemon, we can see that one of the parameters it takes is the command. That’s incredibly useful because we can use it to target specific daemons.
First, let’s add a line that takes the command variable and builds up a new variable for us to use.
function hadoop_start_daemon
{
  local command=$1
  local class=$2
  local pidfile=$3
  shift 3

  local cgvar="HADOOP_${command}_CGEXEC_OPTS"

  hadoop_debug "Final CLASSPATH: ${CLASSPATH}"
Now that we have a variable we can use, let’s see if it’s defined and if so, call cgexec with the parameters that are inside it:
    hadoop_error "ERROR: Cannot write ${command} pid ${pidfile}."
  fi

  export CLASSPATH
  if [[ -n "${!cgvar}" ]]; then
    exec cgexec ${!cgvar} "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
  else
    exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
  fi
}
Some explanation might be required around line 5, the new if test. The ${!cgvar} expression is an indirect reference: cgvar holds the name of the variable we actually want to read. Ultimately, we’re checking whether HADOOP_command_CGEXEC_OPTS is defined. If it is, we call our exec cgexec version of the java command. If it isn’t defined, we call the regular exec java like normal.
On line 6, the cgexec line, note that ${!cgvar} isn’t quoted. This allows any spaces in the value to expand into separate parameters. You’ll notice that HADOOP_OPTS is handled the same way.
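If bash indirection and the quoting distinction are new to you, here is a tiny standalone illustration (plain bash, not Hadoop code) of both behaviors:

HADOOP_namenode_CGEXEC_OPTS="-g memory:hdfs"
cgvar="HADOOP_namenode_CGEXEC_OPTS"

# Quoted indirect expansion: a single word, "-g memory:hdfs"
echo "${!cgvar}"

# Unquoted indirect expansion: word-split into "-g" and "memory:hdfs",
# which is exactly what cgexec needs to see as two separate arguments
printf '%s\n' ${!cgvar}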
Now let’s test this out. If we run the NameNode again, it shouldn’t be in a cgroup:
$ hdfs --daemon start namenode
$ jps -l | grep -i namenode
16817 org.apache.hadoop.hdfs.server.namenode.NameNode
$ cat /proc/16817/cgroup
12:hugetlb:/
11:net_prio:/
10:perf_event:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/
5:blkio:/
4:cpuacct:/
3:cpu:/
2:cpuset:/
1:name=systemd:/user/1000.user/7.session
Let’s put it back in our cgroup. In hadoop-env.sh, add this line:
HADOOP_namenode_CGEXEC_OPTS="-g memory:hdfs"
Now restart the NameNode and see what happens:
$ hdfs --daemon stop namenode
$ hdfs --daemon start namenode
$ jps -l | grep -i namenode
16980 org.apache.hadoop.hdfs.server.namenode.NameNode
$ cat /proc/16980/cgroup
12:hugetlb:/
11:net_prio:/
10:perf_event:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/hdfs
5:blkio:/
4:cpuacct:/
3:cpu:/
2:cpuset:/
1:name=systemd:/user/1000.user/7.session
Awesome! Since no other HADOOP_command_CGEXEC_OPTS variables are defined, other daemons won’t be affected. We can verify this by starting up another daemon:
$ yarn --daemon start nodemanager
$ jps -l | grep -i nodemanager
17181 org.apache.hadoop.yarn.server.nodemanager.NodeManager
$ cat /proc/17181/cgroup
12:hugetlb:/
11:net_prio:/
10:perf_event:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/
5:blkio:/
4:cpuacct:/
3:cpu:/
2:cpuset:/
1:name=systemd:/user/1000.user/7.session
As we can see, the memory line in the cgroup output is empty; the NodeManager is not in the hdfs cgroup. Adding a HADOOP_nodemanager_CGEXEC_OPTS variable with appropriate settings for YARN would work as expected: that daemon would get run with cgexec with the contents of that variable as the parameters.
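For example, assuming you had created a separate memory:yarn cgroup the same way we created memory:hdfs (the cgroup name here is my own choice, not anything Hadoop defines), the hadoop-env.sh entry would be:

# hypothetical cgroup; create it with cgcreate first, as we did for memory:hdfs
HADOOP_nodemanager_CGEXEC_OPTS="-g memory:yarn"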
Conclusion
It’s easy to see how this functionality can be used to gain a greater degree of control over how daemons in the Apache Hadoop environment run.
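As a closing illustration of the numactl point from the introduction, the same pattern only requires swapping the wrapper on the exec line. A sketch, assuming a HADOOP_${command}_NUMACTL_OPTS naming convention of my own invention (not something Hadoop defines), might look like this inside hadoop_start_daemon:

  local numavar="HADOOP_${command}_NUMACTL_OPTS"

  if [[ -n "${!numavar}" ]]; then
    # e.g. HADOOP_namenode_NUMACTL_OPTS="--interleave=all" in hadoop-env.sh
    exec numactl ${!numavar} "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
  else
    exec "${JAVA}" "-Dproc_${command}" ${HADOOP_OPTS} "${class}" "$@"
  fi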