Monitoring Hadoop Clusters: a one hour project!

What if you could monitor your Hadoop cluster out of the box? APM companies have attempted this to varying degrees for some time now, but none has ever crossed the Rubicon. I’ve personally tried to set up Hadoop monitoring with past solutions many times, and every time it came up short. It was not for lack of trying; there are several reasons it never worked:

  • Hadoop Implementations are all different;
  • There are many processes to monitor, each requiring configuration;
  • It’s difficult to choose which Extensions to install and configure.

Where do you begin?

It was with trepidation, and a little pride, that I set out over the weekend to see if I could set up viable Hadoop monitoring with Instana.

In the Hortonworks Sandbox there are 21 relevant processes: Yarn, Spark, Tomcat, 15 other Java processes, MySQL, Postgres, and ZooKeeper. That count does not even include the dynamic processes that start and stop during data processing.
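
If you want to take a comparable head count by hand before installing anything, a rough sketch like the one below counts the Java processes on a host. It assumes a Linux machine where `ps -e -o comm=` prints one command name per line and where the JVMs run under the command name `java`. The function name `count_java` is made up for illustration.

```shell
# Hypothetical helper: count lines on stdin that are exactly "java".
count_java() {
    # grep -c prints the number of matching lines (0 if none);
    # -x requires the whole line to match, so "javac" is not counted.
    # "|| true" keeps the exit status at 0 even when nothing matches.
    grep -c -x 'java' || true
}

# Usage: list the command name of every process, then count the JVMs.
# ps -e -o comm= | count_java
```

This only tells you how many JVMs are running, not which Hadoop service each one belongs to.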

Installing the Instana Agent was a snap using the one-liner from the documentation:

curl -o setup_agent.sh https://setup.instana.io/agent && chmod 700 ./setup_agent.sh && ./setup_agent.sh -a $yourAgentKey -l $location
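
The one-liner expects `$yourAgentKey` and `$location` to be set in your shell already. A small guard like the following can catch an empty value before the installer runs. This is a minimal sketch: `check_agent_args` is a hypothetical helper, not part of the Instana tooling, and it only tests that both values are non-empty.

```shell
# Hypothetical helper: succeed only if both the agent key and the
# location are non-empty. It does not validate their contents.
check_agent_args() {
    key="$1"
    location="$2"
    if [ -z "$key" ]; then
        echo "missing agent key" >&2
        return 1
    fi
    if [ -z "$location" ]; then
        echo "missing location" >&2
        return 1
    fi
    return 0
}

# Usage (left commented out so the sketch has no side effects):
# check_agent_args "$yourAgentKey" "$location" \
#     && ./setup_agent.sh -a "$yourAgentKey" -l "$location"
```

Quoting the variables in the final call also keeps a value that contains spaces from being split into separate arguments.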

As I had hoped, the technology stack and services just showed up, and the appropriate technology sensors started doing their job automatically: continuously monitoring changes and relationships, collecting data, and determining health. Don’t forget that there is actual infrastructure underneath Hadoop, and it needs monitoring too. For this project the following sensors were relevant: Hadoop Yarn, ZooKeeper, MySQL, PostgreSQL, Tomcat, Java, Process, and Host.

The following are interesting screenshots from the deployment:

Yarn Tracing and Mapping (with zero configuration!).


All the Hortonworks processes are discovered.


The Yarn Dashboard.

Having successfully installed the Agent into the Hortonworks Sandbox, and feeling emboldened, I decided to try the Cloudera Sandbox. I had similar success. More Java processes were discovered in Cloudera, but essentially the same level of discovery and monitoring occurred.

The best part (other than finally seeing something monitor Hadoop) was that I managed to complete all of this in under two hours on a rainy Saturday afternoon. That includes writing this article and taking all these fancy screenshots.

What can Instana do in your environment? Why not take a little time and find out?

Below are some Cloudera Screenshots that hint at Instana’s capabilities: