Monitoring Hadoop Applications Running on Amazon EMR

September 3, 2019


What is EMR?

According to Amazon, “Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).”

Customers across many verticals use Amazon EMR to handle big data use cases, including machine learning, data transformation, financial and scientific simulation, bioinformatics, log analysis, and deep learning. EMR lets customers run each use case on single-purpose, short-lived clusters that scale to meet demand.

Monitoring Amazon EMR

Amazon EMR provides several ways to gather information about a cluster: the console, the CLI, and the Hadoop web interfaces and log files available on the master node. Amazon CloudWatch can also be used.

With CloudWatch, metrics are gathered every five minutes for each EMR cluster. These metrics, such as AppsCompleted, AppsFailed, and AppsRunning, let users track the progress of a cluster, detect idle clusters, detect when a node runs out of storage, and more. While this helps show how the environment is trending over time, it leaves a lot to be desired for customers who run production applications and need real-time visibility.
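As a rough illustration of the five-minute granularity described above, the sketch below builds a CloudWatch query for one EMR metric with boto3. It assumes boto3 is installed and AWS credentials are configured; the helper names and the cluster ID in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EMR_NAMESPACE = "AWS/ElasticMapReduce"  # CloudWatch namespace used by Amazon EMR
FIVE_MINUTES = 300  # seconds; matches the five-minute interval noted above


def build_emr_metric_query(cluster_id, metric_name, end_time, lookback_minutes=60):
    """Build keyword arguments for CloudWatch get_metric_statistics (hypothetical helper)."""
    return {
        "Namespace": EMR_NAMESPACE,
        "MetricName": metric_name,
        # EMR metrics are keyed by cluster ID through the JobFlowId dimension.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end_time - timedelta(minutes=lookback_minutes),
        "EndTime": end_time,
        "Period": FIVE_MINUTES,
        "Statistics": ["Average"],
    }


def fetch_emr_metric(cluster_id, metric_name):
    """Fetch recent datapoints, oldest first. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    query = build_emr_metric_query(cluster_id, metric_name, datetime.now(timezone.utc))
    response = cloudwatch.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Calling `fetch_emr_metric("j-EXAMPLE", "AppsRunning")` with a real cluster ID in place of the placeholder would return at most one datapoint per five-minute period, which is the limitation the paragraph above points to.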

How to Monitor Amazon EMR with Instana

Effectively monitoring Amazon EMR requires visibility at the cluster, node, and Hadoop application layers. Instana provides the most efficient way to discover and monitor Amazon EMR clusters, nodes, and Hadoop applications. To begin, install the Instana Agent on AWS; the Agent then automatically discovers all EMR components running in the environment. Once they are discovered, the Agent deploys the appropriate monitoring sensors and begins tracing and analyzing every request. Using a combination of machine learning and preset health rules, Instana automatically determines the health of the Hadoop applications and EMR components.


Metrics – The Instana Agent automatically identifies EMR running in AWS and, with no manual effort required, deploys and configures Instana’s EMR monitoring sensor. Instana references its curated knowledge base to determine which performance metrics are relevant to collect and which parameters must be configured. Users can also choose the granularity of the metrics being pulled. Specifically, Instana’s automatic configuration for EMR tracks Cluster Details (Id, Name, Creation time, version, etc.), Cluster metrics (Apps Running, Apps Pending, Memory Allocated, Memory Available, etc.), and Node metrics (Active Nodes, Lost Nodes, Unhealthy Nodes, etc.).
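This post does not describe how the Instana sensor gathers these values. As an illustration only, comparable cluster and node figures are exposed by the YARN ResourceManager REST API (`/ws/v1/cluster/metrics`), which on EMR typically listens on port 8088 of the master node. The sketch below parses such a response; the function names, the sample payload, and its values are made up for the example.

```python
import json
from urllib.request import urlopen


def summarize_cluster_metrics(payload):
    """Extract cluster and node figures from a YARN cluster-metrics JSON document."""
    metrics = json.loads(payload)["clusterMetrics"]
    return {
        "apps_running": metrics["appsRunning"],
        "apps_pending": metrics["appsPending"],
        "memory_allocated_mb": metrics["allocatedMB"],
        "memory_available_mb": metrics["availableMB"],
        "active_nodes": metrics["activeNodes"],
        "lost_nodes": metrics["lostNodes"],
        "unhealthy_nodes": metrics["unhealthyNodes"],
    }


def fetch_cluster_metrics(master_host, port=8088):
    """Read live metrics; assumes the ResourceManager is reachable from this host."""
    url = f"http://{master_host}:{port}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as response:
        return summarize_cluster_metrics(response.read().decode("utf-8"))


# Illustrative payload with invented values, shaped like the ResourceManager response.
SAMPLE_PAYLOAD = json.dumps({
    "clusterMetrics": {
        "appsRunning": 3, "appsPending": 1,
        "allocatedMB": 24576, "availableMB": 8192,
        "activeNodes": 4, "lostNodes": 0, "unhealthyNodes": 1,
    }
})

if __name__ == "__main__":
    print(summarize_cluster_metrics(SAMPLE_PAYLOAD))
```

Polling an endpoint like this at a short interval is one way to get finer-grained data than CloudWatch’s five-minute metrics, at the cost of needing network access to the master node.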


Health – In addition to collecting performance metrics, the Instana EMR monitoring sensor automatically collects KPIs on the jobs in the monitored environment to determine their health. These health signatures are used to raise Issues or Incidents, depending on user impact.

With Instana, you’ll have a full, automatically performed analysis of every user impact that correlates all of the data from the traces with the underlying EMR metrics. By doing so, Instana provides the root cause of any issue within a few seconds. This enables you to update your services as often as you need to without worrying about whether regressions are impacting your customers.

Instana’s Amazon EMR monitoring includes automatic and continuous discovery, dependency mapping, metric monitoring, distributed tracing, anomaly detection, and analytics across the complete trace data set. This means you’ll always know what EMR is doing and how it affects user requests. To see Instana’s EMR monitoring in action, sign up for a free trial of Instana today.
