Monitoring Hadoop Applications Running on Amazon EMR

September 3, 2019

What is EMR?

According to Amazon, “Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).”

Amazon EMR is used by many customers across several verticals to handle big data use cases. These use cases include machine learning, data transformations, financial and scientific simulation, bioinformatics, log analysis, and deep learning. EMR lets customers run their specific use cases on single-purpose, short-lived clusters that scale to meet demand.

Monitoring Amazon EMR

Amazon EMR provides several ways to gather information about a cluster: the console, the CLI, and the Hadoop web interfaces and log files available on the master node. Additionally, Amazon CloudWatch can be used.

With CloudWatch, metrics are gathered every five minutes for each EMR cluster. The metrics allow users to track the progress of a cluster, detect idle clusters, detect when a node runs out of storage, and follow counters such as AppsCompleted, AppsFailed, and AppsRunning. While this helps show how the environment is trending over time, it leaves a lot to be desired for customers who run production applications and need real-time visibility.
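To make the five-minute granularity concrete, here is a minimal sketch of pulling one EMR metric from CloudWatch with boto3. It assumes boto3 is installed and AWS credentials are configured. The cluster ID `j-EXAMPLE` and both function names are placeholders invented for this illustration.

```python
from datetime import datetime, timedelta, timezone


def build_emr_metric_query(cluster_id, metric_name, minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,  # e.g. AppsRunning, AppsFailed
        # EMR metrics are keyed by cluster ID through the JobFlowId dimension.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # seconds; matches the five-minute reporting interval
        "Statistics": ["Average"],
    }


def fetch_emr_metric(cluster_id, metric_name, minutes=60):
    """Return (timestamp, average) pairs, oldest first."""
    import boto3  # imported here so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    query = build_emr_metric_query(cluster_id, metric_name, minutes)
    response = cloudwatch.get_metric_statistics(**query)
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]


if __name__ == "__main__":
    # "j-EXAMPLE" is a placeholder cluster ID, not a real cluster.
    for timestamp, value in fetch_emr_metric("j-EXAMPLE", "AppsRunning"):
        print(timestamp.isoformat(), value)
```

With a 300-second period, an hour of history yields at most twelve data points per metric, which is why this view suits trend analysis better than real-time troubleshooting.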

How to Monitor Amazon EMR with Instana

Effectively monitoring Amazon EMR requires visibility at the cluster, node, and Hadoop application layers. Instana provides the most efficient way to discover and monitor Amazon EMR clusters, nodes, and Hadoop applications. To begin, install the Instana agent on AWS; the agent automatically discovers all EMR components running in the environment. Once they are discovered, the agent deploys the appropriate monitoring sensors and begins tracing and analyzing every request. Using a combination of machine learning and preset health rules, Instana automatically determines the health of the Hadoop applications and EMR components.

Metrics – The Instana agent automatically identifies EMR running in AWS and, with no manual effort required, deploys and configures Instana’s EMR monitoring sensor. Instana references its curated knowledge base to understand which performance metrics are relevant to collect and which parameters must be configured. Instana also lets users choose the granularity of the metrics being pulled. Specifically, Instana’s automatic configuration for EMR tracks Cluster details (ID, Name, Creation time, Version, etc.), Cluster metrics (Apps Running, Apps Pending, Memory Allocated, Memory Available, etc.), and Node metrics (Active Nodes, Lost Nodes, Unhealthy Nodes, etc.).
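This post does not describe how the Instana sensor gathers these values. As an illustration only, the same kind of cluster and node figures can be read by hand from the YARN ResourceManager REST API, which on EMR normally listens on port 8088 of the master node. The sketch below assumes that endpoint is reachable from where the script runs. The host name and the function names are placeholders.

```python
import json
from urllib.request import urlopen

# Friendly label -> field name in YARN's clusterMetrics response.
FIELDS = {
    "Apps Running": "appsRunning",
    "Apps Pending": "appsPending",
    "Memory Allocated (MB)": "allocatedMB",
    "Memory Available (MB)": "availableMB",
    "Active Nodes": "activeNodes",
    "Lost Nodes": "lostNodes",
    "Unhealthy Nodes": "unhealthyNodes",
}


def summarize_cluster_metrics(payload):
    """Pick the metrics named above out of a /ws/v1/cluster/metrics payload."""
    metrics = payload["clusterMetrics"]
    return {label: metrics.get(key) for label, key in FIELDS.items()}


def fetch_cluster_metrics(master_host, port=8088, timeout=5):
    """Query the ResourceManager on the EMR master node (assumed reachable)."""
    url = f"http://{master_host}:{port}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=timeout) as response:
        return summarize_cluster_metrics(json.load(response))


if __name__ == "__main__":
    # Placeholder host name; substitute the master node's DNS name.
    for label, value in fetch_cluster_metrics("emr-master.example.internal").items():
        print(f"{label}: {value}")
```

Polling this endpoint yourself means handling scheduling, network access to the master node, and cluster discovery by hand, which is the manual work an automatically deployed sensor is meant to remove.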

Health – In addition to automatically collecting performance metrics, the Instana EMR monitoring sensor will also automatically collect KPIs on the monitored environment’s jobs to determine its health. These health signatures are used to raise Issues or Incidents depending on user impact.
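To show the general idea of a preset health rule, here is a small hypothetical sketch that checks a metrics snapshot against threshold rules and labels each violation as an issue or an incident. The rule names, thresholds, severities, and metric keys are all invented for this example and are not Instana's actual health signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class HealthRule:
    name: str
    severity: str  # "issue" or "incident" (hypothetical labels)
    is_violated: Callable[[Dict[str, float]], bool]


# Hypothetical rules; real health signatures are defined by the product.
RULES = [
    HealthRule("unhealthy nodes present", "issue",
               lambda m: m.get("unhealthy_nodes", 0) > 0),
    HealthRule("nodes lost", "issue",
               lambda m: m.get("lost_nodes", 0) > 0),
    HealthRule("apps pending with no memory available", "incident",
               lambda m: m.get("apps_pending", 0) > 0
               and m.get("memory_available_mb", 1) == 0),
]


def evaluate(metrics: Dict[str, float], rules=RULES) -> List[Tuple[str, str]]:
    """Return (severity, rule name) for every rule the snapshot violates."""
    return [(r.severity, r.name) for r in rules if r.is_violated(metrics)]


if __name__ == "__main__":
    snapshot = {"unhealthy_nodes": 2, "lost_nodes": 0,
                "apps_pending": 3, "memory_available_mb": 0}
    for severity, name in evaluate(snapshot):
        print(f"{severity.upper()}: {name}")
```

In this sketch, a degraded node alone is only an issue, while work queuing up with no memory left is treated as an incident because it delays jobs that users are waiting on.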

With Instana, you’ll have a full analysis of every user impact, performed automatically, that correlates all of the data from the traces with the underlying EMR metrics. By doing so, Instana provides the root cause of any issue within a few seconds. This enables you to update your services as often as you need to without worrying if there are regressions impacting your customers.

Instana’s Amazon EMR monitoring includes automatic and continuous discovery, dependency mapping, metric monitoring, distributed tracing, anomaly detection, and analytics across the complete trace data set. This means you’ll always know what EMR is doing and how it affects user requests. To see Instana’s EMR monitoring in action, sign up for a free trial of Instana today.


Instana, an IBM company, provides an Enterprise Observability Platform with automated application monitoring capabilities to businesses operating complex, modern, cloud-native applications no matter where they reside – on-premises or in public and private clouds, including mobile devices or IBM Z.

Control hybrid modern applications with Instana’s AI-powered discovery of deep contextual dependencies inside hybrid applications. Instana also gives visibility into development pipelines to help enable closed-loop DevOps automation.

This provides the actionable feedback clients need as they optimize application performance, enable innovation, and mitigate risk, helping Dev+Ops add value and efficiency to software delivery pipelines while meeting their service and business level objectives.

For further information, please visit instana.com.