Apache Spark Monitoring and Performance Management

Apache Spark is one of the largest open source data processing projects, providing a fast processing engine for big data and deep analytics. Instana’s Apache Spark Monitoring covers Spark deployed through AWS EMR as well as Spark running under the Standalone Cluster Manager. Spark performance monitoring revolves around the Spark Driver instance, and Instana’s Spark Monitoring Sensor supports both Driver deployment methods.
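
To illustrate the kind of data a Spark Driver exposes, the sketch below uses Spark's own monitoring REST API (served under /api/v1 on the driver UI, port 4040 by default). It is not Instana's implementation. The host name, application ID, and sample payload are hypothetical, and only the jobId and status fields of a /jobs response are assumed.

```python
from collections import Counter


def jobs_url(driver_host, app_id, port=4040):
    """Build the URL of the jobs endpoint on a running driver UI."""
    return f"http://{driver_host}:{port}/api/v1/applications/{app_id}/jobs"


def summarize_jobs(jobs):
    """Count jobs per status from an already decoded /jobs response."""
    return dict(Counter(job.get("status", "UNKNOWN") for job in jobs))


# Hand-written sample shaped like a /jobs response (not real output).
sample_jobs = [
    {"jobId": 2, "status": "RUNNING"},
    {"jobId": 1, "status": "SUCCEEDED"},
    {"jobId": 0, "status": "SUCCEEDED"},
]

# Placeholder host and application ID, for illustration only.
print(jobs_url("driver.example.internal", "app-20240101000000-0000"))
print(summarize_jobs(sample_jobs))
```

In a live setup the list passed to summarize_jobs would come from an HTTP GET against the URL built by jobs_url, decoded from JSON.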

Spark Performance and Health Monitoring

Depending on the type of application that has been deployed (EMR, Standalone), different data is collected and used for monitoring.

Spark Performance and Configuration Monitoring

For Spark instances running on AWS EMR, install the Instana agent on the Amazon EC2 instances within the EMR cluster. For automated deployment of the Spark monitoring sensor, the Instana agent must be installed on every node in the EMR cluster.

Instana’s Spark Monitoring includes an automatically built summary dashboard that centers around application KPIs – including response time and load. The dashboard also includes key infrastructure configuration and performance metrics, as well as specific Spark processing data metrics. The dashboard allows DevOps and IT Ops to see all relevant Spark data on one screen, making it easy to understand the state of their Spark instances.

Monitoring the health and performance of Apache Spark instances requires both an understanding of Spark itself and the ability to see the dependencies between clustered Spark instances and their interactions with other microservices, both upstream and downstream. Instana’s Spark monitoring sensor automatically identifies and collects the relevant metrics.

Spark Monitoring Data

Batch Applications: Jobs, Stages, Longest Completed Steps, Executors

Streaming Applications: Batching, Scheduling Delay, Total Delay, Processing Time, Output Operations, Input Records, Receivers, Executors

Configuration: Host, Port, Rest URI, Version, Status

Metrics: Alive Workers, Dead Workers, Decommissioned Workers, Workers in Unknown State, Used Memory, Total Memory, Used Cores, Total Cores, Data and Metrics per Worker, Most Recent Apps, Most Recent Drivers
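
As a rough illustration of how the worker and resource metrics listed above could be derived, the sketch below aggregates a cluster status document into worker counts and used/total cores and memory. This is a minimal sketch, not Instana's sensor logic. The field names (workers, state, cores, coresused, memory, memoryused) are assumptions modeled on the JSON status served by a Spark Standalone master, and the sample values are invented.

```python
from collections import Counter


def summarize_master(status):
    """Aggregate worker states and resource usage from a status dict."""
    workers = status.get("workers", [])
    states = Counter(w.get("state", "UNKNOWN") for w in workers)
    # Design choice: only ALIVE workers contribute to resource totals.
    alive = [w for w in workers if w.get("state") == "ALIVE"]
    return {
        "alive_workers": states.get("ALIVE", 0),
        "dead_workers": states.get("DEAD", 0),
        "decommissioned_workers": states.get("DECOMMISSIONED", 0),
        "unknown_workers": states.get("UNKNOWN", 0),
        "used_cores": sum(w.get("coresused", 0) for w in alive),
        "total_cores": sum(w.get("cores", 0) for w in alive),
        "used_memory_mb": sum(w.get("memoryused", 0) for w in alive),
        "total_memory_mb": sum(w.get("memory", 0) for w in alive),
    }


# Invented sample data; the field names are assumptions (see above).
sample_status = {
    "workers": [
        {"state": "ALIVE", "cores": 8, "coresused": 4,
         "memory": 16384, "memoryused": 8192},
        {"state": "ALIVE", "cores": 8, "coresused": 2,
         "memory": 16384, "memoryused": 4096},
        {"state": "DEAD", "cores": 8, "coresused": 0,
         "memory": 16384, "memoryused": 0},
    ]
}

summary = summarize_master(sample_status)
print(summary)
```

Counting only alive workers toward the totals keeps capacity figures from being inflated by nodes that can no longer run executors.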

Spark Monitoring Sensor Installation: Getting Started

Ready to start monitoring Spark? Begin by signing up for an Instana Trial or Account. Once you have an account, see the Spark Management Documentation for details on how to configure the different Spark driver and deployment types.
