Why You Should Monitor Percentiles

June 26, 2018

What are Percentiles and Global Percentiles?

A percentile is the value at or below which a given percentage of the observations in a group fall. For example, suppose we capture the response times of 10 requests to an API, in milliseconds: 10ms, 20ms, 30ms, 40ms, 50ms, 60ms, 70ms, 80ms, 90ms, 100ms. Percentile-90 in this example is 90ms, because 90% of the requests have a response time less than or equal to 90ms.
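The calculation above can be sketched in a few lines of Python. This is only an illustration using the nearest-rank definition of a percentile (other definitions interpolate between values and can give slightly different results); the function name `percentile` is our own and is not part of any particular monitoring library.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that
    at least p% of all observations are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the observation that covers p% of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# The 10 response times from the example above, in milliseconds.
response_times_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p90 = percentile(response_times_ms, 90)
print(f"Percentile-90: {p90}ms")
```

With the ten response times from the example, the rank is 9, so the function returns the ninth smallest value, 90ms.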

In the context of monitoring a distributed system, a Global Percentile is a percentile calculated by aggregating all observations from multiple sources, such as servers and containers. For example, suppose we have deployed 100 servers behind a load balancer to serve 10K requests to an API in a given second. To answer the question "what response time are 90% of our customers experiencing?", we need to calculate the Global Percentile-90 from all 10K observations across all 100 servers.

Unlike the percentiles that the original StatsD pre-calculates locally on each source, BeeInstant’s Global Percentiles are calculated globally and in real time across thousands of servers and containers, across multiple metric dimensions, and across different time intervals. This capability lets engineers observe true application performance and customer experience, quickly spot issues, and optimize services.
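To see why the difference between local and global percentiles matters, consider the sketch below. It is not BeeInstant's or StatsD's implementation; the server names and response times are made up for illustration, and `percentile` is the same hypothetical nearest-rank helper as before. It compares averaging each server's own percentile-90 with computing one percentile-90 over the merged observations.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (illustrative helper, not a library API)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times (ms) recorded by two servers in one interval.
# server-a handled nine fast requests, server-b handled a single slow one.
observations_by_server = {
    "server-a": [10, 12, 14, 16, 18, 20, 22, 24, 26],
    "server-b": [500],
}

# Local approach: each server reports its own percentile-90,
# and the reported values are averaged afterwards.
local_p90s = {
    name: percentile(obs, 90) for name, obs in observations_by_server.items()
}
averaged_local_p90 = sum(local_p90s.values()) / len(local_p90s)

# Global approach: merge every observation first, then take percentile-90.
all_observations = [v for obs in observations_by_server.values() for v in obs]
global_p90 = percentile(all_observations, 90)

print(f"Average of local percentile-90 values: {averaged_local_p90}ms")
print(f"Global percentile-90: {global_p90}ms")
```

In this made-up data, the averaged local value is 263ms, a response time that no request came close to, while the global percentile-90 is 26ms. Averaging ignores how many requests each server handled, so only the merged calculation describes what 90% of requests actually experienced.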

Why is the Average misleading and why are Global Percentiles so important?

Take the example of a music streaming service where 10 customers request their songs. We capture the metric song_loading_time for these 10 requests, in milliseconds: 10ms, 20ms, 25ms, 26ms, 28ms, 30ms, 31ms, 31ms, 31ms, 500ms. To get an idea of the loading time that customers are experiencing, we take the average of these 10 observations and get 73.2ms. Checking the 10 requests again, we see that no request has a loading time anywhere near the 73.2ms average: nine of them finished in 31ms or less, and the single 500ms outlier pulls the average up.

The average does not tell us the true customer experience, but the percentiles do. Percentile-90 tells us that 90% of our customers wait 31ms or less to have their songs loaded. Percentiles allow us to understand the distribution. BeeInstant provides 9 Global Percentiles out of the box for every metric. By combining these 9 Global Percentiles, engineers can build reliable alarms and visualize true application performance and customer experience.
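The numbers in this example can be reproduced with a short Python sketch. As before, `percentile` is a hypothetical nearest-rank helper written for this post, and the choice of percentiles printed here is ours; it is not the list of 9 percentiles that BeeInstant provides.

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile (illustrative helper, not a library API)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# song_loading_time observations from the example, in milliseconds.
song_loading_time_ms = [10, 20, 25, 26, 28, 30, 31, 31, 31, 500]

average = statistics.mean(song_loading_time_ms)
p50 = percentile(song_loading_time_ms, 50)
p90 = percentile(song_loading_time_ms, 90)
p100 = percentile(song_loading_time_ms, 100)

print(f"Average:        {average}ms")
print(f"Percentile-50:  {p50}ms")
print(f"Percentile-90:  {p90}ms")
print(f"Percentile-100: {p100}ms")
```

Read together, the percentiles describe the shape of the distribution: half of the requests loaded in 28ms or less, 90% in 31ms or less, and the slowest took 500ms. The 73.2ms average, on its own, hides all three facts.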

What are Cardinality and High Cardinality?

In mathematics, the cardinality of a set is the number of elements in the set. To explain cardinality in the context of metric monitoring, let’s reuse the example of the music streaming service above. When we monitor the metric song_loading_time, we also want to break it down by the customers’ locations (cities), their devices, and the categories of the songs they request. To do this, we construct multi-dimensional metrics with the dimensions City, Device, and SongCategory. The cardinality of the set of metrics constructed from these three dimensions is Number of Cities * Number of Devices * Number of SongCategories.

In this example, we track song_loading_time across 4,000 cities, 10 different devices, and 200 song categories. The cardinality, or the number of unique metrics, is 4,000 * 10 * 200 = 8,000,000. Eight million can be considered high cardinality. BeeInstant is designed to handle this level of cardinality.
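The arithmetic above can be sketched as follows. The dimension sizes come from the example; the concrete city, device, and category names are invented purely to show how each combination of dimension values becomes its own unique metric.

```python
import math
from itertools import product

# Number of distinct values per dimension, taken from the example.
dimension_sizes = {"City": 4000, "Device": 10, "SongCategory": 200}

# Cardinality is the product of the sizes of all dimensions.
cardinality = math.prod(dimension_sizes.values())
print(f"Unique metrics for song_loading_time: {cardinality:,}")

# A tiny, made-up sample of dimension values to show how each
# combination becomes one unique metric.
cities = ["Hanoi", "Seattle"]
devices = ["phone", "tablet"]
song_categories = ["rock", "jazz", "pop"]

metric_keys = [
    ("song_loading_time", city, device, category)
    for city, device, category in product(cities, devices, song_categories)
]
print(f"Unique metrics in the small sample: {len(metric_keys)}")
print(f"First metric key: {metric_keys[0]}")
```

Because the sizes multiply, adding a single value to any one dimension adds a whole slice of new metrics: one more city in the full example adds 10 * 200 = 2,000 unique metrics.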

Why does High Cardinality matter?

If tracking the experience of each individual customer is vital to a service or business, high-cardinality capability lets engineers do so: a customer ID can be added as a metric dimension. The number of metrics should be driven by monitoring needs, not by the limitations of the monitoring system. Engineers know best which metrics are critical for their services. BeeInstant empowers engineers to meet their monitoring needs at scale, with no compromise.

Instana has a rock-solid automated application monitoring and observability platform built for every engineer and stakeholder. Join us on this amazing adventure and discover what you have been missing. Sign up today for a free two-week trial. We look forward to having you join us.

Happy Monitoring!
