Why You Should Monitor Percentiles

June 26, 2018


What are percentiles and Global Percentiles?

A percentile is the value at or below which a given percentage of observations in a group fall. For example, suppose we capture the response times of 10 requests to an API, in milliseconds: 10ms, 20ms, 30ms, 40ms, 50ms, 60ms, 70ms, 80ms, 90ms, 100ms. Percentile-90 in this example is 90ms, because 90% of the requests have response times less than or equal to 90ms.
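The calculation above can be sketched in a few lines of Python. This is a minimal illustration using the nearest-rank method, which is one common way to define a percentile and matches the example; the function name `percentile` is our own, and this is not a description of how BeeInstant computes percentiles internally.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that
    at least p percent of all observations are <= it."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# The ten response times from the example, in milliseconds.
response_times_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p90 = percentile(response_times_ms, 90)
print(f"Percentile-90: {p90}ms")
```

Other definitions (for example, ones that interpolate between neighboring observations) can give slightly different values on small samples.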

In the context of monitoring a distributed system, a Global Percentile is a percentile calculated by aggregating all observations from multiple sources, such as servers and containers. For example, suppose we have deployed 100 servers behind a load balancer to serve 10K requests to an API in a given second. To answer the question "What response time are 90% of our customers experiencing?", we need to calculate the Global Percentile-90 over all 10K observations across all 100 servers.

Unlike the original StatsD’s locally pre-calculated percentiles, BeeInstant’s Global Percentiles are calculated globally and in real time across thousands of servers and containers, across multiple metric dimensions, and across different time intervals. This capability empowers engineers to observe true application performance and customer experience, quickly spot issues, and optimize services.
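To see why the distinction matters, here is a small sketch comparing a percentile computed over pooled observations with an average of per-server percentiles. The two servers and their response times are hypothetical numbers chosen for illustration, and the nearest-rank `percentile` helper is our own assumption rather than BeeInstant's or StatsD's implementation.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile over a list of observations."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical data: a fast server handling most of the traffic
# and a slow server handling only two requests.
server_a_ms = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
server_b_ms = [200, 400]

# Locally pre-calculated percentiles, then averaged across servers.
local_p90s = [percentile(server_a_ms, 90), percentile(server_b_ms, 90)]
averaged_local_p90 = sum(local_p90s) / len(local_p90s)

# Global percentile: pool every observation first, then compute once.
global_p90 = percentile(server_a_ms + server_b_ms, 90)

print(f"Average of local P90s: {averaged_local_p90}ms")
print(f"Global P90:            {global_p90}ms")
```

With these numbers the two approaches disagree: averaging the local values weights both servers equally even though one of them served far fewer requests, whereas the pooled calculation reflects what the requests as a whole actually experienced.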

Why is Average misleading and Global Percentiles so important?

Take the example of a music streaming service where 10 customers request their songs. We capture the metric song_loading_time for these 10 requests, in milliseconds: 10ms, 20ms, 25ms, 26ms, 28ms, 30ms, 31ms, 31ms, 31ms, 500ms. To get an idea of the loading time customers are experiencing, we take the average of these 10 observations and get 73.2ms. Checking the 10 requests again, we see that no request has a loading time anywhere near the average of 73.2ms: nine are at or below 31ms, and one is at 500ms.

The average does not tell us the true customer experience, but the percentiles do. Percentile-90 tells us that 90% of our customers wait 31ms or less to have their songs loaded. Percentiles allow us to understand the distribution. BeeInstant provides 9 Global Percentiles out of the box for every metric. By combining these 9 Global Percentiles, engineers can build reliable alarms and visualize true application performance and customer experience.
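The song_loading_time example can be reproduced with a short sketch. It assumes the nearest-rank definition of a percentile, and the helper name `percentile` is ours for illustration.

```python
import math
from statistics import mean

def percentile(values, p):
    """Nearest-rank percentile over a list of observations."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# The ten song_loading_time observations from the example, in milliseconds.
song_loading_time_ms = [10, 20, 25, 26, 28, 30, 31, 31, 31, 500]

average_ms = mean(song_loading_time_ms)
p50_ms = percentile(song_loading_time_ms, 50)
p90_ms = percentile(song_loading_time_ms, 90)
p100_ms = percentile(song_loading_time_ms, 100)

print(f"average: {average_ms}ms")
print(f"P50: {p50_ms}ms, P90: {p90_ms}ms, P100: {p100_ms}ms")
```

A single 500ms outlier pulls the average far above what most customers see, while the lower percentiles stay close to the typical loading time and the highest percentile exposes the outlier directly.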

What are Cardinality and High Cardinality?

In mathematics, the cardinality of a set is a measure of the “number of elements of the set”. To explain cardinality in the context of metric monitoring, let’s reuse the example of the music streaming service above. When we monitor the metric song_loading_time, we also want to break it down by the customers’ cities, their devices, and the song categories they request. To do this, we construct multi-dimensional metrics with the dimensions City, Device, and SongCategory. The cardinality of the set of metrics constructed from these three dimensions is Number of Cities * Number of Devices * Number of Song Categories.

In this example, we track song_loading_time across 4,000 cities, 10 different devices, and 200 song categories. The cardinality, or the number of unique metrics, is 4000 * 10 * 200 = 8 million. Eight million can be considered high cardinality. BeeInstant is designed to handle this level of cardinality.
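The arithmetic above can be sketched as follows. The dimension names and sizes come from the example; the small sample of city, device, and category values is hypothetical and only there to show how each combination becomes its own metric.

```python
from itertools import product
from math import prod

# Number of distinct values per dimension, as in the example.
dimension_sizes = {"City": 4000, "Device": 10, "SongCategory": 200}
cardinality = prod(dimension_sizes.values())
print(f"Unique song_loading_time metrics: {cardinality:,}")

# Hypothetical miniature version: every combination of dimension
# values identifies one unique metric.
sample_dimensions = {
    "City": ["Seattle", "Berlin"],
    "Device": ["ios", "android", "web"],
    "SongCategory": ["jazz", "rock"],
}
sample_metrics = [
    dict(zip(sample_dimensions, combo))
    for combo in product(*sample_dimensions.values())
]
print(f"Sample metrics: {len(sample_metrics)}")  # 2 * 3 * 2 combinations
print(sample_metrics[0])
```

Because the count is a product, adding one more dimension multiplies the number of unique metrics by the number of values that dimension can take.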

Why does High Cardinality matter?

If tracking the experience of each individual customer is vital to a service or business, high cardinality capability empowers engineers to do so: Customer-id can be added as a dimension of the metrics. The number of metrics should be driven by monitoring needs, not by the limitations of the monitoring system. Engineers know best which metrics are critical for their services. BeeInstant empowers engineers to meet their monitoring needs at scale with no compromise.

Instana has a rock-solid automated application monitoring and observability platform built for every engineer and stakeholder. Join us on this amazing adventure and discover what you have been missing. Sign up today for a free two-week trial. We look forward to having you join us.

Happy Monitoring!
