Continuous Performance Profiling

March 10, 2017


The difference between a program’s exponential, linear, logarithmic and constant execution times is critical for various use cases. Even if an algorithm is purposely designed to satisfy a certain complexity class, there are multiple reasons why it might not do so in practice. An underlying library, the OS or even the hardware can be the root cause of a performance problem.

Performance profiling has been a part of software development since the beginning. It is essential for optimizing and fixing a program’s time and space complexity as well as any bottlenecks caused by third-party dependencies. A performance issue without an execution profile is like an error without a stack trace. It will lead to a lot of manual work to get to the root cause.

Call graphs

The profiler’s output is usually structured as some sort of call graph, depending on the type of profile. For a CPU profile, it is usually a tree with the stack frames of function calls as branches and the number of samples as values. Looking at such a profile will immediately reveal a hot spot, i.e. the function call that was found (sampled) on the CPU most often. Similarly, a memory allocation profile will show how many bytes were allocated and not released by each function call.
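As a rough illustration of how stack samples turn into such a tree, here is a minimal Python sketch. The sample data and the function names (`build_call_tree`, `hot_spot`, `format_tree`) are hypothetical and are not taken from any particular profiler; a real profiler collects the stacks itself.

```python
from collections import Counter

# Hypothetical samples: each tuple is the call stack (outermost frame first)
# observed on the CPU at one sampling tick.
SAMPLES = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "render"),
    ("main", "handle_request", "parse_json"),
    ("main", "gc"),
]


def build_call_tree(samples):
    """Fold stack samples into a tree of {"samples": n, "children": {...}} nodes."""
    root = {"samples": 0, "children": {}}
    for stack in samples:
        root["samples"] += 1
        node = root
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"samples": 0, "children": {}}
            )
            node["samples"] += 1
    return root


def hot_spot(samples):
    """Return (function, count) for the frame most often found on top of the stack."""
    leaf_counts = Counter(stack[-1] for stack in samples)
    return leaf_counts.most_common(1)[0]


def format_tree(node, name="root", depth=0):
    """Render the tree as indented lines, busiest branches first."""
    lines = [f"{'  ' * depth}{name} ({node['samples']})"]
    children = sorted(node["children"].items(), key=lambda kv: -kv[1]["samples"])
    for child_name, child in children:
        lines.extend(format_tree(child, child_name, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(format_tree(build_call_tree(SAMPLES))))
    print("hot spot:", hot_spot(SAMPLES))
```

Each branch carries the number of samples that passed through it, so the busiest path through the tree leads to the hot spot.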

Other types of sampling profilers will provide similar information about blocking calls, i.e. calls waiting for an event (e.g. a mutex lock or the completion of an asynchronous call).

[Figure: an example CPU call graph.]

Profiling cloud applications

The era of horizontally scalable, data-intensive cloud applications deployed on FaaS, PaaS, IaaS or bare metal introduces an even greater need for profilers, since the performance of a single instance of an application running locally on a developer’s machine no longer correlates with that of a large-scale data center deployment. The different scale and usage of a production application, along with its data volume, traffic patterns and configuration, will expose inefficiencies and issues in the code that are not detectable in a development or testing environment.

Traditional application performance management and monitoring products tried to address cloud applications by monitoring and tracing certain business-specific workloads and by introducing on-demand and automatic remote profiling capabilities.

Continuous vs. on-demand performance profiling

The problem is that on-demand or automatic profiling only allows post factum analysis. It might be helpful in the case of a performance regression or a problem-driven optimization, but it doesn’t provide a basis for continuous performance improvement. If the application keeps evolving and its performance is not addressed continuously, the result will be gradual performance regression.

Continuous profiling is not triggered by any event or human. The idea is that it is “always” active on a small subset of application processes. Since sampling profilers already add little overhead to a single process, profiling only a subset at any given time keeps the total overhead across the deployment even lower.
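The overhead argument can be sketched with simple arithmetic. The following Python example is only an illustration under stated assumptions: the helper names (`select_for_profiling`, `fleet_overhead`) and the numbers (a 3% per-process overhead, 5% of processes profiled) are hypothetical and do not describe any specific product.

```python
import random


def select_for_profiling(process_ids, fraction, seed=None):
    """Pick a random subset of processes on which the profiler is active."""
    rng = random.Random(seed)
    ids = list(process_ids)
    count = max(1, round(len(ids) * fraction))
    return set(rng.sample(ids, count))


def fleet_overhead(per_process_overhead, fraction):
    """Average overhead across all processes when only a fraction is profiled."""
    return per_process_overhead * fraction


if __name__ == "__main__":
    # Hypothetical deployment: 100 processes, 5% profiled at any time,
    # and an assumed 3% slowdown on a process while it is being profiled.
    active = select_for_profiling(range(100), fraction=0.05, seed=1)
    print(f"profiling {len(active)} of 100 processes")
    print(f"average overhead: {fleet_overhead(0.03, 0.05):.4%}")
```

Rotating the selected subset over time would still let every process contribute profiles, while the average cost per process stays at a fraction of the single-process overhead.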

The most obvious benefits of such an approach are:

  • Constant access to various current and historical performance profiles for troubleshooting and optimization.
  • Ability to historically compare profiles and locate regression causes with line-of-code precision.
  • Locating infrastructure-wide hot code or libraries, fixing which would benefit all applications.
  • Availability of pre-crash profile history for post mortem analysis.
  • No risk of crashing the application by invoking an on-demand profiler against a suffering or failing application, which is ironically the main use case for on-demand profiling.
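The historical comparison mentioned in the list above can be sketched as a diff of two profiles. This is a minimal Python illustration, assuming each profile is reduced to a mapping from function name to sample count; the function `diff_profiles` and the sample data are hypothetical, and a real system would compare at line-of-code granularity rather than per function.

```python
def diff_profiles(baseline, current):
    """Compare per-function sample shares between two profiles.

    Each profile maps a function name to its sample count. Returns a list of
    (function, share_change) pairs, largest increase first.
    """
    base_total = sum(baseline.values()) or 1
    cur_total = sum(current.values()) or 1
    changes = []
    for name in set(baseline) | set(current):
        base_share = baseline.get(name, 0) / base_total
        cur_share = current.get(name, 0) / cur_total
        changes.append((name, cur_share - base_share))
    # Sort by descending change; break ties by name for stable output.
    changes.sort(key=lambda item: (-item[1], item[0]))
    return changes


if __name__ == "__main__":
    # Hypothetical profiles taken before and after a release.
    last_week = {"parse_json": 30, "render": 50, "gc": 20}
    today = {"parse_json": 60, "render": 30, "gc": 10}
    for name, change in diff_profiles(last_week, today):
        print(f"{name:12s} {change:+.0%}")
```

Comparing shares rather than raw counts keeps the diff meaningful even when the two profiles contain a different total number of samples.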

A perfect example of a large-scale continuous profiling system is Google’s Google-Wide Profiling (GWP), which profiles almost every server and application at Google. Please refer to the GWP paper for the full details.

In turn, Instana’s APM-integrated profiling enables continuous performance profiling for anyone: developers, small businesses or large enterprises. As of this writing, it supports Go, Node.js and Java applications, with the ability to profile CPU usage, memory allocations, blocking calls and async calls. It also provides contextual information such as errors and multiple runtime metrics.

[Figure: historically comparable CPU profiles from an application over a selected period of time.]
