A Day In the Life of a Developer at Instana
I was inspired by Marcel and others writing about the life of an SRE at Instana and how our SRE team uses Instana in their daily work to keep the Instana SaaS platform running smoothly. But Instana is not just a tool for Site Reliability Engineering; it is also a tool for developers. With this blog post, I want to give you a glimpse of how Instana developers use Instana to continuously improve Instana. The focus will be on how we use one of our newest features, AutoProfile™, to reduce our cloud usage costs. AutoProfile is Instana’s always-on profiler, designed to continuously monitor heavy workloads in production. And with up to a million tracing spans processed per second, Instana is itself a system processing heavy workloads.
In the dark days before cloud computing, optimizing code was really about making the most of the limited resources you had available. But in the cloud age, where you can have as many resources as you want, optimizing code takes on a whole new meaning: reducing costs.
To power our Application Perspectives functionality, our backend continuously processes the spans sent by Instana agents. Most of the processing work is done in a component called appdata-processor. Many appdata-processor instances run in our SaaS environments, and they need a lot of computational resources, which comes at considerable cost. So we are always looking for potential optimizations.
With the Instana AutoProfile capability, this search for potential optimizations just got a lot easier. We simply open the Instana UI, navigate to “Analyze Profiles” and search for “appdata-processor”. My favorite visualization of the profiling results is the flame graph.
The screenshot above shows a typical flame graph for one of our appdata-processor instances. It is easy to associate the different stacks with the different tasks this component is responsible for: ingestion & deserialization (1), service & application rules (2), extracting span information (3), grouping spans into a trace (4), and serialization & downstream processing (5).
In this example, the applySpanPlugin stack caught my eye. We have different plugins for different types of spans, for example jdbc, http and grpc. I would expect the flame graph for the applySpanPlugin method to consist of many smaller stacks, one for every span plugin. But the flame graph shows that the time is spent almost entirely in a single plugin. Let’s dive deeper…
This appdata-processor instance spends 8% of its time parsing redis connection URLs. That is a lot, even if the majority of the processed spans were redis spans (which isn’t the case; redis spans make up only around 4% of all spans processed by Instana). Going further down the flame graph, we see that the time is actually spent in the classloader, where the JVM tries to load a protocol handler for the redis scheme (and probably fails every time).
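To see why the classloader shows up in a URL-parsing stack at all, consider what java.net.URL does with an unfamiliar scheme: its constructor looks up a URLStreamHandler for the protocol, and since the JDK ships no handler for "redis", that lookup fails with a MalformedURLException. A minimal sketch (the host name is made up for illustration):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RedisUrlDemo {
    public static void main(String[] args) {
        try {
            // java.net.URL resolves a protocol handler for the scheme at
            // construction time. No handler exists for "redis", so this
            // lookup goes through the classloader and fails on every call.
            new URL("redis://cache.example.com:6379");
        } catch (MalformedURLException e) {
            // The message names the missing handler, e.g. "unknown protocol: redis"
            System.out.println(e.getMessage());
        }
    }
}
```

Doing this once is harmless; doing it for every redis span at Instana's ingest rate is how a simple parse ends up as a visible band in the flame graph.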
We replaced the generic java.net.URL with an alternative that is optimized for redis connection URLs. After the rollout, we looked at Instana again. Time spent in the applySpanPlugin method went down from 10.5% to 2.7%.
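The blog doesn't show the replacement itself, but one straightforward way to avoid the handler lookup is java.net.URI, which parses any scheme purely syntactically and never touches the classloader. The class and method names below are hypothetical, purely a sketch of the idea:

```java
import java.net.URI;

// Hypothetical sketch: extract host and port from a redis connection URL
// without java.net.URL's protocol-handler lookup. java.net.URI is a pure
// parser, so the unknown "redis" scheme costs nothing extra.
public class RedisConnectionUrl {
    private static final int DEFAULT_REDIS_PORT = 6379;

    public static String hostOf(String redisUrl) {
        return URI.create(redisUrl).getHost();
    }

    public static int portOf(String redisUrl) {
        int port = URI.create(redisUrl).getPort();
        // URI reports -1 when no port is given; fall back to redis's default.
        return port == -1 ? DEFAULT_REDIS_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(hostOf("redis://cache.example.com:6379"));
        System.out.println(portOf("redis://cache.example.com"));
    }
}
```

A hand-rolled string scan would be faster still, but even this one-line swap removes the repeated failed classloading from the hot path.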
With default settings, the RedisSpan plugin doesn’t even appear anymore; we had to lower the filter threshold from 1% to 0% to find it. Time spent in the redis plugin went down from 8.1% to 0.1%. Now the flame graph looks as we would expect: time is spent in many different plugins for all the different span types being processed.
After Instana AutoProfile highlighted the bottleneck, the optimization took barely two hours to develop and roll out. And it allowed us to reduce CPU usage of our appdata-processor instances by 8%.
Instana AutoProfile helped us easily identify potential cost optimization targets. There was no need for time-consuming load tests in isolated test environments; we could directly inspect the behavior of our components under production load. Validating the impact of our optimizations after the rollout took just a few minutes.