Traditionally, load testing tools have done a great job of identifying problems that need to be fixed, but unfortunately they have not provided the details required to immediately know the root cause of those problems so you can fix them. By using Instana’s Application Perspectives in conjunction with your load testing tool of choice, your developers now have all the information required to immediately fix any problems identified during load testing.
In this two-part blog series, we illustrate the value created by using the load testing tool of our partner OctoPerf together with Instana’s APM solution. Both tools focus on ease of use and high-value output, so this joint project showcases how fast and easy it is to get actionable insights from your application load tests.
Using a jointly built integration, you can separate load test data from production data and perform analysis based solely on the traffic generated by the OctoPerf load test.
After you’ve set up your Application Perspective as described in the companion post, and configured and run your load test, you are ready to dive deeper into the root cause analysis of the behavior and problems observed on the backend.
Analyzing A Load Test Using Instana
The application perspective provides you with an overview dashboard that gives you a useful first impression of the load test scenario.
We performed three load tests, labeled A, B, and C. Inspect the number of calls in the three load test scenarios and compare it with the average error rate. Notice the large spike in errors when the load test hit the limit of our application setup (Figure 2).
Let’s take a step back and look at the OctoPerf data along with the incidents that were generated on the Instana side (Figure 3). In the early stage of the load test, Instana generated an incident indicating that the system load was high – an early indicator of the issues we would encounter at a later stage. The ramp-up phase is directly visible in the Application Perspective as well, including the point where we hit the limit of the system. The generated incident shows us directly that the system memory is exhausted, which may lead to the OS killing processes.
The next step is to look into the incidents within Instana. In Figure 4 we clearly see the moment where memory usage hits nearly 100%. That is an interesting insight, and it explains why the error rate and the latency went up.
Next, let’s take a closer look at the host metrics in the Instana infrastructure view (Figure 5).
On the left side we see that there are 11 containers running on that single host, and that the machine has only 2 GB of memory. In the “Memory Used” chart we can see that 80% of the RAM was already in use at the beginning of the test. During the load test we put even more pressure on the system until it hit the limits of the provisioned host.
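If you want to sanity-check a host’s memory headroom outside of Instana, the same “memory used” percentage can be derived from the Linux /proc/meminfo counters. Here is a minimal sketch; the sample values are made up to mirror an 80%-utilized 2 GB host, not taken from our test machine:

```python
# Illustrative only: compute a "memory used" percentage similar to the
# chart in Figure 5. Field names follow Linux /proc/meminfo; the sample
# values below are hypothetical.

def memory_used_percent(meminfo_text: str) -> float:
    """Parse /proc/meminfo-style text and return used memory in percent."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest:
            fields[key.strip()] = int(rest.split()[0])  # values are in kB
    total = fields["MemTotal"]
    # "MemAvailable" accounts for reclaimable caches, which is how most
    # monitoring tools report usable memory.
    available = fields.get("MemAvailable", fields["MemFree"])
    return 100.0 * (total - available) / total

sample = """MemTotal:        2048000 kB
MemFree:          204800 kB
MemAvailable:     409600 kB"""

print(f"{memory_used_percent(sample):.0f}% used")  # 80% used on a 2 GB host
```

On a live host you would read the same fields from the file `/proc/meminfo` instead of a sample string.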
When the memory is completely consumed, the system starts trying to cope with the situation. CPU consumption rises (Figure 6) and context switching (Figure 7) becomes more and more of an issue, until it gets completely out of control. At this point the root cause of the application’s behavior is clear: there are not sufficient hardware resources to handle the load. This makes sense, keeping in mind that our entire technology stack is running on a small machine.
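Context switching is also easy to observe directly on a Linux host: the cumulative `ctxt` counter in /proc/stat can be sampled twice to compute a switch rate, which is essentially what the chart in Figure 7 plots over time. A minimal sketch with made-up snapshot values:

```python
# Illustrative only: derive a context-switch rate (like Figure 7) from two
# snapshots of the Linux /proc/stat "ctxt" counter. Snapshot values are
# hypothetical.

def ctxt_total(proc_stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")

def switches_per_second(before: str, after: str, interval_s: float) -> float:
    return (ctxt_total(after) - ctxt_total(before)) / interval_s

snap_t0 = "cpu  100 0 50 900\nctxt 1500000\nbtime 1700000000"
snap_t1 = "cpu  120 0 60 880\nctxt 1850000\nbtime 1700000000"

print(switches_per_second(snap_t0, snap_t1, 5.0))  # 70000.0 switches/s
```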
Let’s utilize the application perspective to analyze the traces data so that we can get deeper insights into the code execution. In Figure 8 we click and drag the mouse over the desired time range in the chart, then click the magnifying glass icon to set our time range.
Figure 9 shows the resulting dashboard with the summary information for the selected time range. We see that the load test generated 22k inbound calls with an overall mean latency of 193 ms. In this load test analysis we are looking for slow traces or errors, and we quickly identify them using the summary view of the Application Perspective.
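Summary figures like these are plain aggregates over the traced calls. As an illustration (the record layout below is hypothetical, not Instana’s data model), mean latency and error rate boil down to:

```python
# Illustrative only: aggregate per-call records into the kind of summary
# shown in Figure 9. The record fields (latency_ms, error) are hypothetical.

calls = [
    {"latency_ms": 120, "error": False},
    {"latency_ms": 310, "error": True},
    {"latency_ms": 149, "error": False},
]

total = len(calls)
mean_latency = sum(c["latency_ms"] for c in calls) / total
error_rate = 100.0 * sum(c["error"] for c in calls) / total

print(f"{total} calls, mean latency {mean_latency:.0f} ms, "
      f"error rate {error_rate:.1f}%")
```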
Let’s explore “GET /api” by selecting it from the “Top Traces” list to see details about those requests. With a single click on the link, we slice through the tracing data to get only the relevant traces. From the overview (Figure 10) we can see which service is involved and the latency of each erroneous call.
In the trace detail view we can now clearly see what happened when we compare a healthy trace with an erroneous one (Figures 11 and 12). Due to the lack of memory, the system killed the database process used by the rating service, causing a timeout in the JDBC driver when it tried to contact the database. By using Instana’s infrastructure and application monitoring capabilities, we get a very clear picture of what happened on our system during the load test.
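To make that failure mode concrete: when the database process is gone or hung, the driver’s socket read never gets a reply and eventually times out. The sketch below simulates this with a throwaway local server that accepts a connection but never answers – analogous to what the JDBC driver experienced, though the real driver and its timeout settings differ:

```python
# Illustrative only: simulate the timeout seen in Figure 12. A local
# server accepts the connection but never replies, standing in for a
# database that the OOM killer has taken out mid-request.

import socket
import threading

server = socket.socket()
server.bind(("127.0.0.1", 0))       # pick any free port
server.listen(1)
port = server.getsockname()[1]

def silent_accept():
    conn, _ = server.accept()       # accept, then stay silent like a hung DB
    threading.Event().wait(2.0)
    conn.close()

threading.Thread(target=silent_accept, daemon=True).start()

client = socket.create_connection(("127.0.0.1", port), timeout=1.0)
client.settimeout(0.2)              # analogous to a JDBC socket/query timeout
try:
    client.recv(1024)               # no reply ever comes
    outcome = "reply received"
except socket.timeout:
    outcome = "timed out waiting for the database"

print(outcome)  # timed out waiting for the database
client.close()
```

Without an explicit timeout, the same call would block indefinitely, which is why production JDBC configurations usually set one.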
Using OctoPerf and Instana together enables detailed analysis of load test results. Immediately identify where performance bottlenecks are, the scalability limits of a system, and what needs to be corrected to improve the experience for your customers. Sign up for your free trials of Instana and OctoPerf today.