Instana’s award-winning Application Monitoring solution has always included automatic distributed application tracing. The recently released Application Perspectives features deliver the performance summary of your cloud native applications by leveraging real-time distributed tracing. Behind the scenes, Instana generates each Application dashboard with the timing metrics, response codes, and metadata gathered from every individual transaction that flows through your environment.
This post examines why observing systems using distributed tracing is the only approach for gathering true performance analytics and how these analytics are represented on your personalized application dashboards. In addition, we’ll show you how easy it is to begin leveraging our trace analyzer and show off some of the new tooling along the way.
Observations in the Wild
Cloud Native applications have introduced new challenges in determining performance impacts and outages. This is because every interaction between your services and their dependencies can be impacted by influences such as network latency, external provider availability, and as I’m sure we all know, the moon’s gravitational pull (just kidding).
In our other blog series on Application Perspectives, we’ve shown how the dashboards are a great tool when observing the performance of your applications and services at a glance. Each dashboard gives you a summary of the slow, noisy, and erroneous services and endpoints but that doesn’t help us understand why these issues are happening. We’ve captured all the data needed to troubleshoot and made it easy for you to analyze that data as well.
An ode to Yak Shaving: Troubleshoot the 99th percentile
Let’s take a look at some dashboards for services which are performing quite well, though we’ve observed some statistical outliers that have warranted investigation! Now, these anomalies aren’t enough for Instana to recognize them as an issue or incident, which is why we’ve alluded to a term some of you may not be familiar with: yak shaving.
In the dashboard below is what appears to be a user provider service. You can quickly see that in the “Top Endpoints” component our `GET /api/user/current` endpoint is happily serving requests at an average response time of 37ms. You’ll also notice that the latency graph is reporting that 99% of the responses from this service are close to the mean (or average), but we have a very small percentage of requests that are taking upwards of 500ms to respond.
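To see why a healthy-looking average can hide slow requests, here is a minimal sketch that compares the mean with the 50th and 99th percentiles. The latency samples are made up for illustration, and the nearest-rank `percentile` helper is our own; it is not how Instana computes the values on its dashboards.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow outliers.
latencies_ms = [30] * 98 + [520, 540]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms}ms p50={p50_ms}ms p99={p99_ms}ms")
```

With these sample values the mean stays at 40ms while the 99th percentile lands above 500ms, which is the kind of gap that makes the outliers worth a closer look.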
Analyzing the potentially lag-inducing endpoint is as simple as clicking on the “Analyze Calls” button on the upper right-hand side, sorting by latency, and digging into the calls which matter – in this case, the slow ones (fig. a).
After some quick analysis, we can see that this service is talking to an external authentication provider, and then making a call to Redis to cache the token. In this call, we are observing uncached authentication queries making an external vendor request (fig. b).
When Network Timing is Everything
When observing distributed applications, networking and external service dependencies can become a major contributor to performance degradation. Let’s take a look at another service where network performance is having an impact. When we analyze this specific span in this trace, you’ll see that almost 96% of the time waiting for this service to respond was network latency (fig. c).
This can be determined because Instana is aware of the execution time for that request since we are monitoring and instrumenting every service in the application with tracing. Instana knows when the request was made, when it was returned, and the time the dependency actually took to process the request and send the response.
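The arithmetic behind that observation can be sketched as follows. This is an illustration under our own assumptions, not Instana’s implementation: `SpanTiming` and `network_share` are hypothetical names, and the timestamps are invented. The idea is that the caller’s measured duration minus the callee’s measured processing time leaves the time spent on the network.

```python
from dataclasses import dataclass

@dataclass
class SpanTiming:
    """Start and end of one timed operation, in milliseconds (hypothetical structure)."""
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

def network_share(caller_side: SpanTiming, callee_side: SpanTiming):
    """Return (network time in ms, fraction of the caller's wait spent on the network)."""
    total = caller_side.duration_ms
    # Only durations are compared, so the two hosts do not need synchronized clocks.
    network = total - callee_side.duration_ms
    return network, network / total

# Made-up numbers: the caller waited 250ms, the dependency worked for 10ms.
caller_side = SpanTiming(start_ms=0.0, end_ms=250.0)
callee_side = SpanTiming(start_ms=120.0, end_ms=130.0)

network_ms, share = network_share(caller_side, callee_side)
print(f"network time: {network_ms}ms ({share:.0%} of the wait)")
```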
The data gathered from distributed traces has given you the most accurate depiction of the behavior resulting from your services communicating in real time. Instana will help you better understand your application, its interactions with both internal and external services, and the implications of performance bottlenecks.
Simplifying Distributed Traces
Prior to Application Perspectives, Instana provided trace analysis in a manner consistent with other tracing implementations such as OpenTracing and Jaeger. The feedback we heard from our customers was that some of the terminology, along with the presentation of these details, was a bit long-winded.
Even though they might be considered complex or long-winded, the technical concerns of traces and spans are important. These concerns are how many tracing implementations, including the Instana tracer, collect the necessary context for analyzing and stitching together the dependencies in a distributed cloud native application. Telemetry such as timing of the network and the service itself is derived from these spans.
fig. 4a: Distributed Trace break down
- An incoming request is intercepted by our trace module; the execution of the request itself is timed, and details about the request and execution are captured.
- An outbound request is intercepted, details about the request and timings on the response are captured.
- Downstream the same thing happens again, in some cases many hundreds of times.
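The steps above can be sketched with a small context manager that times a block of work and records a span for it. This is a simplified illustration, not the Instana tracer or its API: `span`, `recorded_spans`, the field names, and the `auth.example.com` host are all hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

recorded_spans = []  # stands in for whatever a real tracer would report to its backend

@contextmanager
def span(name, kind, trace_id=None, parent_id=None, **details):
    """Time the wrapped block and record it as an 'entry' or 'exit' span."""
    record = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "name": name,
        "kind": kind,
        "details": details,
    }
    started = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - started) * 1000
        recorded_spans.append(record)

# Incoming request (entry span) that makes one outbound request (exit span).
with span("GET /api/user/current", "entry", method="GET") as entry:
    with span("auth lookup", "exit", trace_id=entry["trace_id"],
              parent_id=entry["span_id"], host="auth.example.com"):
        time.sleep(0.01)  # simulated wait on the dependency

for s in recorded_spans:
    print(s["kind"], s["name"], round(s["duration_ms"], 1), "ms")
```

The shared `trace_id` and the `parent_id` link are what allow separate spans to be stitched back together into one trace.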
Instana still captures and uses this telemetry data to analyze performance but we’ve bundled some of the technical details together into a “call”.
fig 4b: Calls are the logical grouping of spans in a request.
Each individual call encapsulates the technical concerns of each span associated with a request (fig. 4b). It is assigned a protocol-specific type, tags, and a duration, which makes the querying and analysis of these entities even more concise and consumable. You can always learn more about our distributed tracing engine in our documentation.
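As a rough mental model, the grouping might look like the sketch below. The pairing rule used here (an outbound exit span is merged with the entry span it triggered downstream) is our assumption for illustration, and the `Call` class, field names, and sample spans are hypothetical rather than Instana’s data model.

```python
from dataclasses import dataclass, field

@dataclass
class Call:
    name: str
    call_type: str       # protocol-specific type, e.g. "HTTP" or "DATABASE"
    duration_ms: float
    tags: dict = field(default_factory=dict)

def group_into_calls(spans):
    """Merge each exit span with the entry span it triggered; return calls, slowest first."""
    exit_ids = {s["id"] for s in spans if s["kind"] == "exit"}
    entry_by_parent = {s["parent"]: s for s in spans if s["kind"] == "entry"}
    calls = []
    for s in spans:
        if s["kind"] == "entry" and s["parent"] in exit_ids:
            continue  # already represented by the call built from its parent exit span
        tags = dict(s["tags"])
        if s["kind"] == "exit" and s["id"] in entry_by_parent:
            tags.update(entry_by_parent[s["id"]]["tags"])
        calls.append(Call(s["name"], s["type"], s["duration_ms"], tags))
    return sorted(calls, key=lambda c: c.duration_ms, reverse=True)

# Made-up spans for one request.
spans = [
    {"id": "a", "parent": None, "kind": "entry", "name": "GET /api/user/current",
     "type": "HTTP", "duration_ms": 520.0, "tags": {"status": 200}},
    {"id": "b", "parent": "a", "kind": "exit", "name": "GET /profile",
     "type": "HTTP", "duration_ms": 480.0, "tags": {"client": "user-service"}},
    {"id": "c", "parent": "b", "kind": "entry", "name": "GET /profile",
     "type": "HTTP", "duration_ms": 20.0, "tags": {"server": "profile-service"}},
    {"id": "d", "parent": "a", "kind": "exit", "name": "SET token",
     "type": "DATABASE", "duration_ms": 3.0, "tags": {"db": "redis"}},
]

calls = group_into_calls(spans)
for c in calls:
    print(c.call_type, c.name, c.duration_ms, c.tags)
```

Four spans collapse into three calls here, and sorting by duration mirrors the "sort by latency" step used when analyzing calls.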
The Pièce de Résistance
Distributed tracing is the Pièce de Résistance of the new Application Perspectives in Instana. While distributed tracing is not new to Instana, we’ve given developers the ability to analyze traces in an easy-to-use interface, provided summary dashboards to help them identify issues and regressions, and made those dashboards consumable by the entire application delivery organization. We believe that Instana has the best monitoring platform in the market for analyzing and troubleshooting the performance of cloud native applications. Start analyzing the performance of your application through your own unique perspective today.