To many in the tech world, distributed tracing is shrouded in mystery, because few have tried to understand the inner workings of Application Performance Management (APM) tools. The good news is that there’s no magic hiding behind the various tracing implementations, although it might seem that way at first. In reality, there is a wealth of very good software that either lets you build your own distributed tracing or automates the process for you completely.
Distributed Tracing the Hard Way: Open-Source Tools
The open-source community has developed a few very capable distributed tracing tools over the past few years. The most popular of these include Zipkin, OpenTracing (an API with multiple implementations, such as Jaeger), and OpenCensus.
The most important thing to remember when exploring open-source distributed tracing is that you must manually add observability code to the source of your custom business applications in order to get distributed traces. For most teams, this comes with a steep learning curve and distracts from the primary objective of most businesses: generating more revenue by creating new or improved business functionality.
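To give a feel for what “manually adding observability code” means, here is a minimal sketch in Python using only the standard library. It is not the API of Zipkin, Jaeger, or any other tool; the `span` helper, the `FINISHED_SPANS` list, and the `handle_request` and `load_order` functions are all hypothetical names invented for illustration.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink. A real tracer would export spans to a backend.
FINISHED_SPANS = []


@contextmanager
def span(name, trace_id=None, parent_id=None):
    """Time a block of code and record it as one span of a trace."""
    record = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
        "name": name,
    }
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Record the elapsed time even if the wrapped code raised.
        record["duration_s"] = time.perf_counter() - start
        FINISHED_SPANS.append(record)


def load_order(order_id):
    # Stand-in for real business logic such as a database query.
    return {"id": order_id, "total": 42}


def handle_request(order_id):
    # The tracing calls are interleaved with the business logic by hand.
    with span("handle_request") as root:
        with span("load_order", trace_id=root["trace_id"], parent_id=root["span_id"]):
            order = load_order(order_id)
    return order
```

Calling `handle_request(7)` leaves two records in `FINISHED_SPANS` that share a trace ID, with the inner span pointing at the outer one as its parent. Even in this toy form, every function you want to see in a trace has to be wrapped by hand, which is where the learning curve and maintenance cost come from.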
Here’s a quick summary of various open-source options.
From the Zipkin website (https://zipkin.io) …
“Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data. Zipkin’s design is based on the Google Dapper paper.”
You can read about the many steps involved in creating distributed traces with Zipkin on the project’s “existing instrumentations” pages.
Figure 1: A distributed trace shown in the Zipkin UI
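The “timing data” Zipkin collects is a set of spans. The sketch below times one piece of work and shapes the result like a span in Zipkin’s v2 JSON format (`traceId`, `id`, `name`, `timestamp` and `duration` in microseconds, `localEndpoint`). The field names follow that format as I understand it; the `timed_span` helper and the service name are hypothetical, and nothing is sent over the network here.

```python
import json
import secrets
import time


def timed_span(name, service_name, work, trace_id=None, parent_id=None):
    """Run work() and describe it as a dict shaped like a Zipkin v2 span."""
    start_us = int(time.time() * 1_000_000)  # wall-clock start, epoch microseconds
    began = time.perf_counter()
    result = work()
    elapsed_us = int((time.perf_counter() - began) * 1_000_000)
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 32 hex characters
        "id": secrets.token_hex(8),                    # 16 hex characters
        "name": name,
        "kind": "SERVER",
        "timestamp": start_us,
        # Clamp to 1 microsecond so a very fast call does not report zero.
        "duration": max(1, elapsed_us),
        "localEndpoint": {"serviceName": service_name},
        "tags": {},
    }
    if parent_id:
        span["parentId"] = parent_id
    return result, span


# Hypothetical usage: time a trivial computation and serialize the span.
result, span = timed_span("get /orders", "order-service", lambda: sum(range(1000)))
payload = json.dumps([span])  # spans are reported to the collector as a JSON array
```

A real setup would hand `payload` to a reporter that posts it to a Zipkin collector; that part, along with sampling and error handling, is exactly the work the instrumentation libraries take on.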
From the OpenTracing documentation (https://opentracing.io/docs/overview/what-is-tracing/) …
“It is probably easier to start with what OpenTracing is NOT.
- OpenTracing is not a download or a program. Distributed tracing requires that software developers add instrumentation to the code of an application, or to the frameworks used in the application.
- OpenTracing is not a standard. The Cloud Native Computing Foundation (CNCF) is not an official standards body. The OpenTracing API project is working towards creating more standardized APIs and instrumentation for distributed tracing.
OpenTracing is comprised of an API specification, frameworks and libraries that have implemented the specification, and documentation for the project. OpenTracing allows developers to add instrumentation to their application code using APIs that do not lock them into any one particular product or vendor.”
You can read about the many steps involved in creating distributed traces with OpenTracing on the project’s “getting started” pages.
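The idea of instrumenting against an API rather than a product can be sketched in a few lines. The code below is an illustration of that design, not the OpenTracing API itself: the `Tracer` and `Span` protocols, both tracer implementations, and the `checkout` function are hypothetical, although the method names are chosen to resemble the style such APIs use.

```python
import time
from typing import Protocol


class Span(Protocol):
    def set_tag(self, key: str, value: object) -> None: ...
    def finish(self) -> None: ...


class Tracer(Protocol):
    def start_span(self, name: str) -> Span: ...


class NoopSpan:
    def set_tag(self, key, value):
        pass

    def finish(self):
        pass


class NoopTracer:
    """Default implementation: the instrumentation costs almost nothing."""

    def start_span(self, name):
        return NoopSpan()


class RecordingSpan:
    def __init__(self, name, sink):
        self.name = name
        self.tags = {}
        self.duration_s = None
        self._sink = sink
        self._start = time.perf_counter()

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.duration_s = time.perf_counter() - self._start
        self._sink.append(self)


class RecordingTracer:
    """Stand-in for a vendor implementation that actually keeps the spans."""

    def __init__(self):
        self.finished = []

    def start_span(self, name):
        return RecordingSpan(name, self.finished)


def checkout(tracer: Tracer, cart_size: int) -> int:
    # Business code depends only on the Tracer interface, never on a vendor.
    span = tracer.start_span("checkout")
    try:
        span.set_tag("cart.size", cart_size)
        return cart_size * 2
    finally:
        span.finish()
```

Swapping `NoopTracer()` for `RecordingTracer()` changes what happens to the spans without touching `checkout`. Note that the application code still has to contain the instrumentation calls; the API only removes the lock-in, not the manual work.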
From the Jaeger documentation (https://www.jaegertracing.io/docs/1.7/) …
“Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including:
- Distributed context propagation
- Distributed transaction monitoring
- Root cause analysis
- Service dependency analysis
- Performance / latency optimization
We published a blog post, Evolving Distributed Tracing at Uber, where we explain the history and reasons for the architectural choices made in Jaeger.”
You can read about the many steps involved in creating distributed traces with Jaeger on the project’s “supported client libraries” pages.
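“Distributed context propagation”, the first item in the list above, is what ties the spans from different services into a single trace: the caller writes its trace context into the outgoing request, and the callee reads it back. The sketch below assumes the `uber-trace-id` header in the `{trace-id}:{span-id}:{parent-span-id}:{flags}` form described in the Jaeger client documentation, so check the docs for your version. The `inject`, `extract`, and `start_child` helpers are hypothetical and are not Jaeger client functions.

```python
import secrets

HEADER = "uber-trace-id"


def new_span_id():
    # 64-bit identifier rendered as 16 hex characters.
    return secrets.token_hex(8)


def inject(context):
    """Serialize a trace context into outgoing request headers."""
    flags = "1" if context["sampled"] else "0"
    parent = context.get("parent_id") or "0"  # "0" marks a root span
    value = f'{context["trace_id"]}:{context["span_id"]}:{parent}:{flags}'
    return {HEADER: value}


def extract(headers):
    """Parse the caller's trace context, or return None if absent or malformed."""
    value = headers.get(HEADER)
    if not value:
        return None
    parts = value.split(":")
    if len(parts) != 4:
        return None
    trace_id, span_id, parent_id, flags = parts
    try:
        sampled = bool(int(flags, 16) & 1)  # lowest bit of the flags is "sampled"
    except ValueError:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": None if parent_id == "0" else parent_id,
        "sampled": sampled,
    }


def start_child(parent):
    """Create the callee's span context as a child of the caller's span."""
    return {
        "trace_id": parent["trace_id"],
        "span_id": new_span_id(),
        "parent_id": parent["span_id"],
        "sampled": parent["sampled"],
    }
```

On the calling side you would merge `inject(context)` into the HTTP headers of each outgoing request; on the receiving side you would call `extract` on the incoming headers and `start_child` on the result. This has to happen at every hop, for every protocol your services speak, which is a large part of the manual effort.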
From the OpenCensus website (https://opencensus.io) …
“OpenCensus is a vendor-agnostic single distribution of libraries to provide metrics collection and tracing for your services.
OpenCensus originates from Google, where a set of libraries called Census were used to automatically capture traces and metrics from services. Since going open source, the project is now composed of a group of cloud providers, application performance management vendors, and open source contributors. The project is hosted on GitHub and all work occurs there.”
You can read about the many steps involved in creating distributed traces with OpenCensus on the project’s tracing documentation pages.
Distributed Tracing the Easy Way: Instana AutoTrace
Put simply, there is a lot of work involved in implementing open-source distributed tracing. Even if you’re capable of doing that work, you should ask yourself whether all of that time and effort might be better spent developing business functionality. What if there were a way to implement distributed tracing in your custom applications without any manual effort? It might sound too good to be true, but it already exists today.
At Instana, we’ve built a distributed tracing technology called AutoTrace, which is the basis for our Instana APM solution. As the name implies, it’s completely automatic. You don’t have to change any of your code, you don’t need to configure any exporters or collectors, and you don’t even have to manage any backend data collection servers.
With Instana AutoTrace, all you have to do is install a base agent (as a process on a host, a Docker container, or a Kubernetes DaemonSet), and Instana will automatically discover your running applications and create the required observability. Every individual request is monitored, and Instana AutoTrace automatically creates a distributed trace for each one. No other distributed tracing implementation on the planet can do the same. It’s not magic; it’s exceptionally well-engineered software.
For now, Instana AutoTrace is 100% automatic for Java, PHP, and Python. AutoTrace is mostly automatic for .NET, .NET Core, Go, Ruby, Node.js, and Crystal. For these languages there is a small manual step of referencing the Instana tracing library and restarting the application. It’s a trivial amount of work for the value of the resulting distributed traces.
Other Benefits of Instana AutoTrace
The main purpose of creating observability and monitoring our applications is to identify the root cause of performance and stability problems. Distributed traces are only one part of the data required to accomplish this goal. The full set of key elements is:
- Distributed traces (already mentioned)
- Infrastructure metrics
- Middleware metrics
- Change events
- Dependency data
We’ve written another post that describes how all of these key elements are used to automatically determine the root cause of any problem.
When Does It Make Sense To Use Open Source Tracing?
The purpose of this post is not to bash open-source tracing but to describe the various distributed tracing options that exist, so that you can make an informed decision about which to use.
In my opinion, there are two main reasons to use open-source tracing to supplement the data provided by AutoTrace:
- When you want to measure application-specific metrics that are not directly related to distributed service communications
- When you want to collect data or metrics beyond simple latency
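As an illustration of the first kind of data, here is a minimal sketch of application-specific metrics that no automatic tracer could infer, because they live in the business logic rather than in service-to-service calls. The `Metrics` class, the metric names, and the `apply_discount` function are hypothetical; a real setup would use the metrics API of whichever open-source library you have chosen.

```python
from collections import defaultdict


class Metrics:
    """Tiny in-memory registry for counters and observed values."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.observations = defaultdict(list)

    def increment(self, name, amount=1):
        self.counters[name] += amount

    def observe(self, name, value):
        self.observations[name].append(value)

    def summary(self, name):
        values = self.observations.get(name, [])
        if not values:
            return None
        return {"count": len(values), "sum": sum(values), "max": max(values)}


metrics = Metrics()


def apply_discount(cart_total_cents, coupon):
    # Business facts worth measuring: coupon usage and cart sizes.
    if coupon == "SAVE10":
        metrics.increment("coupons.redeemed")
        discount = cart_total_cents // 10
    else:
        discount = 0
    metrics.observe("cart.total_cents", cart_total_cents)
    return cart_total_cents - discount
```

After a few calls, `metrics.counters["coupons.redeemed"]` and `metrics.summary("cart.total_cents")` hold figures about the business itself, which is the kind of data that complements latency-focused traces.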
Instana will automatically correlate all information from Instana AutoTrace with all manually provided metrics or tracing data from tools like Zipkin and Jaeger, transforming the disparate data feeds into a unified stream of information.
If you haven’t experienced Instana’s AutoTrace yet, I suggest you select the “Try Instana” button at the top of this page today. There’s no charge, and it will give you the opportunity to see firsthand how fast and easy it is to achieve distributed tracing in your own applications.