In the past few years, distributed tracing has emerged in the global DevOps consciousness as an indispensable tool in the microservices arsenal. In April 2019, the open source observability community rose to the challenge, uniting the energies that were previously divided between OpenTracing (a vendor-agnostic API to help developers instrument tracing into their code base) and OpenCensus (an open-source project that emits metrics and traces in application code) into a new project called OpenTelemetry. Roughly two years later, OpenTelemetry is gaining traction, as evidenced by the AWS OpenTelemetry Distro and its growing adoption in the field.
In this blog post, we discuss what’s driving OpenTelemetry’s progress and where it’s heading.
Distributed tracing and the impact of incompatible formats
As users of Instana know all too well, the value of distributed tracing relies on the Network Effect: the more of your interoperating systems you can capture in the same trace, the more information you have about what, how, why, and who is affected when something goes wrong or slows down.
While distributed tracing as we know it has been around for more than a decade (at least in terms of Dapper, the progenitor of most distributed tracing systems in use today), until recently there has been very little interoperability between distributed tracing systems. For example, Jaeger and Zipkin, both very well known and widely adopted, could not build a trace together until the implementation of W3C Trace Context, a standard adopted by most players in the observability and cloud provider space (including, of course, Instana). W3C Trace Context specifies a common way of propagating trace context, meaning information about (1) which trace we are building and (2) where in that trace we currently are.
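To make those two pieces of information concrete, here is a minimal sketch of reading and forwarding the `traceparent` HTTP header that W3C Trace Context defines. The header layout (`version-traceid-parentid-flags`) comes from the standard; the names `TraceContext`, `parse_traceparent` and `child_traceparent` are our own, chosen for illustration, and the sketch deliberately handles only version `00` and ignores the companion `tracestate` header.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    """Hypothetical container for the fields of a traceparent header."""
    trace_id: str   # (1) which trace we are building
    parent_id: str  # (2) where we are: the span that made the call
    sampled: bool   # lowest bit of the flags field


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the trace context, or None if the header is not a valid version-00 value."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Simplification: only version 00 is handled here.
    if version != "00":
        return None
    # All-zero identifiers are invalid according to the standard.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Build the header for an outgoing call: same trace, new span id."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
if ctx is not None:
    outgoing = child_traceparent(ctx)
```

Any system that keeps the trace id and swaps in its own span id when calling downstream contributes to the same trace, whichever tracing product sits behind it.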
Currently, W3C Trace Context is specific to HTTP, and the efforts to come up with a similar specification for binary formats (messaging, MQTT and the like) have not yet borne fruit. Wouldn’t it be great to have one place for all the libraries to produce telemetry about your applications?
The OpenTelemetry project provides specifications, along with SDKs that implement them across multiple runtimes, for collecting and reporting:
- Resource Data, to describe who or what is emitting
- Traces, to describe the interactions of the emitter
- Metrics, to describe the performance of the emitter
- Logs (currently in early stages of development)
In a nutshell, OpenTelemetry aims to be a one-stop-shop for the libraries you need to produce telemetry about your applications with an emphasis on embracing open standards and interoperability. For example, OpenTelemetry embraces the W3C Trace Context standard. At the same time, it allows users to “translate” distributed tracing data collected with OpenTelemetry into data for systems such as Jaeger and Zipkin.
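As a rough illustration of what such a “translation” involves, the sketch below models a resource and a span as plain Python objects and maps them onto the JSON shape used by Zipkin's v2 span format. The `Resource` and `Span` classes and the `to_zipkin_v2` function are hypothetical stand-ins written for this post, not the OpenTelemetry SDK's own types, and real exporters handle many more fields than shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for resource data: who or what is emitting."""
    attributes: Dict[str, str]


@dataclass
class Span:
    """Hypothetical stand-in for one span of a trace."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    start_ns: int  # start time, nanoseconds since the Unix epoch
    end_ns: int    # end time, nanoseconds since the Unix epoch
    attributes: Dict[str, str] = field(default_factory=dict)


def to_zipkin_v2(span: Span, resource: Resource) -> dict:
    """Map a span onto a Zipkin v2 style dictionary (simplified)."""
    zipkin_span = {
        "traceId": span.trace_id,
        "id": span.span_id,
        "name": span.name,
        # Zipkin expresses timestamps and durations in microseconds.
        "timestamp": span.start_ns // 1000,
        "duration": (span.end_ns - span.start_ns) // 1000,
        # The emitting service comes from the resource, not from the span.
        "localEndpoint": {
            "serviceName": resource.attributes.get("service.name", "unknown")
        },
        "tags": dict(span.attributes),
    }
    if span.parent_span_id is not None:
        zipkin_span["parentId"] = span.parent_span_id
    return zipkin_span


resource = Resource({"service.name": "checkout"})
span = Span(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    parent_span_id=None,
    name="GET /cart",
    start_ns=1_000_000_000,
    end_ns=1_000_250_000,
    attributes={"http.method": "GET"},
)
zipkin_span = to_zipkin_v2(span, resource)
```

The point of the sketch is that the data is collected once and reshaped at export time, so the choice of backend does not have to leak into the instrumented code.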
One of the most exciting developments is the ongoing work with the big three cloud providers to offer distributed tracing data describing what their managed services are doing on your behalf, with particular emphasis on load balancers. With that data, a user can answer the age-old question, “Does the extra latency that hurts my end users come from the stuff I control?”
At Instana we also nod in appreciation at the growing manual and automated instrumentation being developed within the OpenTelemetry project for some runtimes. While AutoTrace remains the gold standard of distributed tracing, we are thrilled that our focus on automation is being recognized and popularized beyond the Instana user base. We will continue to invest in AutoTrace, and one of those investments is to make sure that AutoTrace and OpenTelemetry work seamlessly with one another!
Why many APM vendors rely on OpenTelemetry
The majority of organizations committed to OpenTelemetry come from the observability industry. You might ask, if distributed tracing is so valuable, why collaborate in an open source project that enables your competitors as well? The answer is actually rather simple: creating distributed tracing instrumentation that works is hard and expensive, even more so given the constant duplication and fragmentation of ecosystems (the count of HTTP clients for the Java Virtual Machine should make your eyebrows shoot way up). For vendors that were not that strong in distributed tracing to begin with, the gap to close is very, very wide.
As the collection of telemetry becomes increasingly available to everyone, the differentiating value moves more and more towards the contextualization of data, in terms of automated visualization and analysis. It will take years for telemetry collection to become a true commodity in terms of ease of adoption and reliability, but the trend line is clear.
In this light, the acquisition of Instana by IBM creates a true powerhouse in the space. More on Instana AutoTrace and OpenTelemetry in our next blog.