Introducing the Instana Trace SDK

Extending Instana’s ability to follow distributed calls across custom protocols

One of the leading design goals for Instana is automatic detection and zero configuration. We do not want to require our users to configure what and how they monitor. Therefore we’ve built an agent which installs and manages itself, and applies instrumentation automatically at well-known places in application code.

At the moment, Instana supports monitoring application code written in NodeJS, PHP and any language running on the Java Virtual Machine, such as Java and Scala.

We are confident that our automatic instrumentation can successfully discover around 80% of the messaging and communication patterns of any application. The remainder is communication that either uses a custom protocol without any standard framework, or uses a framework that Instana does not yet understand. For the latter, we appreciate feedback, as we can add support for frameworks easily. For the former, the new Instana Trace SDK is the way to get the desired visibility.

What an Automatically Discovered Application Looks Like

Let us look at a simple example, which works completely through our auto discovery: An application written using the Dropwizard and Jersey frameworks which handles an HTTP request. Internally, an HTTP request to another application is made.


As you can see, Instana automatically detected the incoming user request to /helloWorld and started monitoring the request, which took a total of 1101ms to execute. Instana also detected the outgoing HTTP call to http://remote/hello, which took a total of 73ms, of which 63ms were spent on the network call (this includes the time spent serializing data to the wire and deserializing it again). And last, but not least, Instana detected the incoming call on the remote server.

Each of those three light blue elements is called a “span” in Instana, a term derived from Google’s Dapper paper. A span can usually be understood as a “time span”: an action with a start and an end time. Because the first span on a service indicates that a call entered the service, it is called an “entry” (the Dapper paper calls this a “server span”). Spans of calls leaving a service are called “exits” (the Dapper paper calls this a “client span”). The above example consists of two entry spans and one exit span. Being able to tell at an entry that it was triggered by a call known on the previous (also called “upstream”) service is called “correlation”. To do that, Instana automatically sends correlation headers with instrumented exits, and Instana’s entries automatically read those headers.
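To make the idea of correlation more concrete, here is a minimal sketch in plain Java. It is not Instana’s implementation: the class `CorrelationSketch`, the `SpanRecord` type and the header names `trace-id` and `span-id` are assumptions made up for illustration. It only shows how an exit can hand its identifiers to the next entry so that both end up in the same trace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Illustrative model of entries, exits and correlation; not Instana's code. */
final class CorrelationSketch {

    enum Kind { ENTRY, EXIT }

    /** A span: a named action that belongs to a trace and may have a parent. */
    static final class SpanRecord {
        final Kind kind;
        final String name;
        final String traceId;
        final String spanId;
        final String parentId; // null for the first span of a trace

        SpanRecord(Kind kind, String name, String traceId, String spanId, String parentId) {
            this.kind = kind;
            this.name = name;
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    // Hypothetical header names, used only in this sketch.
    static final String TRACE_HEADER = "trace-id";
    static final String SPAN_HEADER = "span-id";

    static String newId() {
        return UUID.randomUUID().toString();
    }

    /** An entry continues the upstream trace if headers are present, else starts a new one. */
    static SpanRecord startEntry(String name, Map<String, String> incomingHeaders) {
        String traceId = incomingHeaders.get(TRACE_HEADER);
        String parentId = incomingHeaders.get(SPAN_HEADER);
        if (traceId == null) {
            traceId = newId();
            parentId = null;
        }
        return new SpanRecord(Kind.ENTRY, name, traceId, newId(), parentId);
    }

    /** An exit is a child of the current span and writes its ids into the outgoing headers. */
    static SpanRecord startExit(String name, SpanRecord current, Map<String, String> outgoingHeaders) {
        SpanRecord exit = new SpanRecord(Kind.EXIT, name, current.traceId, newId(), current.spanId);
        outgoingHeaders.put(TRACE_HEADER, exit.traceId);
        outgoingHeaders.put(SPAN_HEADER, exit.spanId);
        return exit;
    }

    public static void main(String[] args) {
        SpanRecord first = startEntry("/helloWorld", new HashMap<>());
        Map<String, String> wire = new HashMap<>();
        SpanRecord exit = startExit("http://remote/hello", first, wire);
        SpanRecord remote = startEntry("/hello", wire);
        System.out.println("same trace: " + remote.traceId.equals(first.traceId));
        System.out.println("remote entry is child of exit: " + exit.spanId.equals(remote.parentId));
    }
}
```

The sketch mirrors the example above: two entries and one exit, where the second entry is correlated with the exit through the headers carried on the wire.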

What a Non-Standard Remote Call Looks Like

Now imagine that your custom application uses a proprietary, hand-written TCP call to the remote service instead of the (nowadays quite standard) HTTP. The Instana agent does not know about it, so such a call would look like this:

The standard Dropwizard HTTP call was still detected, but the outgoing call was not.

This is where the SDK comes into play. It informs Instana about the remote call and allows Instana to correlate it, which will result in a trace like this instead:

Compared with the first example, Instana was informed about three additional details:

  1. The exit span
    Annotate the method that should become the exit span with @Span(type = Type.EXIT, value = "custom-tcp-client")
  2. The entry span
    Annotate the method that should become the entry span with @Span(type = Type.ENTRY, value = "custom-tcp-server")
  3. The correlation headers
    This is slightly more work: first, get the current correlation headers inside the exit and add them to the custom protocol. In this example the custom protocol transports a map, so the required line of code is: SpanSupport.addTraceHeadersIfTracing(Type.EXIT, params)
    Next, read the headers on the receiving server before the entry is created:
    SpanSupport.inheritNext(traceId, spanId);
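
Put together, the three steps might be wired up roughly as follows. This is a sketch under stated assumptions, not the sample application’s code: so that it compiles without the SDK jar, it declares local stand-ins for `Type`, `@Span` and `SpanSupport` that only mimic the calls named above. In a real application those stand-ins would be deleted and the SDK’s own types imported instead. The map keys `X-INSTANA-T` and `X-INSTANA-S`, the fixed ids, and the method names `buildRequest`, `receive` and `handle` are assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

final class CustomTcpSketch {

    // ---- Local stand-ins so the sketch compiles on its own. ----
    // They are NOT the SDK; replace them with the SDK's types in real code.
    enum Type { ENTRY, EXIT, INTERMEDIATE }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Span {
        Type type();
        String value();
    }

    static final class SpanSupport {
        // Assumed keys under which the ids travel in the custom protocol's map.
        static final String TRACE_ID = "X-INSTANA-T";
        static final String SPAN_ID = "X-INSTANA-S";
        static String inheritedTraceId;
        static String inheritedSpanId;

        // Stand-in behavior: always writes fixed ids into the map.
        static void addTraceHeadersIfTracing(Type type, Map<String, String> params) {
            params.put(TRACE_ID, "trace-1");
            params.put(SPAN_ID, "span-1");
        }

        // Stand-in behavior: remembers the ids the next entry should inherit.
        static void inheritNext(String traceId, String spanId) {
            inheritedTraceId = traceId;
            inheritedSpanId = spanId;
        }
    }
    // ---- End of stand-ins. ----

    /** Step 1 and first half of step 3: the exit span adds the correlation headers. */
    @Span(type = Type.EXIT, value = "custom-tcp-client")
    Map<String, String> buildRequest(String payload) {
        Map<String, String> params = new HashMap<>();
        params.put("payload", payload);
        SpanSupport.addTraceHeadersIfTracing(Type.EXIT, params);
        return params; // in a real client this map would be written to the socket
    }

    /** Second half of step 3: read the headers before the entry method is invoked. */
    String receive(Map<String, String> params) {
        String traceId = params.get(SpanSupport.TRACE_ID);
        String spanId = params.get(SpanSupport.SPAN_ID);
        SpanSupport.inheritNext(traceId, spanId);
        return handle(params.get("payload"));
    }

    /** Step 2: the entry span on the receiving side. */
    @Span(type = Type.ENTRY, value = "custom-tcp-server")
    String handle(String payload) {
        return "handled " + payload;
    }

    public static void main(String[] args) {
        CustomTcpSketch sketch = new CustomTcpSketch();
        Map<String, String> onTheWire = sketch.buildRequest("ping");
        System.out.println(sketch.receive(onTheWire));
    }
}
```

The important ordering detail is on the server side: the headers are read and handed to `inheritNext` before the annotated entry method is called, so the entry can be correlated with the upstream exit.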

Marking Sections of a Trace

As explained above and in previous blog posts, we do not touch custom code during instrumentation. The out-of-the-box experience covers many cases, and custom entries and exits supplement them. But there are two more use cases that should be covered:

  1. Dividing long traces into semantic/business sections
  2. Collecting certain important/business data

For these cases we offer spans of type Type.INTERMEDIATE. They can carry the same data and are visualized similarly to entries and exits, but they do not imply entering or leaving a system.
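
As a rough illustration, marking two sections inside a request handler could look like the sketch below. To stay compilable without the SDK jar, it declares local stand-ins for `Type` and `@Span`; in a real application the SDK’s own types would be used instead. The class and method names (`IntermediateSketch`, `helloWorld`, `nap`, `callTcp`) are hypothetical, and the span names are chosen freely for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class IntermediateSketch {

    // Local stand-ins so the sketch compiles on its own; NOT the SDK's types.
    enum Type { ENTRY, EXIT, INTERMEDIATE }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Span {
        Type type();
        String value();
    }

    /** Request handler; in the article's example the entry is detected automatically. */
    String helloWorld(long napMillis) {
        nap(napMillis);
        return callTcp("hello");
    }

    /** First section of the trace: a deliberate pause. */
    @Span(type = Type.INTERMEDIATE, value = "intermediate-nap")
    void nap(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    /** Second section of the trace: everything related to the custom TCP call. */
    @Span(type = Type.INTERMEDIATE, value = "calling-tcp")
    String callTcp(String payload) {
        return "remote says: " + payload; // placeholder for the real remote call
    }

    public static void main(String[] args) {
        System.out.println(new IntermediateSketch().helloWorld(10));
    }
}
```

Each annotated method then shows up as its own section within the trace, without implying that a call entered or left the service.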

The next screenshot shows how two intermediate spans make the total runtime of /helloWorld easier to understand (1 second intermediate-nap and 4 seconds calling-tcp).

Impact of SDK Instrumentation

All automatically applied instrumentation has extremely low impact on the application itself.

Because we apply these instrumentations in safe places at the edge of application code, applications never see higher execution times.

The SDK instrumentation is not much different. It is complete in the sense that all of Instana’s automatic instrumentation could be built manually with the SDK. It is also almost as efficient as the built-in automatic instrumentation, so it is safe to use wherever it makes sense.

Java Trace SDK Source Code and Full Example

We published the full source code of the Java Trace SDK on GitHub, together with the sample application used to produce the above examples, which shows exactly how to use the SDK:

It is worth mentioning that the SDK consists only of stubs and annotations. This means it can be added to an application and will only activate when the application is actually monitored. Until then, the SDK has no effect on the application.

The SDK is published as a jar on Maven Central and needs to be shipped with your application. For example, when using Maven as the build system, the dependency should look like this:


Appendix 1: Trace REST Service

The above SDK offers very convenient support for generating additional spans from Java applications. Sometimes, however, it would be very helpful to generate entry and exit spans from technologies where no agent is running or for which no support exists (yet).

For those cases, Instana is able to add spans to traces (or even generate complete traces) just from data provided to it via a REST web service.

Some cases where customers have used this capability are: getting visibility into C code, adding data from handheld devices into traces, and getting visibility into COBOL code execution and into embedded devices.

We continue to work on making visibility into all these things easier, with the ultimate goal of making it automatic. In the meantime, the design of Instana tracing allows you to manually add whatever is not covered automatically.

Initial documentation of this feature can be found in the GitHub repository referenced above: