Instana Tracing: No Configuration. No Impact. No Sampling.

Instana is designed to monitor modern applications that are built on microservices, are highly dynamic, and run at large scale. When we started to think about how a distributed tracing feature should be designed for such environments, it seemed like an impossible task, as we wanted to be able to:

  • Trace without any configuration,
  • Trace without any manual instrumentation,
  • Trace every service request – no sampling,
  • Trace without measurable impact on the application,
  • Trace with automatic correlation between service and underlying infrastructure metrics.

After building a completely new technical approach and filing many patents, we are proud to release our new tracing feature, the first product that can trace every distributed request without configuration and without measurable impact on the application!

Impact on an application can occur in various ways that are influenced by the instrumentation technology and implementation. The instrumentation code of the APM tool could:

  • Add execution time (latency) to every service request, making each request slower,
  • Add objects and memory to the application that may increase garbage collection overhead,
  • Change the way the runtime (e.g. the Just In Time Compiler) treats the code, which leads to different or less optimized applications.

Instana’s instrumentation is built such that there is no measurable impact on the application – meaning the instrumented application will behave and perform with Instana the same way as without Instana. The overhead we add is confined to our Agent’s process and does not occur in the running application process. It is limited to a low percentage of CPU on one core of the host, and only if free CPU is available.

We have improved our presentation of the traces and trace details so that it is easier to investigate errors and performance bottlenecks and drill into their dependencies back to the physical layers of middleware and infrastructure.

[Screenshot: tracing]

We have reworked the UI for better usability – especially because we increased the amount of captured details. The new Trace View consists of two panels: 1) the full list of traces on the left, and 2) the trace details on the right. The list on the left shows the traces captured during the timeframe indicated by the timeline, which is always at the bottom of every Instana page. The new sortable table view lets you quickly navigate the entire list of traces, and the search function lets you look up a specific trace directly.

To investigate a specific trace, the right side of the screen is broken down into a header, an Icicle-Graph, and a call tree.

[Screenshot: trace header]

The header provides a quick overview of a trace, including timestamp and name, with a link to the initial component where the trace started. The header also shows:

  • total execution time of the trace
  • number of errors
  • number of remote calls
  • depth of the call tree.

The symbols at the end indicate the number of so-called spans per type (and the time spent per call type). A “span” is a set of data and time measurements corresponding to a particular RPC or service call. In this example there is one database span and four HTTP spans.
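To make the header values more concrete, here is a minimal sketch in Python of how such a summary could be derived from a tree of spans. It is an illustration only, not Instana's implementation: the `Span` class, its field names, and the `summarize` function are hypothetical, and the example timings are made up to match the "one database span and four HTTP spans" case.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical fields, chosen for illustration only.
    name: str
    kind: str                 # call type, e.g. "http" or "database"
    start_ms: float
    end_ms: float
    error: bool = False
    children: list[Span] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def walk(span: Span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def depth(span: Span) -> int:
    """Number of levels in the call tree rooted at this span."""
    return 1 + max((depth(c) for c in span.children), default=0)


def summarize(root: Span) -> dict:
    """Compute header-style figures for one trace."""
    spans = list(walk(root))
    ms_per_kind = Counter()
    for s in spans:
        ms_per_kind[s.kind] += s.duration_ms
    return {
        "total_ms": root.duration_ms,
        "errors": sum(s.error for s in spans),
        "depth": depth(root),
        "spans_per_kind": dict(Counter(s.kind for s in spans)),
        "ms_per_kind": dict(ms_per_kind),
    }


# Made-up example trace: four HTTP spans and one database span.
db = Span("SELECT orders", "database", 60, 90)
root = Span("GET /shop", "http", 0, 100, children=[
    Span("GET /catalog", "http", 10, 30),
    Span("GET /prices", "http", 35, 50),
    Span("GET /orders", "http", 55, 95, children=[db]),
])
summary = summarize(root)
```

In this sketch the total execution time is simply the duration of the root span, and the per-type times are plain sums of span durations, so nested calls are counted in each type they appear under.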

The Icicle-Graph below the header (screenshot above) shows the spans in a quick overview. From left to right you can see the time spent in a span, and from top to bottom you can observe the call hierarchy. The color of the span indicates the type. The graph can be used to find long-running spans or to identify call patterns. Clicking on a span in the graph will navigate to the span in the call tree.

[Screenshot: traces]

The call tree shows all the details of the distributed trace. On the left, the measured “wall clock” time is shown, including the percentage of the overall time that the corresponding call took. For each individual span Instana also shows the “self time”, which is the time spent in the span itself, excluding calls to underlying services. Asynchronous calls are marked on the left of the span. Underlying components are shown on the span – clicking on one of them takes you to its dashboard to further investigate details.
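The relationship between wall clock time, percentage of overall time, and self time can be sketched as follows. This is a simplified illustration, not Instana's actual calculation: the `Span` class and function names are hypothetical, and the sketch assumes that child calls are synchronous and do not overlap, so that self time is the span's duration minus the durations of its direct children.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical fields, chosen for illustration only.
    name: str
    start_ms: float
    end_ms: float
    children: list[Span] = field(default_factory=list)


def wall_clock(span: Span) -> float:
    """Measured time from the start to the end of the span."""
    return span.end_ms - span.start_ms


def self_time(span: Span) -> float:
    """Time spent in the span itself, excluding direct child calls.

    Assumes synchronous, non-overlapping children; asynchronous or
    parallel calls would need interval merging instead of a plain sum.
    """
    return wall_clock(span) - sum(wall_clock(c) for c in span.children)


def share_of_trace(span: Span, root: Span) -> float:
    """Percentage of the overall trace time taken by this span."""
    return 100.0 * wall_clock(span) / wall_clock(root)


# Made-up example: a root call with two children, one of which
# calls a database.
db = Span("SELECT orders", 60, 90)
orders = Span("GET /orders", 55, 95, children=[db])
catalog = Span("GET /catalog", 10, 30)
root = Span("GET /shop", 0, 100, children=[catalog, orders])

root_self = self_time(root)            # 100 - 20 - 40
orders_self = self_time(orders)        # 40 - 30
orders_share = share_of_trace(orders, root)
```

A span with a large wall clock time but a small self time is mostly waiting on the services it calls, which is why the two figures are useful side by side.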

Clicking on a span will show more details:

If an error was detected in the span (e.g. from its HTTP status code), the span is shown with a red border and has an error symbol attached. Our Tracing feature now automatically detects errors such as exceptions, error logs or status codes, and uses this information to mark spans and traces as erroneous.
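As a rough illustration of this marking logic, the sketch below flags a span as erroneous when it carries an exception, an error log, or an error status code, and flags a trace as erroneous when any of its spans is. The class, the field names, and the choice of "HTTP status 500 or higher" as the error threshold are assumptions made for the example; the actual detection rules are not described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical fields, chosen for illustration only.
    name: str
    http_status: Optional[int] = None
    exception: Optional[str] = None
    error_logs: list[str] = field(default_factory=list)
    children: list[Span] = field(default_factory=list)


def span_is_erroneous(span: Span, status_threshold: int = 500) -> bool:
    """True if the span shows any of the three error signals.

    The status threshold is an assumption for this sketch.
    """
    bad_status = (
        span.http_status is not None
        and span.http_status >= status_threshold
    )
    return span.exception is not None or bool(span.error_logs) or bad_status


def trace_is_erroneous(root: Span) -> bool:
    """True if the root span or any descendant is erroneous."""
    return span_is_erroneous(root) or any(
        trace_is_erroneous(child) for child in root.children
    )


# Made-up example: the root call succeeds, but one downstream call fails.
failing = Span("GET /prices", http_status=503)
healthy = Span("GET /catalog", http_status=200)
root = Span("GET /shop", http_status=200, children=[healthy, failing])

root_flag = span_is_erroneous(root)      # the root span itself is fine
trace_flag = trace_is_erroneous(root)    # the trace contains a failing span
```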

In the call tree you can also find the last method invoked before the span – clicking on this method shows the whole stack trace that led to this service call:

The new Trace View provides a quick and easy way to identify the root cause of slow traces or errors in highly distributed applications.

We believe that the new Trace View and the newly introduced Logical View, in combination with our machine-learning-driven approach to managing Quality of Service and our Dynamic Graph for modeling and understanding complex and dynamic applications, are the foundation of a brand new paradigm of APM.

As always, we are eager to hear your feedback, so please reach out to us to start using Instana and tell us what you think.