How Instana Safely Instruments Applications for Monitoring

Monitoring systems can usually only work with data that the system under management makes available. This property is called “Observability”. Many standard system components, such as database systems or web servers, provide endpoints from which monitoring data can be read. Unfortunately, applications often do not provide sufficient data on their own, so a monitoring tool has to interact with the system under management to make that data available.
Instana performs this interaction automatically, for all kinds of systems, whenever it is needed. In this blog post, I am going to explain how Instana does it specifically for Java applications running on a Java Virtual Machine (JVM).

How APM Tools Observe Java

The Java runtime itself offers an observability interface called JMX (Java Management Extensions), through which it exposes some monitoring data. The same interface is also used by a number of application containers and frameworks. However, applications usually expose only a few JMX endpoints of their own, and because those endpoints are custom, they are hard for a tool to interpret. To obtain more data and truly understand application behavior, monitoring systems need to take action on their own and get creative. Solutions usually involve a mixture of two main techniques:

  • Watching code as it executes to estimate what the application is doing, and
  • Slightly rewriting code before it executes so that it generates useful data.

These two techniques are commonly referred to as “Sampling” and “Instrumentation”. Both have their own advantages and disadvantages: sampling is usually more lightweight but less accurate, while instrumentation typically generates more useful data but is riddled with technical challenges, some of which I will highlight in this post. Only a few tools strictly use a single approach; most, Instana included, combine the two to varying degrees.
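To make the sampling side concrete, here is a minimal sketch using only JDK APIs. It periodically snapshots all thread stacks with Thread.getAllStackTraces() and counts which method is on top of each one. The class name StackSampler is made up for this illustration, and this is not how Instana's own sampler works. Anything that runs entirely between two samples is never seen, which is exactly why sampling is cheap but less accurate.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy stack sampler (hypothetical): snapshots every live thread's stack and
 * counts which method is currently on top. Real profilers must also deal with
 * safepoint bias, native frames and their own overhead.
 */
class StackSampler {
    // "className.methodName" of a top frame -> how often it was observed
    private final Map<String, Integer> topFrameCounts = new HashMap<>();

    /** Takes one sample of all live threads and records each thread's top frame. */
    void sampleOnce() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length == 0) {
                continue; // some threads may report no Java frames
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            topFrameCounts.merge(top, 1, Integer::sum);
        }
    }

    /** Takes up to {@code samples} samples, sleeping {@code intervalMillis} between them. */
    void run(int samples, long intervalMillis) {
        for (int i = 0; i < samples; i++) {
            sampleOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                return;
            }
        }
    }

    /** Observed top frames and their counts; methods that ran only between samples never appear. */
    Map<String, Integer> counts() {
        return topFrameCounts;
    }
}
```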

[Figure: application monitoring of an HTTP call]

Instana performs its instrumentation automatically at runtime. This is possible thanks to a feature provided by the JVM: the Java Instrumentation API (the java.lang.instrument package), which is designed to modify code before it executes. While there are plenty of use cases besides monitoring, it is especially well suited to monitoring tools, which can use it to collect data such as method timings and call rates.

The Instrumentation API works roughly like this:

  1. The JVM loads class files, which are produced by the Java compiler and usually packaged in JAR or WAR files.
  2. The Instrumentation API then passes the raw binary data of each class file to every registered transformer (an implementation of ClassFileTransformer).
  3. Each transformer is expected to return new raw bytes, which could in principle be something completely different, or null to leave the class unchanged.
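The steps above map onto two JDK types. The first is java.lang.instrument.Instrumentation, which an agent receives in its premain method. The second is ClassFileTransformer, which the agent registers with addTransformer. The following sketch of a do-nothing agent uses hypothetical names (LoggingAgent, LoggingTransformer). It records the internal name of every class it is shown and returns null, which tells the JVM to keep the original bytes.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Entry point of a minimal, hypothetical agent. When the JVM is started with
 * -javaagent, it calls premain() before the application's main() method.
 */
class LoggingAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // Classes loaded after this call are passed to the transformer.
        inst.addTransformer(new LoggingTransformer());
    }
}

/** Records the names of the classes it sees and never changes their bytes. */
class LoggingTransformer implements ClassFileTransformer {
    // transform() can be called concurrently from several class-loading threads
    private final List<String> seen = Collections.synchronizedList(new ArrayList<>());

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className is in internal form, e.g. "java/util/ArrayList"
        seen.add(className);
        // Returning null means "no transformation": the JVM keeps the original bytes.
        return null;
    }

    List<String> seen() {
        return seen;
    }
}
```

To try it, package the classes into a JAR whose manifest contains the line "Premain-Class: LoggingAgent", then start the application with java -javaagent:logging-agent.jar -jar app.jar. Both file names are placeholders. In a real agent JAR, LoggingAgent would also be a public class in its own source file.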

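There are some limitations, however, on what transformers are allowed to do. For example, when retransforming classes that are already loaded, a transformer may not add, remove or rename fields or methods, nor change method signatures. Monitoring agents usually do not alter any of the original code anyway, but rather “just” decorate it with some measurement code.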
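What "decorating" means is easiest to see in source form. The sketch below writes by hand what an agent would add at the bytecode level around a method: a timestamp on entry, and bookkeeping in a finally block on exit, with the original logic left untouched. OrderService and InstrumentedOrderService are hypothetical. The subclass is only a way to show the result in plain Java. Agents insert the equivalent instructions directly into the original method rather than subclassing it.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical application code that an agent might instrument. */
class OrderService {
    String placeOrder(String item) {
        return "ordered:" + item;
    }
}

/**
 * Hand-written equivalent of the decoration an agent adds around placeOrder():
 * the original logic is unchanged, only wrapped with timing and call counting.
 */
class InstrumentedOrderService extends OrderService {
    static final AtomicLong CALLS = new AtomicLong();
    static final AtomicLong TOTAL_NANOS = new AtomicLong();

    @Override
    String placeOrder(String item) {
        long start = System.nanoTime();           // added: measurement on method entry
        try {
            return super.placeOrder(item);        // original behavior, untouched
        } finally {
            // added: bookkeeping on method exit, runs even if the call throws
            CALLS.incrementAndGet();
            TOTAL_NANOS.addAndGet(System.nanoTime() - start);
        }
    }
}
```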

The Raw Binary Data Problem

To actually perform any sensible changes to the classes, transformers registered by agents need to parse the binary data into something they can understand, look for the places in code they want to hook into, perform the modification and then turn the result back into binary data.
As you might know, the JVM executes so-called bytecode rather than Java source code. It is not as bad as dealing with 0s and 1s, but it is still much harder to work with than Java source code. The compiler also translates source code into instructions for the JVM's stack-based execution model, in which values are pushed onto and popped off an operand stack, so developing a transformer requires thinking about code quite differently.
To turn the binary data into something manageable, most developers use a library called ASM. ASM lets developers work with bytecode instructions, labels, methods, metadata and so on, rather than raw binary data. To hook into a method, an agent could locate the appropriate instructions, emit the bytecode it needs for its own measurements, and then emit the rest of the original method.
When the modified bytecode is written back to binary and loaded by the JVM, the JVM's verifier checks whether the bytecode actually makes sense. For example, a call to a method that takes two arguments needs two values of the right types on the operand stack, and a value stored into a local variable has to have the correct type. If you get anything wrong while writing bytecode, the result can be anything from a java.lang.VerifyError to a segmentation fault that crashes the JVM. Ouch.
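To get a feel for what the verifier checks, consider int add(int a, int b) { return a + b; }. As an instance method, it compiles to iload_1, iload_2, iadd, ireturn. The two iload instructions push int values onto the operand stack, iadd pops both and pushes their sum, and ireturn pops the result. The toy checker below simulates the operand stack for a four-instruction subset of straight-line code and rejects stack underflow and type mismatches. Its names (ToyVerifier, Op, Kind) are made up. It is only loosely in the spirit of the real verifier, which also handles branches, local variables, stack map frames and much more.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** A tiny subset of JVM instructions (toy model). */
enum Op { ILOAD, ALOAD, IADD, IRETURN }

/** Types of values on the toy operand stack. */
enum Kind { INT, REF }

/**
 * Toy verifier for straight-line code: simulates the operand stack and reports
 * the first stack underflow or type mismatch. Not the JVM's actual algorithm.
 */
final class ToyVerifier {
    private ToyVerifier() {}

    /** Returns null if the code is accepted, otherwise a description of the first error. */
    static String verify(List<Op> code) {
        Deque<Kind> stack = new ArrayDeque<>();
        // pc is the instruction index here, not a byte offset as in real class files
        for (int pc = 0; pc < code.size(); pc++) {
            Op op = code.get(pc);
            String error = null;
            switch (op) {
                case ILOAD:   // push an int local variable
                    stack.push(Kind.INT);
                    break;
                case ALOAD:   // push a reference local variable
                    stack.push(Kind.REF);
                    break;
                case IADD:    // pop two ints, push their sum
                    error = pop(stack, Kind.INT, pc, op);
                    if (error == null) {
                        error = pop(stack, Kind.INT, pc, op);
                    }
                    if (error == null) {
                        stack.push(Kind.INT);
                    }
                    break;
                case IRETURN: // pop the int result; this toy ignores anything after it
                    return pop(stack, Kind.INT, pc, op);
            }
            if (error != null) {
                return error;
            }
        }
        return "method ends without a return instruction";
    }

    private static String pop(Deque<Kind> stack, Kind expected, int pc, Op op) {
        if (stack.isEmpty()) {
            return "pc " + pc + ": " + op + " needs an operand, but the stack is empty";
        }
        Kind actual = stack.pop();
        if (actual != expected) {
            return "pc " + pc + ": " + op + " expected " + expected + ", found " + actual;
        }
        return null;
    }
}
```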

Several libraries, such as Javassist, cglib and BCEL, have been created to help developers write safer bytecode by providing higher-level abstractions and simpler APIs. Unfortunately, all of these libraries still require a precise understanding of the JVM and of which code is safe to inject.
Back in May 2014, Rafael Winterhalter, tired of the problems with existing bytecode generation libraries, decided to write a new library called Byte Buddy. Byte Buddy still uses ASM behind the scenes to read and write bytecode. However, developers usually do not need to interact with bytecode directly. Instead, they write standard Java code in an IDE like IntelliJ or Eclipse, with all the regular Java tooling available, including static analysis and automated testing. At runtime, Byte Buddy double-checks that the transformed code does not cause any problems.

Advantages of Byte Buddy

Safety

Compared to other bytecode manipulation techniques, Byte Buddy offers developers the greatest safety: it is designed so that illegal or corrupt code cannot be generated. The worst that can happen is that the code added by Byte Buddy is not called. This might result in a monitoring feature not working, but the manipulated bytecode never causes a problem for the monitored JVM itself.

Byte Buddy conforms to the contract of the JVM's Instrumentation API, which means that agents built with Byte Buddy can be stacked with any other compliant agent.
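Stacking works because the Instrumentation API chains transformers. Each registered transformer receives the bytes produced by the previous one, and a transformer that returns null leaves the bytes unchanged for the next one. The sketch below models only that chaining rule, using UnaryOperator<byte[]> as a stand-in for ClassFileTransformer. TransformerChain is a hypothetical name, and the real JVM additionally groups transformers by whether they are capable of retransformation.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Simplified model of transformer chaining. Each function stands in for a
 * ClassFileTransformer: it receives the current class file bytes and returns
 * new bytes, or null to leave them unchanged.
 */
final class TransformerChain {
    private TransformerChain() {}

    static byte[] apply(byte[] original, List<UnaryOperator<byte[]>> transformers) {
        byte[] current = original;
        for (UnaryOperator<byte[]> transformer : transformers) {
            byte[] result = transformer.apply(current);
            if (result != null) {
                current = result; // the next transformer sees this output, not the original
            }
        }
        return current;
    }
}
```

A compliant agent therefore has to cope with class files that another agent has already modified, rather than assuming it sees the bytes exactly as the compiler produced them.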

Developer productivity

It is much easier to develop monitoring code in plain Java, against plain Java APIs. A library can be instrumented by compiling directly against it, without any additional steps. Being able to program in Java of course comes with all the nice perks, such as static typing, tooling support and rich documentation, all of which are lacking when writing bytecode directly or using a code generator.

Testability

When writing instrumentation for many different libraries, Byte Buddy makes it easy to write automated tests that instrument a sample application under every combination of JVM and library version being tested. The results can then be verified with the standard Java testing tools every developer knows. This is possible because Byte Buddy involves no intermediate code generation steps.

Performance

Byte Buddy automatically optimizes performance when instrumenting, as it aims to add as few instructions as possible. It also recomputes important class file metadata, such as stack map frames. These frames describe the types of local variables and operand stack values at specific points in a method's control flow, and the verifier relies on them. With ASM, developers would need to compute all of this metadata by hand, or rely on ASM features such as ClassWriter.COMPUTE_FRAMES, which produce less optimal metadata because they are more generic.

Transparency

After instrumentation, the application under management contains additional code, but no reference to Byte Buddy. In fact, it looks as if the monitoring instructions had been part of the code since it was originally compiled.

Instana and Byte Buddy

Instana is a proud user of Byte Buddy and actively contributes, alongside others, to improving the performance, safety and features of the library. Rafael works on our agent team and is porting some features developed at Instana to the public library.
We use Byte Buddy for all our code instrumentation needs in Java. Compared to code written with other libraries, our instrumentation code is much easier for new developers to understand, which simplifies maintenance and further development.

It should be noted that we avoid instrumenting an application’s actual “business logic”. Instead, we focus on reliable instrumentation of standard libraries, which we can test extensively to ensure that our agent performs as expected.
There are strong and very valid reservations against instrumenting an application’s custom business logic. Instana avoids this topic altogether by applying instrumentation only at the level of the Java class library and of known containers and frameworks.
With fully automated, comprehensive functional and quality testing in place, we can confidently stand behind our instrumentation approach. Venturing into instrumenting custom business logic would run counter to these standards.

If your application does not use any of the standard libraries we instrument, we will offer a Byte Buddy-based SDK that makes it easy to add monitoring to your custom code. Because it is annotation-based, it requires very little effort to adopt. Documentation and examples will be published as soon as the SDK is available.
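The SDK's API is not described here, so the following is only a hypothetical illustration of how an annotation-based approach can work in principle. A developer marks a method with an annotation (the made-up @Traced), and the agent looks for that marker. For simplicity, the sketch discovers annotated methods with reflection. A real Byte Buddy-based agent would instead match them while transforming bytecode, for example with an ElementMatchers.isAnnotatedWith(...) matcher. Traced, CheckoutService and TracedMethodScanner are not part of any Instana product.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker annotation; NOT Instana's SDK API. */
@Retention(RetentionPolicy.RUNTIME)   // must survive to runtime so it can be discovered
@Target(ElementType.METHOD)
@interface Traced {
    String value() default "";        // optional span name; empty means "derive from method"
}

/** Hypothetical business class a developer might annotate. */
class CheckoutService {
    @Traced("checkout")
    void checkout() {}

    @Traced
    void refund() {}

    void helper() {}                  // not annotated, so not monitored
}

/** Finds annotated methods and derives a span name for each of them. */
final class TracedMethodScanner {
    private TracedMethodScanner() {}

    static List<String> spanNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            Traced traced = m.getAnnotation(Traced.class);
            if (traced != null) {
                names.add(traced.value().isEmpty()
                        ? type.getSimpleName() + "." + m.getName()
                        : traced.value());
            }
        }
        return names;
    }
}
```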

We very carefully track and manage the performance impact our agent and instrumentation generate. In a future blog, we’ll take a deep dive into the actual costs of instrumentation and show how we keep them at a minimum.

For the inclined reader: A real code excerpt from the Instana Agent

A very common instrumentation use case is to intercept calls to servlets and process their data. Here is a snippet from our agent code which shows how this can be done conveniently using Byte Buddy:

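// Servlet API and Byte Buddy types; AdviceRegistry, Callbacks, TracingHeaders and @Tag
// are internal Instana agent classes that are not shown in this excerpt.
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.method.ParameterDescription.InDefinedShape;
import net.bytebuddy.description.method.ParameterList;

public class ServletInstrumentation {

  public AgentBuilder instrument(AgentBuilder agentBuilder) {
    // Apply ServletAdvice to service(HttpServletRequest, HttpServletResponse) in every HttpServlet subtype
    return AdviceRegistry.subTypesOf(HttpServlet.class)
        .advice((method) -> {
            ParameterList<InDefinedShape> parameters = method.getParameters();
            return method.getInternalName().equals("service")
                && parameters.size() == 2
                // getType() returns a generic type description; asErasure() yields the raw type
                && parameters.get(0).getType().asErasure().isAssignableTo(HttpServletRequest.class)
                && parameters.get(1).getType().asErasure().isAssignableTo(HttpServletResponse.class);
        }, ServletAdvice.class).register(agentBuilder);
  }

  private static class ServletAdvice {
    // Inlined at the start of service(): reads request data and hands it to the agent's bookkeeping
    @Advice.OnMethodEnter
    private static void enter(@Advice.Argument(0) HttpServletRequest request, @Tag int tag) {
      String method = request.getMethod();
      String requestUri = request.getRequestURI();
      String traceId = request.getHeader(TracingHeaders.TRACE_ID); // incoming trace context, if any
      Callbacks.find(ServletInstrumentation.class).call(1, method, requestUri, traceId);
    }

    // Inlined before service() returns normally: reports the response
    @Advice.OnMethodExit
    private static void exit(@Advice.Argument(1) HttpServletResponse response) {
      Callbacks.find(ServletInstrumentation.class).call(2, response);
    }
  }
}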

 
As you can see, the code is very straightforward:

  • All subtypes of HttpServlet are matched, and the advice is applied to any method named service whose two parameters are assignable to HttpServletRequest and HttpServletResponse.
  • The enter and exit advice methods have direct access to the intercepted method's arguments and can easily call out to other agent code for bookkeeping.
  • Since the instrumentation code is written as regular Java source, it is compiled ahead of time, so problems with types and variables are caught by the compiler instead of surfacing at runtime.
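Because the matching rule is ordinary Java, it is easy to reason about and to test. The sketch below mirrors the element matcher with java.lang.reflect over hypothetical stand-in types (PlainRequest, Request, PlainResponse, Response, BaseServlet, MyServlet), so that it compiles without the Servlet API. It also shows why the parameter check matters. HttpServlet declares both service(ServletRequest, ServletResponse) and service(HttpServletRequest, HttpServletResponse). Likewise, BaseServlet has two service overloads, and only the one taking the more specific types matches.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins mirroring ServletRequest / HttpServletRequest and
// ServletResponse / HttpServletResponse, so the sketch needs no Servlet API.
interface PlainRequest {}
interface Request extends PlainRequest {}
interface PlainResponse {}
interface Response extends PlainResponse {}

/** Mirrors HttpServlet's two service(...) overloads. */
class BaseServlet {
    public void service(PlainRequest req, PlainResponse res) {}
    protected void service(Request req, Response res) {}
}

/** An application servlet overriding the specific overload, plus an unrelated "service" method. */
class MyServlet extends BaseServlet {
    @Override
    protected void service(Request req, Response res) {}

    void service(String ignored) {}
}

final class ServiceMethodMatcher {
    private ServiceMethodMatcher() {}

    /** Same rule as the advice matcher: named "service", two parameters assignable to Request and Response. */
    static boolean matches(Method method) {
        Class<?>[] params = method.getParameterTypes();
        return method.getName().equals("service")
                && params.length == 2
                && Request.class.isAssignableFrom(params[0])   // parameter type is assignable TO Request
                && Response.class.isAssignableFrom(params[1]);
    }

    /** Counts matching methods declared directly on the given class. */
    static int countMatches(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (matches(m)) {
                count++;
            }
        }
        return count;
    }
}
```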