When monitoring services, websites, or mobile applications, immediate feedback is important. Operations such as restarting services can cause serious issues and costly delays.
Once Upon A Time
In the early days of monitoring, people had to jump through quite a few hoops to enable extensive monitoring.
Performance metrics were rather easy to capture with just a few lines of code, thanks to solutions like Java MBeans or early open source tools like Nagios, whereas transactions and code traces were complicated to capture, if possible at all.
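To give an idea of how little code that took, here is a minimal Java sketch that reads a few metrics from the JVM's built-in platform MXBeans. These are a standardized flavour of MBeans available through java.lang.management. The class name JvmMetrics and the metric keys are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: takes a snapshot of a few JVM metrics via the platform MXBeans.
class JvmMetrics {
    static Map<String, Long> snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("heap.used.bytes", memory.getHeapMemoryUsage().getUsed());
        metrics.put("threads.live", (long) threads.getThreadCount());
        metrics.put("uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());
        return metrics;
    }

    public static void main(String[] args) {
        // Print each metric as "name = value".
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Collecting numbers like these was the easy part. Following a single request through the code was not.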
I remember digging through plenty of log files full of Java exception stack traces just to figure out where an issue originated, buried, of course, among millions of other log lines. The good old days of using a log file for everything, … not.
Time Travel to Just In Time (JIT)
Today the world is vastly different.
While code changes have not disappeared completely, many monitoring systems today try to spare you from changing your code as much as possible. Most of these approaches revolve around runtime code instrumentation.
Instrumenting code means injecting small bits of code into the actual application. The technique is commonly used with bytecode-based languages running on the Java Virtual Machine (JVM) or on .NET’s Common Language Runtime (CLR).
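The following is a hand-written Java illustration of what instrumentation means in practice. It shows what a method might effectively look like before and after an agent weaves timing code around it. Real agents do this at the bytecode level while classes are loaded, not in source code. PriceService, InstrumentedPriceService and the Timings sink are hypothetical names used only for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sink for timing data; stands in for whatever a monitoring agent reports to.
final class Timings {
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    static void record(String name, long elapsedNanos) {
        CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
        NANOS.computeIfAbsent(name, k -> new LongAdder()).add(elapsedNanos);
    }
}

// The method as the developer wrote it.
class PriceService {
    long totalCents(long[] items) {
        long sum = 0;
        for (long item : items) sum += item;
        return sum;
    }
}

// Roughly how the same method behaves once timing code has been injected around the original body.
class InstrumentedPriceService {
    long totalCents(long[] items) {
        long start = System.nanoTime();                 // injected
        try {
            long sum = 0;                               // original body
            for (long item : items) sum += item;
            return sum;
        } finally {
            // injected: runs on normal return and on exceptions alike
            Timings.record("PriceService.totalCents", System.nanoTime() - start);
        }
    }
}
```

The business logic is untouched. The injected lines only observe and report, which is why this kind of change can be applied without the developer editing the source.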
Beyond those, a fairly new bytecode VM emerged a while ago: BPF, a virtual machine residing inside the Linux kernel. It opens up a range of possibilities, from filtering network packets to capturing events and performance metrics, all at almost native speed.
Instrumentation can also be applied to native processes, but it is rarely found in the wild because it can be rather dangerous. An example of this kind of technology, although unrelated to monitoring, is the Linux kernel’s live patching capability (kpatch), which makes it possible to patch the running kernel without an actual restart.
To Be Or Not To Restart
While many monitoring systems try to avoid code changes, activating the instrumentation itself is a different story.
Taking Java as an example, instrumentation is most commonly implemented with a combination of the Java Instrumentation API and the Java Agent API.
If these two APIs don’t ring a bell: the first is used to change classes before they are loaded by the JVM (and, where the JVM supports it, to retransform classes that are already loaded), and the latter is used to inject our instrumentation, a so-called agent, into the JVM.
In the most common setup, an agent is packaged as a JAR file and passed to the JVM as a command-line parameter (-javaagent) at startup. This means that to add an agent, the JVM has to be restarted.
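Below is a minimal Java sketch of such a startup agent. The class names StartupAgent and LoggingTransformer are hypothetical. The sketch also assumes an agent JAR whose manifest declares Premain-Class: StartupAgent.

The JVM calls premain before the application’s main method. The registered ClassFileTransformer then gets to see each class’s bytecode before the class is defined. This transformer only records class names and leaves the bytes untouched.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Entry point called by the JVM before the application's main method when started with -javaagent.
// Assumes the agent JAR manifest contains "Premain-Class: StartupAgent".
// (Package-private here so the sketch compiles as one file; normally each class is public in its own file.)
class StartupAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }
}

// Sees the raw bytecode of every class the JVM is about to define.
class LoggingTransformer implements ClassFileTransformer {
    private final Queue<String> seen = new ConcurrentLinkedQueue<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null) {
            seen.add(className); // internal form, e.g. "com/example/Foo"
        }
        // Returning null keeps the original bytes; a real agent would return rewritten bytecode here.
        return null;
    }

    Queue<String> seen() {
        return seen;
    }
}
```

Suppose the agent is packaged as startup-agent.jar. It would then be activated with something like java -javaagent:startup-agent.jar -jar app.jar. Because the flag is only read at startup, the JVM has to be (re)started to pick the agent up.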
Restarting a Microservice is Fast!
That restarting one of those big, clunky monoliths is a problem is probably obvious, but with microservices we can simply restart one instance after another, can’t we?
Yes, we can!
As a result, many people now think the problem is solved and they no longer need to care. The problem, however, is still there.
Restarting hundreds of microservices not only burns plenty of CPU cycles on loading classes and compiling bytecode to machine code, it also costs time. In many organizations, adding something new and restarting services requires a change-control process, which can take a loooooong time depending on the organization.
I like to call this “time to value”: the time from the moment you decide you want monitoring to the point where you actually get the information. Restarting many services, e.g. in Kubernetes, simply ends up being quite time-consuming. And when you are trying to analyze an ongoing issue, restarting the service may even make the problem disappear before you get to observe it.
On top of that, restarts need additional resources. Hopefully, a few services restarting at the same time are already part of your capacity planning.
Automatic and Instantaneous
That said, providing automatic monitoring and automatic instrumentation alone is just not enough. You want to avoid the service degradation caused by restarting instances, and you want immediate insight.
The only way to meet that kind of requirement is to attach to an already running process and instrument it on the fly, providing the instant feedback you are looking for.
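On the JVM, one way to attach is the Attach API (com.sun.tools.attach, shipped in the JDK’s jdk.attach module). A separate process attaches to the target JVM by process id and loads an agent JAR. The JVM then calls the agent’s agentmain method inside the running application.

The sketch below is only one illustration of attaching on the JVM. It does not describe how any particular product works internally. It makes the following assumptions:

- The agent JAR’s manifest contains Agent-Class: LiveAgent and Can-Retransform-Classes: true.
- LiveAgent, Attacher and the com.example. package prefix are hypothetical names.
- The agent only registers a no-op transformer and retransforms the matching classes. A real agent would rewrite their bytecode.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

// Agent side: called inside the already running JVM once the agent JAR is loaded.
// Assumes manifest entries "Agent-Class: LiveAgent" and "Can-Retransform-Classes: true".
// (Package-private so the sketch compiles as one file; normally each class is public in its own file.)
class LiveAgent {
    static final String TARGET_PREFIX = "com.example."; // hypothetical application package

    public static void agentmain(String agentArgs, Instrumentation inst) throws UnmodifiableClassException {
        ClassFileTransformer transformer = new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain, byte[] classfileBuffer) {
                return null; // no-op: a real agent would return rewritten bytecode here
            }
        };
        inst.addTransformer(transformer, true); // true = may also act on retransformed classes

        // Classes that are already loaded must be retransformed to pass through the transformer.
        List<Class<?>> targets = new ArrayList<>();
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (c.getName().startsWith(TARGET_PREFIX) && inst.isModifiableClass(c)) {
                targets.add(c);
            }
        }
        if (!targets.isEmpty()) {
            inst.retransformClasses(targets.toArray(new Class<?>[0]));
        }
    }
}

// Launcher side: attaches to a running JVM by process id and loads the agent JAR into it.
class Attacher {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: Attacher <pid> <path-to-agent.jar>");
        }
        VirtualMachine vm = VirtualMachine.attach(args[0]);
        try {
            vm.loadAgent(args[1]);
        } finally {
            vm.detach();
        }
    }
}
```

It would be run as, for example, java Attacher 12345 /path/to/live-agent.jar, with no restart of process 12345. Recent JDKs print a warning when an agent is loaded this way unless the target JVM was started with -XX:+EnableDynamicAgentLoading.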
With that in mind, a great monitoring solution must bring three ingredients to the table:
- the capability to instrument code without requiring code changes
- the capability to automatically supply the instrumentation to the process
- the capability to instrument the process without a restart
I’m not saying other solutions are inherently bad; they’re just not great.
Instana’s AutoTrace technology delivers all three for many programming languages. Furthermore, Instana offers automatic detection of running services, mapping of dependencies and connections between services, and immediate insight into service health and performance.
Try Instana today with the free 14-day trial and get feedback in minutes, not hours.