Java has been one of the most popular programming languages for well over a decade. Such widespread usage has created a need for robust Java monitoring tools that can run in production and identify issues across every layer of the stack. This article is a deep dive into the most important aspects of Java, JVM, and full stack monitoring that you might not have considered before.
Important Java and JVM background
What is Java? The answer to this question is not simple, because the term Java is used interchangeably to refer to multiple things. The first is the actual programming language Java, a language whose syntax borrows heavily from languages like C and C++. Applications written in Java are compiled into bytecode (an intermediate language) and executed in a specific execution environment.
What’s the difference between Java and the JVM? That brings us to the second thing that is referred to as Java. Oftentimes, when people refer to Java, what they really mean is the JVM, the Java Virtual Machine. The JVM is the environment that reads, interprets, and executes the Java bytecode. Some implementations also support JIT compiling, the “just-in-time” translation of bytecode to actual machine code, specific to the system the JVM is running on. While the JVM was initially developed by Sun Microsystems (later acquired by Oracle), many different vendors and communities now provide JVM implementations and development kits, such as Oracle, Azul Systems (Zulu and Zing), IBM, Amazon, OpenJDK, and AdoptOpenJDK, with a multitude of options in terms of support and features beyond the officially defined standard.
Can other programming languages run inside of a JVM? This is where we need to discuss the Java ecosystem, which builds on top of the JVM platform and bytecode specification, but provides programming language options besides the Java language itself. These additional languages include, but are not limited to, Kotlin, Clojure, JRuby, Jython, Scala and many more. They all compile the source code to Java bytecode, and are referred to as JVM or Java platform languages.
Full Stack Java Monitoring
With multiple components being referred to as Java, the answer to how to monitor Java isn’t as simple as it sounds. To make things easier, let’s break down the concept of full stack Java monitoring into 3 distinct categories:
- Java or JVM metrics monitoring
- Distributed Tracing across services
- Java Code Profiling
Java JVM Metrics: When people look for Java monitoring, they most commonly look for a way to monitor the Java platform, the JVM. Java has long offered JMX (Java Management Extensions), which provides information about the runtime state of the JVM itself, the garbage collector, and other internal elements. It can, furthermore, be extended by the running application to expose additional metrics to outside metric collectors. JMX clients can then connect to the JVM to collect and display these metrics.
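To make the JMX idea concrete, here is a minimal sketch of reading a few JVM metrics in-process through the platform MXBeans that back JMX. The class name JvmMetricsSnapshot and the metric key names are made up for illustration; the java.lang.management calls are standard JDK APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; only the java.lang.management calls are real JDK APIs.
final class JvmMetricsSnapshot {

    /** Collects a few standard JVM metrics from the platform MXBeans. */
    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        // Heap usage as reported by the memory MXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.committed.bytes", heap.getCommitted());

        // Live thread count and JVM uptime.
        metrics.put("threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());

        // One pair of entries per collector; the names depend on the GC in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            metrics.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            metrics.put("gc." + gc.getName() + ".time.ms", gc.getCollectionTime());
        }
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A remote JMX client reads the same attributes over a connector instead of in-process, but the metrics it sees are the ones shown here.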
During development, common tools include VisualVM and JDK Mission Control (also simply known as Mission Control). The latter includes Java Flight Recorder, an additional way to profile applications and collect data from the JVM at extremely low overhead.
JBossAS running on the JVM and VisualVM connected presenting JVM metrics (source: Planet JBoss)
This setup works for any JVM based language and application. For application specific metrics though, additional monitoring tools like Prometheus, StatsD or Micrometer may be required.
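For an application-specific metric, one option that needs no extra library is to register your own MXBean with the platform MBean server, so that any JMX client can read it next to the built-in JVM metrics. The sketch below takes that plain JMX route rather than Prometheus, StatsD, or Micrometer; the OrderStats counter and the com.example.shop object name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example class; the javax.management calls are standard JDK APIs.
final class AppMetricsJmx {

    /** Management interface; JMX requires it to be public and to end in "MXBean". */
    public interface OrderStatsMXBean {
        long getOrdersProcessed();
    }

    /** Hypothetical application counter exposed as the JMX attribute "OrdersProcessed". */
    public static final class OrderStats implements OrderStatsMXBean {
        private final AtomicLong processed = new AtomicLong();

        public void recordOrder() {
            processed.incrementAndGet();
        }

        @Override
        public long getOrdersProcessed() {
            return processed.get();
        }
    }

    // Assumed object name, chosen only for this illustration.
    static final String OBJECT_NAME = "com.example.shop:type=OrderStats";

    /** Registers a fresh OrderStats bean, replacing any earlier registration. */
    static OrderStats register() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(OBJECT_NAME);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            OrderStats stats = new OrderStats();
            server.registerMBean(stats, name);
            return stats;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register MXBean", e);
        }
    }

    /** Reads the attribute back the way a JMX client would, by object name. */
    static long readOrdersProcessed() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(new ObjectName(OBJECT_NAME), "OrdersProcessed");
        } catch (JMException e) {
            throw new IllegalStateException("Could not read attribute", e);
        }
    }

    public static void main(String[] args) {
        OrderStats stats = register();
        stats.recordOrder();
        System.out.println("OrdersProcessed = " + readOrdersProcessed());
    }
}
```

With the bean registered, a tool such as VisualVM connected to the process should list the attribute under the chosen object name.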
Another common approach is to use an open source tool like Prometheus to collect metrics. Prometheus provides a number of integrations and wrappers for commonly used Java-based tools and frameworks, which offer a quick way to integrate it with the services in question. It also supports many technologies beyond Java. Every tool listed in this section has a major shortcoming, though: they all collect and display metrics in isolation, without the context of responsiveness or service dependencies. Metrics in isolation can be helpful at times, but metrics with proper context help you find the root cause of a problem much faster. We’ll explore how to get this context when we talk about how Instana approaches Java monitoring.
Distributed Tracing: When using a JVM-based language to implement application services, it’s absolutely necessary to do more than just metric monitoring. Identifying and solving problems with Java applications quickly requires an in-depth understanding of the architecture and the communication between services. This is achieved by performing end-to-end tracing of requests flowing through the system.
Open source APIs like OpenTracing or OpenTelemetry help to collect distributed traces from the different services and technologies in use. Unfortunately, it is up to the user to integrate tracing points into their application (a tedious process that must also be maintained), and, even worse, no context is carried between the traces, metrics, and dependencies.
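To give a feel for what manual instrumentation involves, the following sketch hand-rolls the core of trace context propagation: a trace ID shared by a whole request, a span ID per operation, and a header that carries both to the next service. It does not use the OpenTracing or OpenTelemetry APIs; the ManualTraceContext class is hypothetical, and the header layout follows the W3C traceparent format (version, trace ID, span ID, flags).

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, hand-rolled trace context; not an OpenTracing or OpenTelemetry API.
final class ManualTraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId; // 32 hex chars, shared by every span of one request
    final String spanId;  // 16 hex chars, unique per operation

    private ManualTraceContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    /** Starts a new trace at the edge of the system. */
    static ManualTraceContext newRoot() {
        return new ManualTraceContext(randomHex(16), randomHex(8));
    }

    /** Creates a child span that stays inside the same trace. */
    ManualTraceContext newChild() {
        return new ManualTraceContext(traceId, randomHex(8));
    }

    /** Writes the context into outgoing request headers (caller side). */
    void inject(Map<String, String> headers) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01");
    }

    /** Reads the context from incoming headers (callee side); starts a new trace if absent. */
    static ManualTraceContext extract(Map<String, String> headers) {
        String header = headers.get("traceparent");
        if (header == null) {
            return newRoot();
        }
        String[] parts = header.split("-");
        if (parts.length != 4) {
            return newRoot();
        }
        return new ManualTraceContext(parts[1], parts[2]);
    }

    private static String randomHex(int byteCount) {
        byte[] buffer = new byte[byteCount];
        RANDOM.nextBytes(buffer);
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        newRoot().newChild().inject(headers);
        System.out.println("traceparent: " + headers.get("traceparent"));
    }
}
```

Every outgoing call needs an inject and every incoming request an extract, in every service, which is the maintenance burden described above.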
To complicate matters further, the infrastructure required to collect, store and analyze data from open source point solutions needs to be set up and managed by the user, and the cost of data storage and computation needs to be accounted for. Prometheus, for example, is a good tool but was not designed to scale out, so the setup tends to grow to multiple instances over time.
The biggest issue is the burden of manually correlating metrics, data, and distributed traces across different sources. Different open source tools store their pieces of information differently. As a result, the user is left with a set of disconnected elements, trying to piece together the jigsaw puzzle. Trying to connect the dots during an outage, and getting to the root cause, is an unnecessarily complicated and lengthy process when using OSS tools.
Java Code Profiling:
Code profiling has been around for decades. It started out as something you would only do in a dev or test environment because it used to be very heavy in terms of CPU and memory. Over the years, production-grade code profilers have been created that are extremely lightweight and can therefore be run against a production environment. The purpose of a code profiler is to answer the following question: What code is causing a problem within my running application service?
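The basic idea behind a sampling profiler can be sketched in a few lines: periodically capture every thread’s stack and count which method is on top most often. This is a simplified illustration, not how any production profiler (Instana’s included) is implemented; the class and method names are hypothetical and only standard Thread APIs are used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of stack sampling; not a production profiler.
final class SamplingProfilerSketch {

    static volatile boolean stop;  // lets the demo workload be shut down
    static volatile long sink;     // keeps the busy loop's result observable

    /** Counts, per method, how often it was the top frame of a RUNNABLE thread. */
    static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                Thread thread = entry.getKey();
                StackTraceElement[] stack = entry.getValue();
                // Skip the sampler itself, idle threads, and threads without frames.
                if (thread == Thread.currentThread() || stack.length == 0
                        || thread.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    /** Hypothetical CPU-bound workload that the sampler should surface. */
    static void busyLoop() {
        long x = 0;
        while (!stop) {
            x += x * 31 + 7;
        }
        sink = x;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(SamplingProfilerSketch::busyLoop, "busy-worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> hits = sample(50, 5);
        stop = true;
        hits.forEach((method, count) -> System.out.println(count + "  " + method));
    }
}
```

Methods with the highest counts are where threads spent most of their sampled time. Real profilers add much more, such as full call trees and wait-time analysis, while keeping overhead low.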
Full Stack Java Monitoring Using Instana
Java Monitoring from Instana automatically collects and correlates metrics, distributed traces, and code profiles with almost no effort required.
When using Instana to monitor Java services and applications, the Instana Agent automatically and continuously discovers JVMs and the technologies being used inside of them:
- JVM vendor
- JVM version
- Database connectors
- Upstream and Downstream services
- many more…
Furthermore, Instana automatically discovers Java services deployed into many different environments like Docker, Kubernetes, OpenShift, Cloud Foundry, or running as plain processes on the host machine. After discovering the JVM, the Instana Agent connects to the running process and analyzes the architecture of the service, discovering additional aspects such as the framework used to develop the service (like Spring Boot, Dropwizard, …), application servers (like JBoss/WildFly, WebSphere, …), and database connectors. In the last step, the Instana Agent automatically instruments the running process and starts capturing important metrics and distributed traces right away. No process restart is required.
At Instana, we recognize the different perspectives one may have on “Java”, and use appropriate entities in our Dynamic Graph (logical model of the stack, services, and dependencies) to represent them. So, for example, a Spring Boot application is represented by:
- A Spring Boot entity, which is part of a…
- Java entity, which represents the Java application running in a…
- Java Virtual Machine, which executes inside of a…
- Docker Container, which runs as an…
- Operating System process, running on a…
- Linux host
Think of the list above as a vertical dependency stack where different metrics need to be collected at every layer. Instana automatically collects these metrics and models the vertical dependencies as shown in the screenshot below.
But there is more to an application than individual stacks. The services sitting within those stacks talk to each other creating dependencies of their own. You can visualize these as horizontal, or cross service dependencies. Again, Instana automatically detects all calls between services and creates distributed traces showing the end-to-end flow of every request. You can see this represented in the screenshot below.
How can I know what Java method is causing a problem? The final piece of the Java monitoring puzzle is production code profiling. We already have full stack metrics and distributed traces so we can understand if resource contention is an issue and exactly which service is having a problem. What we can’t tell yet is which method within a running application service is causing performance problems. Enter the always-on production Java profiler… Java code profiles will show the exact method(s) that are hogging CPU or are responsible for that long wait time.
Install Full Stack Java Monitoring in 1 Step
How to install the Instana Agent depends on the system to be monitored but it’s always really easy. The Installation Wizard inside Instana’s Web Interface provides the user with a choice of setup techniques by environment type.
Instana Web UI installation wizard screenshot showing how to install on Linux
Apart from Java applications, the Instana Agent will discover many more supported technologies, and set them up for automatic monitoring, too. Instana’s single agent per host implementation keeps the monitoring overhead extremely low and greatly simplifies the overall installation and maintenance process.
Using Instana to collect all important metrics, distributed traces, and code profiles has a key benefit: Instana stitches all of this information together to provide a full, end-to-end view of the contextual dependencies and impact between the different components, including automatic discovery of upstream and downstream services.
With Instana, it is not up to the user to manually determine which services are part of a degradation, or why a specific service seems to be impacted by an issue on another component. Instana automatically generates the necessary relationships between all system components. Furthermore, it understands the system’s architecture and dependencies, down to the level of the container instances and container hosts a specific request was executed on at a specific point in time. All that information is used to create correlations and provide the necessary evidence in case of incidents, to quickly find the root cause and help decrease the time to resolution.
See Instana’s Java and JVM monitoring by using our interactive sandbox observability environment today.