Introducing Dynamic Focus for Application Performance Management

At Instana we are constantly working with our customers to help them monitor and manage their dynamic and scaled application environments.

Especially in microservice and containerized environments, we see the number of components exploding. Or as Yuri Shkuro from Uber says in his recent blog post: “The other reason complexity increased was a move away from large, monolithic applications to a distributed microservices architecture. As it often happens, moving into a new microservices ecosystem brings its own challenges. Among them is the loss of system visibility into the system, and the complex interactions between services.”

A monolith can easily be split into 20-30 microservices—each service running multiple  instances of different versions, with optional feature flags enabled. One monolith typically results in hundreds of moving parts that are constantly redeployed and that communicate with each other. This resulting runtime architecture is referred to as the microservice “Death Star”. Take this example from Hailo that they posted in one of their blogs about their journey to microservices.

Today you will need a modern, distributed tracing system to understand and analyze the dependencies and performance of your digital business services. It is also common sense to collect and analyze the metric/time series data of your components in order to understand their health and status. But there is another problem with modern applications and that is scale. As noted above, breaking up a monolith creates about 100x more active elements (components in the stack). Once an application starts to grow, this grows to 1000s. We have engaged with many customers with over 100 microservices built upon many thousands of components. In these environments, we have learned from our customers that they need focus!

Focus means reducing the number of infrastructure and software elements in the problem scope, thereby reducing the complexity the mind must comprehend and therefore the time spent on problem analysis.

With our newest release of our Application Management solution, we are introducing Dynamic Focus, a unique and easy way to manage the scope of focus to specific areas of your system and application stack. Dynamic Focus introduces new search and filter (a saved search) capabilities that takes advantage of our unique architecture and capabilities:

  • Dynamic Graph – Our cohesive model of physical and logical components including all historical attributes and metrics. The Dynamic Graph also includes distributed traces and events. All nodes in the graph are connected based on their dependencies. The various views in Instana visualize the data of the graph. Dynamic Focus actually filters the graph to provide specific subviews and focus.
  • Autodiscovery – All components, and their dependencies, held in the graph are auto-discovered in realtime by our Agent. Any saved filter will adapt to the current situation of your systems and applications without any manual change or configuration.
  • Incidents – Our Incidents are based on KPIs and machine learning and provide condensed information about an incident. This includes the dependencies to services, traces and physical components and they can be used by our Dynamic Focus to filter only the components that are part of a particular Incident.
  • Distributed Tracing – We automatically discover and capture every distributed interaction (message) between microservices as a Trace and record its dependencies to services and physical components. Again a Trace and even its Spans, like a SQL call, can be used in our Dynamic Focus to generate a subset of the Dynamic Graph.

Focusing on Application Components

To understand the concept of Dynamic Focus we will introduce it using a simple example. We take the graph representation of a zoned application that is running on two hosts with four containers each. There is a JVM running in every container. Each of the two hosts runs two Tomcats and two ElasticSearch instances which provide a service per host. Our Dynamic Graph model would (simplified) look like this:

Our new filtering capability allows us to reduce the nodes in the graph based on a search query.

A query for entity.type:tomcat would match all Tomcat instances in the Graph. This means that containers with ElasticSearch entities and the corresponding services would no longer be visible when visualizing your infrastructure, application and traces in our standard views.

Any chart, change, issue, incident, or trace in Instana would then only be seen in the context of the reduced Graph.

Another example query could be to search for all JVMs with 1.8 version: entity.jvm.version:1.8.*

In this case the Graph would look like:

This is a powerful tool to find and focus on single entities or groups of entities within the Graph that match certain criteria.

Focusing on Management Attributes

Instana also has structures that are outside of the Dynamic Graph but do reference it, for example, Events (like a Change), Issues, or Incidents. An Incident could contain services where Instana found KPI outliers, as well as containing entities with issues like high CPU load.

Using the new search capability, we can query and filter for Incidents like event.state:open event.type:incident. This would filter all active Incidents in the currently selected timeframe. In this example, we assume there is only one active Incident when a degradation of an ElasticSearch service causes subsequent degradation of a Tomcat service.

As the Incident that matches will have references to the two services, the query will filter to only the relevant entities.

This also works the other way around. If we search for a Service, then only the Incidents that contain this Service would be shown.

Focusing on Traces

The same functionality works with Traces and the included Spans, like an executed SQL Statement. Let’s assume we want to find all the Traces that call ElasticSearch:
span.type:elasticsearch

You can also combine these queries if you are interested, for example, in all Tomcat servers that call an ElasticSearch instance you combine two queries:

span.type:elasticsearch entity.type:tomcat

Because you can use any attribute of the entities, it is easy to focus on certain areas. As an example, you could filter all the traces that have executed a SQL that took longer than 200ms:

span.duration:>200 span.type:jdbc

If you then switch to Instana’s infrastructure view, you would only see the servers and containers with traces that executed a SQL query > 200ms from java based services.

As an example see this Physical Map before using Dynamic Focus:

We then search for all Traces that have a SQL statement that took longer than 200ms:

Which in this case filters the number of Traces in the selected timeframe from more than 800k to only four Traces matching the search allowing the developer to focus their attention.

Switching back to the Physical Map, we will see the filtered map that shows only the hosts and processes that were part of the four matching traces.

Shifting the focus in one view applies the filter to all views until it is removed

Overall, Dynamic Focus is a powerful tool for understanding your complex applications, especially during the investigation and triage of service quality issues. Whether you’re just trying to filter for a technology type (such as MySQL databases) or a specific microservice, the use cases are unlimited. You can now focus on only what matters to you.  

Use Dynamic Focus to filter based on changes, logs, or issues; then combine them with any component and attribute of your system and applications. And because any live filter applies across all views, easily dive into whatever analysis is needed, whether in the Application Map, Infrastructure Map, or the Incident View. Even our TimeShift historical records will leverage the active filter in place.

Stay tuned for more specific examples of Dynamic Focus in action.