Instana AWS Monitoring: Feature Roundup

Instana is rapidly building out unparalleled support for monitoring the many products Amazon offers in AWS. Our latest release adds first-class support for CloudWatch (KPI metric service), SQS (message queueing service), and RDS (database service). The important question is: what do I mean by “first-class support”?

For Instana users, first-class support means the following:

  • Full integration into the Instana Dynamic Graph, the model Instana uses for correlation and causation
    Value: Know with certainty when one component is impacting other components or the service as a whole.
  • Relevant metric collection
    Value: Monitor the metrics that most clearly indicate problems. No guesswork required.
  • Fully AI-enabled behavioral learning
    Value: Know what is normal for any metric.
    • Predictive analytics
      Value: Get warned BEFORE an issue occurs, in time to take corrective action.
    • Automatic root cause analysis
      Value: Immediate troubleshooting by a single person.
  • Traces generated for EVERY request
    Value: Troubleshoot any individual request.
  • Fully curated expert knowledge to identify problems and suggest remediation
    Value: Expert in a box. Supplement the knowledge of your internal experts with expert knowledge from the Instana team.

Immediately Identify Problems in AWS

Instana collects metrics from every host (EC2 instance) where an agent is installed. These metrics are collected at an incredible 1-second granularity, and it takes only 3 seconds for a metric to travel from the originating host to the Instana UI.

High granularity and fast delivery are both important, but for different reasons:

  • High granularity – Detect spikes in metrics that would be hidden by coarser (longer-interval) sampling. Most monitoring tools have a metric granularity of 60 seconds or more, and none go finer than 10 seconds.
  • Fast delivery – Unexpected spikes in application workload are one of the most common causes of application impact. IT teams strive to identify and respond to these issues as fast as possible to avoid customer and business impact. Most monitoring tools suffer from a 1-5 minute delay in delivering data and alerting on performance issues. Instana eliminates this delay, and with it the pain of slow alerting.

Let’s look at a real-world example. In the following screenshot, the Garbage Collection (GC) data is averaged over 1-minute intervals. According to this chart, everything looks fine and there are no excessive GC pauses (a GC pause stops your application for as long as the collection takes to complete).

The next chart shows the same GC data at the full 1-second granularity available within Instana. It clearly shows spikes lasting up to almost half a second. Pauses like these can be the root cause of application performance issues, and they would never be seen without high-granularity data like Instana provides.
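To make the averaging effect concrete, here is a minimal Python sketch using synthetic numbers (hypothetical values, not the data behind the screenshots): one minute of per-second GC pause times containing a single pause of almost half a second, compared with the 1-minute average a coarser tool would chart.

```python
from statistics import mean

# Synthetic example (hypothetical values, not Instana data):
# GC pause time in milliseconds for each second of one minute.
pauses_ms = [5.0] * 60
pauses_ms[42] = 480.0  # a single pause of almost half a second


def downsample(samples, window):
    """Average consecutive samples into buckets of `window` points."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


per_minute = downsample(pauses_ms, 60)

print(f"1-second peak: {max(pauses_ms):.1f} ms")     # 1-second peak: 480.0 ms
print(f"1-minute average: {per_minute[0]:.1f} ms")  # 1-minute average: 12.9 ms
```

The spike is still present in the raw data; the 60-second bucket simply dilutes it across 59 quiet seconds.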

Identify AWS Over-Capacity and Under-Capacity Problems

A major advantage of using AWS instead of a traditional data center is that you can “right size” your environment by taking advantage of easy and fast auto-provisioning. While this helps control costs, it also comes with a couple of potential issues:

  1. Runaway over-provisioning – there are multiple ways this can happen, but the end result is the same: a much larger AWS bill than you budgeted for.
  2. Under-provisioning – too little capacity is provisioned, and application response time suffers.

Both of these conditions are easily identified using Instana’s “Comparison Table” feature. In the example below, we see all of our AWS instances sorted by CPU utilization. We can clearly see a CPU bottleneck on one host, and that Instana is alerting on it and providing expert knowledge about the situation. This would be a good time to provision extra capacity for this service.

Under-capacity issue identified by Instana

Problem descriptions and expert knowledge applied to the under-capacity issue.
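The idea behind the comparison table can be sketched in a few lines of Python. This is only an illustration with made-up instances: the `Instance` record, the `comparison_table` helper, and the 90% CPU threshold are hypothetical and are not part of Instana's API or its actual alerting logic.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record; field names are illustrative, not an Instana API.
    name: str
    instance_type: str
    cpu_pct: float
    mem_pct: float


def comparison_table(instances, threshold=90.0):
    """Sort instances by CPU, highest first, and flag likely bottlenecks.

    The 90% threshold is an assumed value for illustration only.
    """
    rows = sorted(instances, key=lambda i: i.cpu_pct, reverse=True)
    return [(i.name, i.cpu_pct, i.cpu_pct >= threshold) for i in rows]


fleet = [
    Instance("web-1", "c4.xlarge", 35.0, 40.0),
    Instance("search-1", "c4.xlarge", 97.5, 62.0),
    Instance("web-2", "c4.xlarge", 41.0, 38.0),
]

for name, cpu, hot in comparison_table(fleet):
    print(f"{name:<10} {cpu:5.1f}%  {'<- bottleneck' if hot else ''}")
```

Sorting puts the busiest host at the top, so a single hot instance stands out immediately, which is the same reason the sorted table in the UI makes the bottleneck easy to spot.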

In the next example, we see an over-capacity issue, which simply wastes money and should be corrected. In the Instana search bar I typed “xlarge”, and the UI automatically filtered the view to just the AWS instances that contain “xlarge” in their “Type” tag. You can see in the screenshot that they are all type c4.xlarge. Notice that one host has both low CPU and low memory usage.

Drilling down into this host, we can see that CPU and memory usage remained very low over the past 30 days, so we are burning cash on an instance that is simply too large. Reducing the size of this instance would save that money.

CPU usage over 30 days is very low.

Memory usage over 30 days is very low.
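Finding over-capacity instances like this one boils down to a type filter plus a threshold check over a time window. Here is a minimal Python sketch, assuming hypothetical daily peak utilization data and illustrative limits of 10% CPU and 20% memory (these numbers are not Instana defaults, and the `oversized` helper is not an Instana function):

```python
# Hypothetical 30 days of daily peak utilization (%), keyed by instance name.
history = {
    "web-1": {"type": "c4.xlarge", "cpu": [60.0] * 30, "mem": [55.0] * 30},
    "batch-7": {"type": "c4.xlarge", "cpu": [4.0] * 30, "mem": [9.0] * 30},
    "db-1": {"type": "r4.large", "cpu": [3.0] * 30, "mem": [12.0] * 30},
}


def oversized(history, type_filter="xlarge", cpu_max=10.0, mem_max=20.0):
    """Return instances whose type contains `type_filter` and whose CPU and
    memory peaks both stayed under the (assumed) limits for the whole window."""
    return sorted(
        name
        for name, h in history.items()
        if type_filter in h["type"]
        and max(h["cpu"]) < cpu_max
        and max(h["mem"]) < mem_max
    )


print(oversized(history))  # ['batch-7']
```

Using peaks rather than averages is deliberate: if even the busiest day stays under the limit, the instance is a strong candidate for a smaller size.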

AWS Tags

Tags are vital for effectively managing your environments in AWS. These same tags make it possible to map the data in your monitoring tool to the actual deployment in AWS. Instana automatically collects these tags and maps them to their related components. This saves significant time and effort, and because the process is fully automatic, the mapping stays accurate.

Combining AWS tags with Instana Dynamic Focus lets you immediately filter the Instana UI to display only the entities you want to focus on at that point in time. I used this capability in the previous section to find my over-capacity xlarge instances. Here is an example of filtering my infrastructure map to show only the instances related to the service I care about. I filtered using the term “env=demo”, a tag that was set in AWS and automatically collected by Instana.
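Conceptually, a term like “env=demo” selects entities whose `env` tag equals `demo`, while a bare term such as “xlarge” matches any tag value containing it. The Python sketch below illustrates that idea with hypothetical instance IDs and tags; the matching rules and the `parse_filter`/`matches` helpers are assumptions for illustration, not a description of how Dynamic Focus actually parses queries.

```python
def parse_filter(term):
    """Split a 'key=value' term; a bare term has no key (assumed rule)."""
    key, sep, value = term.partition("=")
    return (key, value) if sep else (None, term)


def matches(tags, term):
    """True if the tag dict satisfies the term under the assumed rules."""
    key, value = parse_filter(term)
    if key is None:
        # Bare term: substring match against any tag value.
        return any(value in v for v in tags.values())
    # key=value term: exact match on that tag.
    return tags.get(key) == value


# Hypothetical instances and their AWS tags.
instances = {
    "i-0a1": {"env": "demo", "service": "shop"},
    "i-0b2": {"env": "prod", "service": "shop"},
    "i-0c3": {"env": "demo", "service": "productsearch"},
}

demo = [iid for iid, tags in instances.items() if matches(tags, "env=demo")]
print(demo)  # ['i-0a1', 'i-0c3']
```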

At a higher level, you can click the “Tags” button in the UI to select from a list of the tags that Instana has collected.

AWS Service Maps

Service Maps in the Instana UI show a logical representation of how your application services are connected and communicating. The screenshot below shows the Instana service map, and there is clearly a problem with the “shop” service, caused by issues with “productsearch”.

We could drill down into the traces between the “shop” and “productsearch” services to investigate, or we could simply click the incident report, which shows the root cause: Elasticsearch is in a split-brain condition, with 2 nodes elected as master. With the cause identified, the issue can now be fixed.

Root cause of this issue was automatically identified as an Elasticsearch split-brain condition.
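A split-brain condition means the cluster no longer agrees on a single elected master. The Python sketch below shows one simple way to express that check, assuming a hypothetical snapshot of which node each Elasticsearch node reports as its master; the data and the `detect_split_brain` helper are illustrative, and this is not how Instana detects the condition.

```python
# Hypothetical snapshot: the master each Elasticsearch node reports.
# In a healthy cluster, every node agrees on exactly one master.
reported_master = {
    "es-node-1": "es-node-1",
    "es-node-2": "es-node-1",
    "es-node-3": "es-node-3",
    "es-node-4": "es-node-3",
}


def detect_split_brain(reported_master):
    """Return the distinct masters if nodes disagree, else an empty list."""
    masters = set(reported_master.values())
    return sorted(masters) if len(masters) > 1 else []


print(detect_split_brain(reported_master))  # ['es-node-1', 'es-node-3']
```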

AI-Powered Understanding

As you can see, Instana understands the relationships between the many components of your application and service stacks. Our combination of AI-powered understanding, 1-second granularity, and automatic configuration and observability creates unparalleled value for anyone who runs applications in AWS.

As always, don’t take our word for it. Try Instana in your AWS environment for free today.