Understanding the health status of your systems is important, and observability tools such as Instana give you deep insight into every element of the overall architecture. As long as everything is fine, that is enough. But what happens when things get out of hand?
With your DevOps teams on call, there are people around to react immediately to any alert. When one comes in, the first thing to do is to figure out what happened, which services, machines, and customers are affected, and finally whom to call for additional insight into those components.
Trying to understand the problem while tracking down the right people can be tedious, especially if a physical or virtual host decided to take some time off.
In a high-pressure situation, many questions need to be answered quickly: Which services, or service instances, were running on this machine? Is there a recovery protocol to take into account when restarting services or databases?
Because MTTR is one of the key metrics operations teams are measured by, every second saved in this process is a win.
The best way to bring down investigation time is to provide as much information and context about an incident as possible.
Instana already provides functionality to correlate, group, and order multiple issues (in different services and infrastructure elements) into a single incident when they are related. This understanding is powered by Instana’s Dynamic Graph, which provides an immediate view of the different components affected by an overall incident.
To correlate issues, Instana’s Agent automatically collects, analyzes, and contextualizes information about:
- Services and Service Instances
- Frameworks used to build the services
- Process and Runtime Environments, such as the JVM, .NET CLR, Node.js, …
- Service Dependencies, such as HTTP calls between services
- Distributed Traces across the service landscape
- Cluster nodes and infrastructure those services run on
Instana uses all of the above information, and more, to automatically calculate service maps and dependencies and to correlate issues, which removes much of the initial investigation time.
Wouldn’t it be awesome, though, to bring down that initial time even further?
Instana provides sophisticated alerting options that streamline the resolution process for the on-call person. Along with predefined, best-practice alerts, Instana lets you add custom alerting rules and offers multiple channels and integrations for notifying you about issues, including integrations with common platforms like PagerDuty or VictorOps.
But Instana wouldn’t be the leader in Enterprise Observability if we simply kept the status quo. We love to go the extra mile, and to that end, Instana has added a new feature to the Alerting system that brings the best of the Dynamic Graph to your fingertips.
With the new option to add Custom Payloads to your alerting notifications, Instana not only lets you attach rule-specific or customer-specific static content, but also delivers a fully customizable and contextual notification platform.
The context to be included in the alert can be defined by using the properties available in the Dynamic Graph. For example, it is possible to automatically add the names of all services that were running on a specific machine at the time of the fault.
If you tag your services with the email addresses of the responsible developers, those addresses can be included in the notification as well, ready for you in case additional information is required.
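To make this more concrete, here is a minimal Python sketch of how a receiving script might read such a notification. The payload layout and the field names (`host`, `customPayloads`, `services`, `ownerEmails`) are assumptions chosen for illustration, not Instana's documented schema, so check the Alerting documentation for the actual format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AlertContext:
    """Context pulled out of an alert notification (hypothetical shape)."""
    host: str
    services: list[str] = field(default_factory=list)
    owner_emails: list[str] = field(default_factory=list)


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated string, dropping blanks and duplicates, keeping order."""
    seen: dict[str, None] = {}
    for item in value.split(","):
        item = item.strip()
        if item:
            seen.setdefault(item, None)
    return list(seen)


def parse_alert(raw: str) -> AlertContext:
    """Parse a JSON notification body into an AlertContext.

    Assumed (not documented) layout:
      {"host": "...", "customPayloads": {"services": "a,b", "ownerEmails": "x@y"}}
    """
    body = json.loads(raw)
    custom = body.get("customPayloads", {})
    return AlertContext(
        host=body.get("host", "unknown"),
        services=_split_csv(custom.get("services", "")),
        owner_emails=_split_csv(custom.get("ownerEmails", "")),
    )
```

With a structure like this, the on-call person (or a chat bot) could immediately see which services were on the failed host and whom to contact, without opening a dashboard first.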
Almost everything Instana knows about your application infrastructure can be added to your custom alert, providing immediate feedback without having to open the Instana dashboard and do the initial investigation yourself. Contextual information that really matters, super-charged.
While bringing down MTTR by supplying a human with as much contextual information as possible to quickly make educated decisions is great, the real benefit of additional context lies in automation.
At Instana, we strongly believe that automation is the only way forward to manage the ever-growing complexity of modern, scalable solutions. With that in mind, imagine a self-healing system built on top of automatic, context-rich alerts and triggered through a webhook call.
With Instana providing all the necessary information and context about failed services and affected entities, automatic healing scripts have enough data to rebuild the infrastructure and bring it back to a healthy state in seconds, even before a human reads the alert notification. What a time to be alive.
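As a rough idea of what the receiving end of such a webhook could look like, the following Python sketch uses only the standard library. Everything specific in it is hypothetical: the payload layout (`customPayloads.services` as a comma-separated string), the `RESTART_COMMANDS` table, and the port are illustrative assumptions, not part of Instana's API. The handler only logs what it would do, because a real healing script needs authentication, idempotency, and safeguards that are out of scope here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from service name to the command that restarts it.
RESTART_COMMANDS = {
    "checkout": ["systemctl", "restart", "checkout"],
    "cart": ["systemctl", "restart", "cart"],
}


def plan_remediation(payload: dict) -> list[list[str]]:
    """Return restart commands for every known service named in the alert.

    Assumes an illustrative layout in which
    payload["customPayloads"]["services"] is a comma-separated string.
    """
    names = payload.get("customPayloads", {}).get("services", "")
    wanted = [n.strip() for n in names.split(",") if n.strip()]
    return [RESTART_COMMANDS[n] for n in wanted if n in RESTART_COMMANDS]


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alert notifications and plans a remediation."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        for command in plan_remediation(payload):
            # Dry run: log instead of executing the command.
            print("would run:", " ".join(command))
        self.send_response(204)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a local webhook listener."""
    return HTTPServer(("127.0.0.1", port), AlertHandler)
```

Calling `make_server().serve_forever()` starts the listener. Keeping `plan_remediation` separate from the HTTP handler makes the decision logic easy to test before any command is actually executed.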
This new and awesome functionality is available immediately to all customers from the Alerting configuration. Start to use it today. If you don’t have an Instana account yet, sign up for an Instana Trial right now and get the level of visibility and contextual information you need to solve incidents fast.