A Lot Can Happen in 10 Seconds!

When would 10 seconds become a big deal?

When it comes to application performance.

For cloud-native microservices applications, 10 seconds is a long, long time.

The list of things that can happen to your applications in 10 seconds is endless, and most of them are not good.

But, before we dive into the details about what could happen to your applications, let’s take a look at some real-world events that show what can happen in 10 seconds:

  • Usain Bolt can run the 100 meters (in a world-record 9.58 seconds, to be exact)
  • A traffic light can change from green to yellow to red
  • Users can decide whether they want to stay on your website
  • You can cure your hiccups
  • You can de-fog your mirror after a shower
  • You can fold a T-shirt in two moves
  • You can do all kinds of mental math
  • You can solve a Rubik’s Cube (in 2 moves)
  • You can pick a lock with a paperclip

We particularly like the Usain Bolt example because 100 meters is a long way to run in less than 10 seconds!

For cloud-native application performance and availability, 10 seconds is an eternity. Transactions are zipping around all across the internet, keeping the wheels of commerce well lubricated.

What can happen in 10 seconds if something goes wrong?

  • Thousands of transactions can be delayed, or crash and never complete at all

With this type of problem, revenue can drop due to lost sales. Customers will abandon shopping carts and your site and find another place to buy what they want. And your brand image can suffer.

Why, then, would it be acceptable to use Observability tools that capture metrics slowly or, worse, sample and aggregate metrics and traces? How can a platform like that be viewed as equivalent to an Observability platform such as Instana, which gathers and contextualizes information at the speed of modern microservices? Slower tools allow the problems described above to linger until the information you need to remediate them is available.

The PRISA Tecnologia Story

For PRISA Tecnologia, performance is key. When they encounter a performance problem, it has an immediate and detrimental impact on the business performance and the consumer’s perception of their brand. “A one-second time difference in displaying content makes a huge difference to our audience’s experience.”

Jorge Tome Hernando, Director of IT Architecture, Operations, Security and Workplace, PRISA Tecnologia

Instana’s major observability competitors either sample metrics at 10-second intervals or aggregate metrics over intervals of one minute or more, compared to Instana’s ultra-precise one-second metric interval. Instana also delivers notification of an issue within 3 seconds. This is illustrated in the Observability Detection Gap diagram below.

Can you really afford to wait 10 seconds or up to a minute for your Observability platform to tell you there’s an issue? With manual triage, maybe. But with automated or even semi-automated remediation, you cannot.
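To make the interval comparison concrete, here is a minimal sketch in Python. It assumes a hypothetical latency series with one reading per second and a 5-second spike. The values, the position of the spike, and the 500 ms alert threshold are all illustrative assumptions, not Instana data or the behavior of any particular product.

```python
from statistics import mean

# Hypothetical 60-second latency series in ms, one reading per second.
# Baseline is 100 ms; a 5-second spike to 2000 ms starts at second 21.
# All numbers here are illustrative assumptions.
BASELINE_MS = 100
SPIKE_MS = 2000
latency = [SPIKE_MS if 21 <= t < 26 else BASELINE_MS for t in range(60)]

ALERT_THRESHOLD_MS = 500  # assumed alerting threshold


def sample_every(series, interval):
    """Keep only the reading taken at each interval boundary (t = 0, interval, ...)."""
    return series[::interval]


def first_breach(series, interval, threshold):
    """Return the second at which a sampled reading first exceeds the threshold, or None."""
    for i, value in enumerate(sample_every(series, interval)):
        if value > threshold:
            return i * interval
    return None


one_second_view = first_breach(latency, 1, ALERT_THRESHOLD_MS)
ten_second_view = first_breach(latency, 10, ALERT_THRESHOLD_MS)
one_minute_average = mean(latency)

print("1-second sampling first sees the spike at t =", one_second_view)
print("10-second sampling first sees the spike at t =", ten_second_view)
print("1-minute average latency (ms):", one_minute_average)
```

In this toy series, the one-second view catches the spike the moment it starts, the 10-second samples fall on either side of it and never see it, and the one-minute average stays below the assumed threshold.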

Why Are Fast Observability Metrics and Transaction Traces So Important?

For all applications, speed and reliability are the goals. To achieve better application performance AND reliability, the go-to strategy that “a human always needs to fix a problem” (the MTTR approach) has to change. Relying on human intervention for every fix will overburden staff and restrict the pace of change. It will also degrade SLIs.

The Dealerware Story

“With Instana, our day-to-day goal is to be able to guarantee a latency expectation. Our goal for service calls is to complete within less than 250 milliseconds. So, it’s not just for fire drills. In the day-to-day, we’re able to improve performance, and that drives us toward that 250 ms goal. Instana makes this possible.”

Bryce Hendrix, Lead Platform Architect, Dealerware

For improved performance with higher availability, automated AIOps is the way forward. Additional automation combined with AIOps is a path to higher levels of performance and availability.

How? By letting automated AIOps resolve issues that the machine can flawlessly correct much faster than a human. There are many issues, involving infrastructure resource allocation and more, that the machine can remediate or prevent before a human can even intervene.

Does that mean all application issues can be resolved with automated AIOps? Of course not.

There are many complex logic issues that only human triage can resolve, such as code issues and the like. But there are also many issues where automated AIOps is faster, more efficient and should be preferred for issue remediation.

In my previous post, I introduced Mean Time to Prevention, or MTTP. MTTP is defined as the amount of time that Observability+AIOps takes to prevent an issue from negatively impacting hybrid cloud applications and infrastructure.

Automated AIOps adds a new option to the application issue remediation continuum. The diagram above illustrates that continuum starting with fully automated issue remediation down to the human MTTR staple.

In the continuum, Observability is the starting point for every type of remediation. The longer it takes for an issue to be detected by the Observability platform, the longer it takes to begin the remediation process. That means when automated AIOps is added, the difference between 1-second detection and detection that takes 10 seconds or more becomes huge. If your application can afford to wait 10+ seconds for an issue to be detected, why use automated AIOps at all?
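The arithmetic behind that claim can be sketched in a few lines of Python. The remediation times below (2 seconds for an automated action, 30 minutes for manual triage) are assumptions chosen for illustration, not measured figures. Only the 1-second and 10-second detection intervals come from the discussion above.

```python
def time_to_resolve(detection_s, remediation_s):
    """Total seconds from the start of an issue until it is resolved."""
    return detection_s + remediation_s


def detection_share(detection_s, remediation_s):
    """Fraction of the total resolution time spent waiting for detection."""
    return detection_s / time_to_resolve(detection_s, remediation_s)


AUTOMATED_FIX_S = 2      # assumed duration of an automated remediation action
MANUAL_FIX_S = 30 * 60   # assumed duration of manual triage and repair

for label, fix_s in [("automated", AUTOMATED_FIX_S), ("manual", MANUAL_FIX_S)]:
    for detect_s in (1, 10):
        total = time_to_resolve(detect_s, fix_s)
        share = detection_share(detect_s, fix_s)
        print(f"{label:9s} detect={detect_s:2d}s total={total:5d}s detection share={share:.1%}")
```

Under these assumed numbers, slower detection quadruples the total time in the automated case (3 seconds versus 12 seconds) but adds only about half a percent in the manual case (1801 seconds versus 1810 seconds). Detection speed matters most when remediation itself is fast.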

Automated AIOps remediation is the wave of the future. It’s the next logical step in improving application performance and resiliency. Infrastructure performance issues often outweigh microservices code issues and will continue to do so into the future.

The Issue Remediation Gold Standard

The new gold standard for application issue detection and remediation will become automated Observability+AIOps. They will be used in tandem to help ensure that issues don’t devolve into major problems.

If you want to achieve the full benefits of automated AIOps remediation, you need high-frequency, ultra-precise metrics and traces to feed the AIOps engine. And you can get them for a fraction of the cost of the “slower” observability technologies.

Indeed, a lot can happen in 10 seconds. With real-time metrics and automated AIOps, you can ensure that the bad issues don’t happen to your applications.

Instana, an IBM company, provides an Enterprise Observability Platform with automated application monitoring capabilities to businesses operating complex, modern, cloud-native applications no matter where they reside – on-premises or in public and private clouds, including mobile devices or IBM Z.

Control hybrid modern applications with Instana’s AI-powered discovery of deep contextual dependencies inside hybrid applications. Instana also gives visibility into development pipelines to help enable closed-loop DevOps automation.

This provides the actionable feedback clients need as they optimize application performance, enable innovation and mitigate risk, helping Dev+Ops add value and efficiency to software delivery pipelines while meeting their service and business level objectives.

For further information, please visit instana.com.