When would 10 seconds become a big deal?
When it comes to application performance.
For cloud-native microservices applications, 10 seconds is a long, long time.
The list of things that can happen to your applications in 10 seconds is endless, and most of them are not good.
But, before we dive into the details about what could happen to your applications, let’s take a look at some real-world events that show what can happen in 10 seconds:
- Usain Bolt can win the 100-meter gold medal at the Olympics (in 9.58 seconds, to be exact)
- A traffic light can change from green to yellow to red
- Users decide in 10 seconds if they want to stay on your website
- Cure your hiccups
- De-fog your mirror after a shower
- Fold a T-shirt in two moves
- Do all kinds of mental math
- Solve a Rubik’s Cube (in 2 moves)
- Pick a lock with a paperclip
We particularly like the Usain Bolt example because 100 meters is a long way to run in less than 10 seconds!
For cloud-native application performance and availability, 10 seconds is an eternity. Transactions are zipping around all across the internet, keeping the wheels of commerce well lubricated.
What can happen in 10 seconds if something goes wrong?
- Thousands of transactions can experience delay or crash and not complete at all
With this type of problem, revenue can drop due to lost sales. Customers will abandon shopping carts and your site and find another place to buy what they want. And your brand image can suffer.
Why, then, would it be acceptable for Observability tools to capture metrics slowly or, worse, to sample and aggregate metrics and traces? How can a platform like that be viewed as equivalent to an Observability platform such as Instana, which gathers and contextualizes information at the speed of modern microservices? Slower tools allow the problems described above to linger for an extended period until the information you need to remediate them is available.
The PRISA Tecnologia Story
For PRISA Tecnologia, performance is key. When they encounter a performance problem, it has an immediate and detrimental impact on the business performance and the consumer’s perception of their brand. “A one-second time difference in displaying content makes a huge difference to our audience’s experience.”
Jorge Tome Hernando, Director of IT Architecture, Operations, Security and Workplace, PRISA Tecnologia
Instana’s major observability competitors either sample metrics at 10-second intervals or aggregate metrics in intervals of one minute or more, compared to Instana’s ultra-precise one-second metric interval. Instana also delivers notification of an issue within 3 seconds. This is illustrated in the Observability Detection Gap diagram below.
Can you really afford to wait 10 seconds or up to a minute for your Observability platform to tell you there’s an issue? With manual triage, maybe. But with automated or even semi-automated remediation, you cannot.
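To make the gap concrete, here is a minimal sketch of how the sampling interval bounds detection delay. It assumes a simplified model in which a collector samples at fixed intervals starting at time zero and can only see an issue on the first sample taken at or after the issue begins. The function name `detection_delay` and the issue start time are hypothetical, and this is not a description of any vendor's actual pipeline; real platforms add processing and notification time on top.

```python
import math


def detection_delay(issue_start: float, interval: float) -> float:
    """Seconds between an issue starting and the first sample that can see it.

    Assumes samples are taken at t = 0, interval, 2 * interval, ...
    (an illustrative simplification, not a real collector's schedule).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    first_sample = math.ceil(issue_start / interval) * interval
    return first_sample - issue_start


# Hypothetical issue that begins half a second after a sample was taken.
issue_start = 120.5
for interval in (1, 10, 60):
    delay = detection_delay(issue_start, interval)
    print(f"{interval:>2}-second interval: issue first visible after {delay:.1f} s")
```

In this model the worst-case delay approaches the full sampling interval, so a 10-second sampler can stay blind to a new issue for almost 10 seconds before any alerting logic even starts.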
Why are Fast Observability Metrics and Transaction Traces so Important?
For all applications, speed and reliability are the goals. To achieve better application performance AND reliability, the go-to strategy that “a human always needs to fix a problem (MTTR)” has to change. Relying on human intervention for every fix will overburden human resources and restrict the pace of change. It will also reduce SLIs.
The Dealerware Story
“With Instana, our day-to-day goal is to be able to guarantee a latency expectation. Our goal for service calls is to complete within less than 250 milliseconds. So, it’s not just for fire drills. In the day-to-day, we’re able to improve performance, and that drives us toward that 250 ms goal. Instana makes this possible.”
Bryce Hendrix, Lead Platform Architect, Dealerware
For improved performance with higher availability, automated AIOps is the way forward. Combining additional automation with AIOps is a path to higher levels of both performance and availability.
How? By letting automated AIOps resolve issues that the machine can flawlessly correct much faster than a human. There are many issues regarding infrastructure resource allocation and others that the machine can remediate/prevent before a human can even intervene.
Does that mean all application issues can be resolved with automated AIOps? Of course not.
There are many complex logic issues that only human triage can resolve, such as code issues and the like. But there are also many issues where automated AIOps is faster, more efficient and should be preferred for issue remediation.
In my previous post, I wrote about Mean Time to Prevention, or MTTP. MTTP is defined as the amount of time that Observability+AIOps takes to prevent an issue from negatively impacting hybrid cloud applications and infrastructure.
Automated AIOps adds a new option to the application issue remediation continuum. The diagram above illustrates that continuum starting with fully automated issue remediation down to the human MTTR staple.
In the continuum, Observability is the starting point for every type of remediation. The longer it takes for an issue to be detected by the Observability platform, the longer it takes to begin the remediation process. That means when automated AIOps is added, the difference between 1-second detection and detection that takes 10 seconds or more becomes huge. If your application can afford to wait 10+ seconds for an issue to be detected, why use automated AIOps at all?
Automated AIOps remediation is the wave of the future. It’s the next logical step in improving application performance and resiliency. Infrastructure performance issues often outweigh microservices code issues and will continue to do so into the future.
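As a rough illustration of why the detection gap matters, the sketch below estimates how many transactions pass through a degraded service before remediation can even begin. The throughput figure and the function name `transactions_at_risk` are assumptions made up for this example, not measurements from any real system.

```python
def transactions_at_risk(rate_per_second: float, detection_seconds: float) -> float:
    """Transactions that reach a degraded service before the issue is detected."""
    if rate_per_second < 0 or detection_seconds < 0:
        raise ValueError("rate and detection time must be non-negative")
    return rate_per_second * detection_seconds


# Hypothetical throughput; substitute your own service's rate.
RATE = 500  # transactions per second (assumed for illustration)

for label, seconds in (
    ("1-second detection", 1),
    ("10-second detection", 10),
    ("1-minute aggregation", 60),
):
    at_risk = transactions_at_risk(RATE, seconds)
    print(f"{label}: about {at_risk:,.0f} transactions at risk")
```

Under these assumed numbers, moving from 10-second to 1-second detection cuts the exposed transactions by a factor of ten before any automated remediation has run.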
The Issue Remediation Gold Standard
The new gold standard for application issue detection and remediation will become automated Observability+AIOps. They will be used in tandem to help ensure that issues don’t devolve into major problems.
If you want to achieve the full benefits of automated AIOps remediation, you need high-frequency, ultra-precise metrics and traces to feed the AIOps engine. And you can get them for a fraction of the cost of the “slower” observability technologies.
Indeed, a lot can happen in 10 seconds. With real-time metrics and automated AIOps, you can ensure that the bad issues don’t happen to your applications.