The SRE Guide to Hyperscale for Cloud-Native Applications

The SRE Guide to Hyper-Resiliency for Cloud-Native Applications at Hyperscale

In my previous post, I discussed the advantages of using Instana Enterprise Observability for achieving hyper-resiliency for applications, particularly cloud-native applications. Hyper-resiliency is usually defined as 99.99% system and application availability, or four 9s. Essentially, it is the ability to perform non-stop computing.
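To make the four-9s figure concrete, the arithmetic below converts an availability target into a downtime budget. This is only a minimal sketch: the function name is illustrative, and it assumes a 365-day year unless a different period is passed in.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime (in minutes) that an availability target allows."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime budget.
    return total_minutes * (1.0 - availability_pct / 100.0)


for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% availability -> {budget:.1f} minutes of downtime per year")
```

Under those assumptions, four 9s leaves roughly 52.6 minutes of downtime per year, or about 4.3 minutes in a 30-day month.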

In the cloud, high availability can be difficult to achieve, even with the ubiquitous use of cluster technology. Meanwhile, hyperscale for cloud-native applications occurs when infrastructure resources are properly allocated to applications as they scale. If resources are misallocated, especially if they’re under-allocated, application performance can degrade or even stop.

Instana Enterprise Observability helps keep applications available by notifying app teams when problems begin. Granular metrics, events, and traces with context enable teams to rapidly identify issues.

If the availability or performance issues are caused by under-allocated or unbalanced resources (CPU, memory, network, and storage), Instana can pass that data to Turbonomic, another IBM company. Turbonomic provides Application Resource Management (ARM), which automatically and dynamically manages and allocates infrastructure resources for applications.

Combining Turbonomic ARM with Instana Enterprise Observability keeps application resource allocation optimized so that teams can meet Service Level Objectives (SLOs) for both performance and availability. ARM procedures can be fully or partially automated, enabling server resource adjustments that enhance application resiliency and performance while optimizing resource allocation cost.

How ARM and observability work together

Instana monitors application metrics, events, traces, and logs to provide a rich mosaic of application health information. It captures these measurements at unmatched one-second intervals. At this frequency, Instana can observe and identify any issues, either application or infrastructure, and match them with upstream and downstream dependencies in real time.

One-second monitoring granularity is one of the most critical attributes for hyper-resiliency because sampling intervals of 10 seconds or longer are not adequate for detecting anomalies. Events in microservice applications and the surrounding infrastructure take place in microseconds, so short-lived problems can go undetected for a long time when measurements are sampled at coarse intervals.
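The effect of sampling granularity can be sketched with synthetic data. The example below is not Instana's collection logic; the CPU series, the 90% alerting threshold, and the function name are all assumptions made for illustration. It reads the same one-minute series at 1-second and 10-second intervals and checks whether a 3-second spike is ever observed.

```python
def sampled_peak(series, interval):
    """Return the highest value seen when reading every `interval`-th point."""
    return max(series[::interval])


# Synthetic CPU utilization (%) at one-second resolution over one minute:
# a steady 40% baseline with a 3-second spike to 98% at seconds 24-26.
cpu = [40.0] * 60
for second in (24, 25, 26):
    cpu[second] = 98.0

THRESHOLD = 90.0  # hypothetical alerting threshold

for interval in (1, 10):
    peak = sampled_peak(cpu, interval)
    status = "spike detected" if peak >= THRESHOLD else "spike missed"
    print(f"{interval:>2}s sampling: peak={peak:.0f}% -> {status}")
```

In this sketch the 10-second reader only sees seconds 0, 10, 20, and so on, so the spike falls between samples. A collector that averaged each 10-second window instead would dilute the same spike to 57.4%, which is still under the assumed threshold.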

Instana’s Enterprise Observability powers rapid anomaly recognition so Turbonomic can apply problem remediation to provide the strongest SLO compliance. If it’s a code issue, Instana’s Auto Profiler identifies the problematic code within a few clicks.

The combination of Instana + Turbonomic creates a seamless and automatic remediation path for any issues that are attributable to mismatched application resources.

For cloud-native applications, those mismatches happen frequently. One moment your applications are starved for resources due to a sudden surge in activity; moments later, they’re over-allocated as the demand surge drops.

When infrastructure resources run low for any microservice, performance degrades – or worse, the service crashes. Instana identifies the slow application response time, highlights constrained resources that may be the root cause of the disruption, and passes that data to Turbonomic.

Turbonomic knows exactly why the resources are constrained and the right adjustment to remediate the disruption. These actions are illustrated in the diagram below, which highlights how Turbonomic adjusts constrained resources based on a target response time.

Figure: Turbonomic adjusts constrained resources based on a target response time.
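One way to picture an adjustment driven by a target response time is a simple proportional rule, the same shape as the formula the Kubernetes Horizontal Pod Autoscaler documents for its metrics. The sketch below is not Turbonomic's algorithm, which the post does not describe; the function name, replica limits, and tolerance band are hypothetical, and it assumes response time falls roughly in proportion to added replicas.

```python
import math


def desired_replicas(current_replicas: int, observed_ms: float, target_ms: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Scale replica count in proportion to observed vs. target response time."""
    if current_replicas < 1 or observed_ms <= 0 or target_ms <= 0:
        raise ValueError("replicas and response times must be positive")
    ratio = observed_ms / target_ms
    # Ignore small deviations so the replica count does not flap.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured (hypothetical) bounds.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(4, observed_ms=450, target_ms=300))  # slower than target: scale out
print(desired_replicas(4, observed_ms=120, target_ms=300))  # faster than target: scale in
```

With four replicas and a 300 ms target, an observed 450 ms suggests six replicas, while an observed 120 ms suggests two. A real system would also weigh which resource is actually constrained before acting.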

Proper resource allocation is critical

Turbonomic acts when resources are under-allocated to make sure that performance degradation (or worse) does not occur. Turbonomic automatically adjusts application resources to avoid resource contention or under-allocation that can negatively impact SLOs.

Conversely, when resources are over-allocated, Turbonomic automatically makes adjustments based on thresholds you define. This helps dramatically reduce cloud overspend, which is equally problematic.
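Threshold-based adjustment in both directions can be sketched as a small classifier. This is only an illustration of the idea, not Turbonomic's policy model: the `Thresholds` type, the 30% and 80% defaults, and the action names are assumptions, and the decision uses peak utilization so that a workload with occasional bursts is not scaled down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    low_pct: float = 30.0   # hypothetical: below this peak, resources are over-allocated
    high_pct: float = 80.0  # hypothetical: at or above this peak, resources are constrained


def rightsizing_action(samples, thresholds: Thresholds = Thresholds()) -> str:
    """Classify a window of utilization samples (%) into a rightsizing action."""
    if not samples:
        raise ValueError("at least one utilization sample is required")
    peak = max(samples)
    if peak >= thresholds.high_pct:
        return "scale_up"    # under-allocated: performance is at risk
    if peak < thresholds.low_pct:
        return "scale_down"  # over-allocated: paying for unused capacity
    return "no_change"


for window in ([10, 15, 22], [35, 50, 60], [45, 60, 92]):
    print(window, "->", rightsizing_action(window))
```

Tightening or loosening the two thresholds trades cost against headroom, which is the choice the paragraph above leaves to the operator.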

Instana + Turbonomic is a power combo that will rapidly become an SRE’s best friend. The combination enables hyperscale with hyper-resiliency, cost effectively. It paves the path to automated SLO compliance and continuous performance consistency, especially for your cloud-native applications.

Try out Instana with a guided tour in our Play With environment.

