Why One Second Matters When Monitoring Applications

Chris Farrell recently outlined 6 pillars of modern application management. One of those pillars was one-second granularity for measurements. In this post, I want to explain why one-second granularity is so important when monitoring modern applications.

One Second On The Web

What can happen in 1 second?

On the web, quite a bit. Here are a few examples of what happens every second:

  • Google: 68,000 search requests
  • Amazon: $2,361 average sales
  • Netflix: 655 hours of video streamed

What would an application problem cost?  What if you didn’t even see there was a problem for a full minute? Or worse, what if the first time you found out about it was when users started calling?

  • A 1 minute outage would result in more than 4 million unfulfilled Google search requests
  • A 1 minute outage on Amazon would cost $141,660 in missed sales
  • And a 1 minute Netflix outage would result in 39,300 hours of catching up
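
The per-minute figures follow from multiplying each per-second rate by 60. Here is a minimal sketch of that arithmetic, using only the per-second numbers quoted above. The dictionary keys and the helper name are illustrative, not taken from any real tool.

```python
# Per-second figures as quoted in this article.
PER_SECOND = {
    "google_searches": 68_000,
    "amazon_sales_usd": 2_361,
    "netflix_hours_streamed": 655,
}


def per_minute(per_second: int) -> int:
    """Scale a per-second rate to a one-minute window."""
    return per_second * 60


# Print what a full minute of outage would add up to for each example.
for name, rate in PER_SECOND.items():
    print(f"{name}: {per_minute(rate):,} per minute")
```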

With so much that can happen in a minute, it’s now time to start thinking about the accuracy of our monitoring data. Monitoring products that operate on one-minute time frames (collecting 1 metric datapoint per minute) are not going to be suitable going forward. These examples are just the tip of the iceberg, as many organizations find that their management tools are inadequate simply because they average data over too big a time slice.
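
To see why averaging over a large time slice hides problems, consider a rough sketch with made-up numbers: a service that normally answers in 100 ms but stalls at 2,000 ms for five seconds. The baseline, the spike value and the spike position are all assumptions chosen for illustration, not measurements.

```python
# Hypothetical response times in milliseconds, one sample per second for one minute.
# Assumption: a steady 100 ms baseline with a 5-second stall at 2,000 ms.
BASELINE_MS = 100
SPIKE_MS = 2_000

samples = [BASELINE_MS] * 60
for second in range(30, 35):
    samples[second] = SPIKE_MS

# A one-minute datapoint collapses all 60 samples into a single average.
one_minute_average = sum(samples) / len(samples)

# One-second resolution keeps every sample, so the worst second stays visible.
worst_second = max(samples)

print(f"one-minute average: {one_minute_average:.0f} ms")
print(f"worst second: {worst_second} ms")
```

In this sketch, the one-minute datapoint reports a value only a few times the baseline, while the per-second data shows the full stall that users actually experienced.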

And Then Came Docker

The Docker website describes Docker as “an open source project to pack, ship and run any application as a lightweight container.” Docker lets DevOps teams stop caring about what’s inside a container; they just know they need 5 – 10 containers running to support a microservice. Start a script and within 15 seconds you have a whole array of new containers altering, and possibly impacting, the performance and quality of the services. In this brave new app world, 1-second visibility (or granularity) isn’t a “nice-to-have” monitoring feature. It’s critical.

One second resolution along with immediate continuous discovery is essential for monitoring containerized apps

Why are high-resolution time-based metrics and events critical to the modern Application Performance Management solution?

With applications growing in both size and complexity, the number and kinds of components have grown substantially. Application servers have been traded in for a polyglot mix of single-purpose messaging or object-handling services, combined with custom “micro”services delivering small bits of business logic.

And now for the biggest transformation… These single-purpose services (and microservices) are now deployed into container-based environments. Containers are rapidly replacing server virtualization because they enable fluid and rapid deployment of code and middleware. They are a more natural fit for the Agile and Continuous Delivery mindset. Docker is currently the most popular container model, and Kubernetes is the leading orchestration tool.

The need for 1-second resolution actually goes beyond monitoring. DevOps teams require immediate feedback whenever a change to an application is deployed, especially because of new container allocations. Typical questions DevOps teams face include:

  • Which microservice is running where?
  • Was the latest allocation “successful”?
  • Did the latest allocation impact service quality?
  • If quality was impacted, what is the root cause?
  • Is my compute infrastructure over-allocated (endangering performance) or under-allocated (wasting money)?

With the amount of business flowing across applications, these questions must be answered in seconds, not minutes.

Real World Examples

The digitized business has truly evolved and now requires high-performance, reliable applications to maintain a competitive advantage. The modern applications supporting the business have substantially more scale, components and dynamism. Application shelf life has decreased, and turnover of containers, processes and services has increased. Many new software services are lightweight single-purpose entities such as Kafka, Redis or Nginx. All these software components are built to scale and run easily in containers.

Here are a few container-oriented examples our customers have experienced:

Customer 1

  • The customer has 110 hosts running more than 3300 Docker Container instances.
  • Each Container has 15 additional attributes.  The result is almost 50,000 additional attributes with which to contend.
  • In another environment, there were more than 16,000 events in a single 24-hour period.

Customer 2

  • Another customer has both short- and long-running containers running combinations of the following: .NET, Kafka, Postgres, Nginx, Node.js, Java Spring Boot, SQL Server, MySQL, Redis, HAProxy, Go and Apache Tomcat.
  • Many teams are doing 1 – 5 container allocations per day.

Instana delivers immediate understanding of newly allocated containers and their impact on service quality.

Already today, business-critical applications rely on container technology to enable rapid and fluid continuous delivery DevOps processes. This will inevitably drive demand for higher accuracy, precision and timeliness (1-second resolution) in the monitoring tools managing these critical apps. Try Instana today to experience the benefits and power of our Dynamic APM solution. Every metric we collect is at 1-second resolution!