If it hasn't happened already, it will very soon: your new applications are almost all shifting to use some degree of cloud, containers, and microservices within their architectures. I get to work with a lot of companies, from tiny startups to massive Fortune 10 enterprises, and they are all shifting to modern architectures and very rapid release cycles in order to stay competitive. The only problem is that the shift is happening faster than the production tooling can adapt.
There are 9 new issues impacting your ability to ship high quality applications. Why are they new? Because scale, pace, and complexity have reached an all-time high thanks to new methodologies and technologies like microservices, Docker, and Kubernetes. Companies now have thousands of microservices replacing their 3-tier applications. They are once again densely packing hosts, but this time using container technologies instead of VMs. They are handing deployment parameters to orchestration engines and letting the orchestrators manage the implementations. They are deploying hundreds of times a day, a practice that makes ITIL practitioners' minds explode.
So what are these mysterious 9 new issues?
- It takes too long or is impossible to deploy monitoring to the full application stack
- Re-configuring monitoring takes too long when you deploy frequently
- Data is not granular enough to accurately detect issues in dynamic environments
- Data from mission-critical components is not monitored, or not correlated, causing a lack of visibility into business impact
- Abstraction techniques such as cloud, container, and orchestration technologies create a lack of context, making performance optimization and root cause determination a challenge
- Inflexible data models in monitoring tools make it impossible to understand impact and causality when using containers, orchestration, serverless, etc.
- Alerts take too long to trigger making impact to business too costly
- Performance analysis expertise for new technologies is hard to find making it difficult to troubleshoot problems
- Monitoring data exists in too many silos, causing inefficiency and errors as the IT organization troubleshoots and optimizes its applications.
Now that you know what these 9 new issues are, how should you deal with them? This blog post is the first in a multi-part series where we will explore the answers. Let’s get started right away with issues #1 and #2.
Deployment and Configuration Hell
Monitoring tools have gotten easier to deploy and configure over the years, but it's still surprising how much time and energy is required in even small environments. Let's look at an example of a small container-based application that consists of 5 hosts and 10 services. For each host there are multiple infrastructure components to monitor (OS, process, VM, network, container, etc.). We'll go with a low estimate of 5 different infrastructure monitors per host. Of course, these hosts exist so that you can run application components. In our example we have 10 different services spread across these 5 hosts. Each service can contain multiple components as well, such as an application server, message queue/bus, cache, database, web server, etc. We'll assume 4 application components per service for this example. So let's do the math (or maths for my British friends).
5 hosts × 5 infrastructure components/host = 25 infrastructure components
10 services × 4 application components/service = 40 application components
That’s a total of 65 components that need monitoring deployed and configured for our small 5 host application. That doesn’t even take into account specialized cloud computing or orchestration monitoring. This also assumes your environment is static which is probably a false assumption.
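The arithmetic above can be sketched as a quick back-of-the-envelope script. The counts are simply the ones from the example; adjust them to match your own environment:

```python
# Toy estimate of the monitoring surface of a "small" container
# application, using the counts from the example above.
hosts = 5
infra_components_per_host = 5   # OS, process, VM, network, container, ...
services = 10
app_components_per_service = 4  # app server, queue/bus, cache, database, ...

infra_components = hosts * infra_components_per_host        # 25
app_components = services * app_components_per_service      # 40
total = infra_components + app_components

print(f"{infra_components} infra + {app_components} app = {total} components to monitor")
# → 25 infra + 40 app = 65 components to monitor
```

Even doubling the host count in this sketch pushes the total past 100 targets, which is why manual configuration stops scaling almost immediately.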
Now imagine this is a dynamic environment that you are updating multiple times a day, and that also expands and contracts with workload demands. Keeping your monitoring configuration reasonably accurate becomes a full-time job. It's basically an impossible task, and it leaves you exposed to the risk of business impact that you won't know about until customers complain.
Automatic, Continuous Discovery and Mapping
The solution to this problem is a monitoring tool that continuously discovers everything about your infrastructure and applications and adjusts itself automatically as your environment changes. This is exceptionally difficult to accomplish in practice, especially as the rate of change and the scale of your applications increase, but Instana was built with this capability in mind and has figured out how to continuously discover and adapt to your changing environment.
The image above depicts automatic discovery of a Linux host running Docker, a Java process, the JVM inside that process, and finally Tomcat on the JVM.
This capability is one of the core pillars of Instana's agent design. The agent constantly discovers changes to your environment and adapts itself accordingly, without requiring any human intervention, and it all happens in real time. This is an absolute requirement for effectively monitoring any cloud, container, or microservices environment, and it solves issues #1 and #2 described above.
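To make the discover-and-adapt pattern concrete, here is a minimal sketch of a reconciliation loop. This is not Instana's actual agent, just an illustration of the general idea: snapshot what's running, diff it against what you're already monitoring, and attach or detach monitors accordingly. The inventory snapshots and component names are hardcoded stand-ins for a real probe of the host:

```python
# Illustrative continuous-discovery loop (NOT Instana's agent):
# diff the current inventory against the previous one and adjust
# monitoring automatically, with no human intervention.

def diff_inventory(previous, current):
    """Return (started, stopped) component sets between two snapshots."""
    return current - previous, previous - current

def reconcile(monitored, snapshot):
    """Attach monitors for new components, detach for removed ones."""
    started, stopped = diff_inventory(monitored, snapshot)
    for component in sorted(started):
        print(f"attaching monitor: {component}")  # e.g. load a Tomcat sensor
    for component in sorted(stopped):
        print(f"detaching monitor: {component}")  # component went away
    return snapshot

# Hypothetical snapshots of the stack described above: a Linux host
# running Docker, a Java process, a JVM, and Tomcat.
monitored = set()
monitored = reconcile(monitored, {"linux-host", "docker", "java-process", "jvm", "tomcat"})
# A deploy replaces Tomcat with a container running a message queue:
monitored = reconcile(monitored, {"linux-host", "docker", "java-process", "jvm", "rabbitmq"})
```

A real agent would run this loop continuously against live process, container, and orchestrator data; the point is that the monitoring configuration is derived from the environment rather than maintained by hand.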
In my next blog post I'll discuss issue #3, which deals with lack of data granularity and sampling. This is a major issue that can lead to false results when troubleshooting, or to wrong conclusions drawn by AI systems.