Every Monitoring Tool is Now Offering Observability
You can hardly find a product in the cloud-native ecosystem today that doesn’t claim to solve some observability problem. These claims rest, directly or indirectly, on one or a combination of the usual telemetry tools: logging, metrics, tracing, and security data, applied across every part of the software-defined stack, including (but not limited to) storage for big data systems.
While the above research paper is a bit of an extreme case, and I’m not sure any products address that specific use case (yet), it’s indicative of a quasi-renaissance in something we, as an industry, have been doing for a long time. Observing complex systems isn’t new: the term comes from control theory, where observability was formalized in the 1960s to describe whether measuring the outputs of a system gives you the information necessary to reconstruct its internal state, and therefore to implicitly control it. I had a conversation recently with a colleague in which he made the connection between how much of a system we can observe and how well we are able to effectively manage it.
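To make the control-theory notion concrete, here is a minimal sketch (my own illustration, not from the original post) of the classical Kalman rank test: a linear system x' = Ax, y = Cx is observable exactly when its internal state can be reconstructed from output measurements alone.

```python
import numpy as np

def is_observable(A, C):
    """Kalman rank test: the system x' = Ax, y = Cx is observable iff
    the observability matrix O = [C; CA; ...; CA^(n-1)] has full rank n."""
    n = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    return np.linalg.matrix_rank(O) == n

# A two-state system (position and velocity), measuring only position:
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
C_position = np.array([[1.0, 0.0]])
print(is_observable(A, C_position))  # True: velocity is recoverable from position over time

# Measuring only velocity instead: position can never be reconstructed.
C_velocity = np.array([[0.0, 1.0]])
print(is_observable(A, C_velocity))  # False
```

The analogy to monitoring is direct: it is not the volume of measurements that matters, but whether they are sufficient to reconstruct what the system is actually doing inside.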
This is something I hadn’t fully considered; I was mostly concerned with how observable a system was and the means by which that was achieved, but that’s not the important bit. I couldn’t see the forest for the trees. We can collect all the logs, traces, metrics, and anything else, but none of it matters if we can’t understand the data we’ve collected. And if we can’t understand it, we will be unable to control the systems we’re observing. This issue is a large one, and my colleague wrote a compelling article on the problem of attention deficiency in monitoring modern systems. It’s a bit of a chicken-and-egg problem, honestly: an evolving process in which we must constantly separate the signal from the noise. It’s not enough to just collect these data points; we have to aggregate them, summarize them, and build interactive tools that reconstruct the data and empower us to react.
Separating the Signal From the Noise
Modern distributed systems emit an enormous volume of signals: dozens or hundreds of microservices communicating thousands of times per second produce a treasure trove of highly valuable data that should help you solve problems when things start to degrade or break down. The easier it is to observe and correlate information across these boundaries, the easier it is to control that which you’re observing.
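As a toy illustration of separating signal from noise (my own sketch, not a description of any product’s implementation), consider flagging only the services whose latest latency reading deviates sharply from their own history:

```python
import statistics

def anomalous_services(latency_samples, threshold=3.0):
    """Flag services whose most recent latency reading (ms) deviates more
    than `threshold` standard deviations from that service's own history --
    a crude filter that surfaces signal and suppresses steady-state noise."""
    flagged = []
    for service, samples in latency_samples.items():
        history, latest = samples[:-1], samples[-1]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(latest - mean) / stdev > threshold:
            flagged.append(service)
    return flagged

metrics = {
    "checkout":  [12, 11, 13, 12, 11, 95],   # sudden spike
    "inventory": [40, 42, 39, 41, 40, 41],   # steady within normal variance
}
print(anomalous_services(metrics))  # ['checkout']
```

Real systems need far more than a z-score, of course; the point is that aggregation and summarization, not raw collection, are what turn telemetry into something a human can act on.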
Some might say that this is where AI will step in; I believe that’s a lazy answer. AI can’t make decisions for the builders of these highly organic, unpredictable, and complex systems. Machine learning can help us analyze the data and push back some of the noise, but realistically AI will always have the potential to make the wrong decision. We aren’t dealing with simple models: every application, with its dependencies, behaviors, and idiosyncrasies, is bespoke. If someone tells you that AI is going to solve your performance or operational problems, they’re selling you snake oil.
At the end of the day, only you can decide whether your system is observable, through a basic litmus test: do you have the information required to comprehend your environment in real time, and is that information useful enough to tell you whether your business is operating effectively? Taking it one layer deeper: do you have the information you need to quickly rectify any abnormality?
Many of our customers have gone from solving problems in hours or days to solving them in minutes. While Instana may not be the only reason for that (a combination of process improvements, CI/CD, and DevOps culture is often at play), I’d like to think that we were a part of it. Well, I don’t have to think about it; they flat out tell us we are. You can see some examples below of the benefits observability has brought our customers.
If you want your organization to benefit from modern APM and Observability then you can see for yourself and play with Instana right now. So, what are you waiting for?