Summary: How Instana engineering thinks about various observability technologies such as logging, tracing, and metrics in delivering a more effective monitoring solution for customers and experience for users.
When thinking about observability, it might be better to think about the various technologies in terms of a hierarchy of observability models as opposed to foundational pillars.
In an ideal world, monitoring and managing a complex system of services would entail focusing only on the behavioral signals emitted and the states inferred from them. Unfortunately, today that simplified reality is challenged by the accelerating rate of change, the lack of newer observability models to capture higher-level semantics, and a temporary and somewhat unhealthy fixation on big but far too simple data collection mechanisms and backend storage systems.
At Instana, we recognize that all sources and models of observability have a place in the monitoring stack and that some are far more useful than others depending on the situation and the task at hand. Those responsible for availability monitoring need fast and efficient assessment of the situation – current and predicted. When troubleshooting an incident, it is vital to be able to drill down to traces and maybe even logs from the appropriate location identified at higher layers in the hierarchy.
In terms of cognitive effort and computing cost, signaling is the cheapest form of communication and possibly control. With the right semantic signal set, signaling is far more effective than metrics. Metrics are useful in calling out shifts and phases in system dynamics and, from there, offering a helpful starting point for explorative analysis and search of logs and traces.
Instana has spent a significant amount of engineering effort making all supported observability technologies scale and perform well in resource-constrained environments, but the relative cost indicated by the hierarchy will remain the same regardless.
Much like what has happened in the web search space, we believe the future for effective application monitoring is centered around suggestions rather than search. Searching traces and logs requires operations staff to know already what they are looking for. It is limited by their knowledge of an application stack that is changing far too much for the knowledge to always be sufficient and accurate. Suggestions, based around a smaller value domain, can be far more useful, situated, and immediate in terms of understanding and intelligent (re)action.
Much like hardware memory systems, the goal at Instana is to keep user attention and action as high as possible in the observability hierarchy. Drilling down into logging and tracing is a last resort because of the expense it represents to human cognition and, importantly, to the time to react, which, as our name implies, we aim to make near-instant. Less is more effective.
One obvious way to somewhat mitigate shortcomings at the bottom layers of the hierarchy is to push upwards derived values such as metrics and signals. But this is not without challenges and compromises as Mark Burgess has detailed in a masterfully researched paper – From Observability to Significance in Distributed Information Systems.
Unlike other data-oriented solution stacks, the pipeline of the Instana solution is focused on generating higher models as early as possible and with as little manual configuration as possible. The solution imbues each stage in the pipeline with smart and efficient strategies for data-to-signal transformation, as opposed to other solutions that require the creation and ongoing, expensive management of hundreds of log-to-metric mappings.
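As a rough illustration of what a data-to-signal transformation can look like, the sketch below collapses a window of latency samples into one of a few named signals. The `Signal` names, the `to_signal` function, and the threshold factors are hypothetical choices made for this example; they are not Instana's implementation.

```python
from enum import Enum
from statistics import mean


class Signal(Enum):
    # Hypothetical signal vocabulary, kept deliberately small.
    OK = "ok"
    SLOW = "slow"
    DEGRADED = "degraded"


def to_signal(latencies_ms, baseline_ms, slow_factor=1.5, degraded_factor=3.0):
    """Collapse a window of latency samples into a single signal.

    The factors are illustrative assumptions: a window whose mean is at
    least `degraded_factor` times the baseline is DEGRADED, at least
    `slow_factor` times the baseline is SLOW, and anything else is OK.
    """
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ratio = mean(latencies_ms) / baseline_ms
    if ratio >= degraded_factor:
        return Signal.DEGRADED
    if ratio >= slow_factor:
        return Signal.SLOW
    return Signal.OK
```

The point of the sketch is the shape of the output: an operator or a downstream stage receives one value from a small domain rather than a stream of numbers to interpret.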
These higher-level models are not only important in addressing the ever-increasing complexity in systems; they are the only realistic approach to generating useful predictions. Prediction enables management – the next step beyond passive monitoring.
Logging and tracing are still too far removed from the aspects of a system that make learning, understanding, and reasoning scalable for humans at a system level. Humans need far better operative models, and these cannot be mapped to the low-level observability models. It is from the mining of these lower layers that other models of real benefit to customers surface. We cannot effectively manage the ecosystem of a forest if all we can see are the (call) trees.
Observability is a means to an end – that end is a solution, like Instana, that entails efficient monitoring, adaptive controllability, and from there effective availability management.
Like most other engineering and design disciplines, Instana engineering takes both a bottom-up and top-down approach to the solution. Tracing and logging give rise to events of interest that still need to be conveyed by the monitor to the (human) operator. Metrics place the events within a situation and environment. Signals are what are extracted and abstracted further, finally reaching an assessment of the state of play of a system, service, or endpoint.
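The bottom-up flow described above can be sketched in a few lines, moving from events to a metric, from the metric to a signal, and from signals to an assessed state. Every name here (`Event`, `error_rate`, `signal`, `assess`), the 5% threshold, and the state labels are assumptions made for illustration, not a description of how Instana computes them.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Lowest layer: an event of interest surfaced from a log line or trace.
    service: str
    is_error: bool


def error_rate(events):
    """Metric layer: summarize a window of events as one number."""
    if not events:
        return 0.0
    return sum(e.is_error for e in events) / len(events)


def signal(rate, threshold=0.05):
    """Signal layer: abstract the metric into a two-word vocabulary."""
    return "FAILING" if rate > threshold else "SUCCEEDING"


def assess(signals):
    """State layer: infer a status from a sequence of recent signals."""
    failing = signals.count("FAILING")
    if failing == 0:
        return "STABLE"
    if failing == len(signals):
        return "DOWN"
    return "DEGRADED"


# Two consecutive windows of events for a hypothetical "cart" service.
windows = [
    [Event("cart", False)] * 10,
    [Event("cart", True)] + [Event("cart", False)] * 9,
]
signals = [signal(error_rate(w)) for w in windows]
status = assess(signals)
```

Each step discards detail and shrinks the value domain, which is why the operator can stay at the top of the hierarchy until a status change gives a reason to drill down.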
The future needs of customers are likely to be best served by designing and developing newer technologies at both ends of the observability hierarchy. There needs to be a shift away from data naively mapped to models and reports. Lower forms of data should be able to reconstruct a past reality and, in turn, many of the lower models we capture today at the source. Higher forms of communication and coordination need to rise at the other end in order to throw back the curtain that is the illusion of control and more optimally manage systems and networks of services.
Simplify. Signify. Rectify.