Welcome to the first ever blog post from Instana! We founded Instana because we noticed that the world of system and application performance management was too complicated. This first post tells that story from a historical perspective and gives an outlook on what Instana will be. As always, we welcome your comments!
Short history of Application Performance Management (APM)
The first generation of Application Performance Management (APM 1.0) was established at the end of the last century by vendors like Wily (acquired by CA), Quest Software (acquired by Dell), Precise (acquired by Symantec) and Mercury Interactive (acquired by HP). These tools collected statistics and metrics from system and application components and, using instrumentation technology, provided a first generation of user-defined transaction profiling. APM 1.0 required a lot of effort to implement and maintain, but it helped teams monitor and analyze the performance of the newly developed 3-tier applications.
The second generation of Application Performance Management (APM 2.0) was driven by more distributed, service-oriented architectures (SOA) and by the rise of more elastic virtualized or cloud-based infrastructure. This generation added the ability to discover application topologies and to analyze distributed applications. It also adapted to the growing number of code changes driven by agile development and continuous delivery, providing automated instrumentation and baselining to reduce the number of thresholds the user has to define manually. User experience monitoring of web and mobile apps has become part of APM 2.0 as more and more businesses rely on the service quality of their applications. In our view, AppDynamics and dynaTrace are the leading products in this space, while New Relic took APM 1.0 into the cloud, providing APM as a service and making it very easy to use, but without the technical depth of its APM 2.0 competitors.
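To make the idea of baselining more concrete: instead of asking the user for a fixed threshold, a tool can learn what "normal" looks like from recent data and flag values that deviate from it. The sketch below is a minimal illustration that assumes a simple rolling mean and standard deviation; it is not how AppDynamics, dynaTrace or any other product actually computes baselines, and the class RollingBaseline, its parameters and the sample latencies are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Hypothetical sketch: learn a baseline from recent samples instead of a hand-set threshold."""

    def __init__(self, window: int = 60, k: float = 3.0, min_samples: int = 10):
        self.samples = deque(maxlen=window)  # keeps only the most recent `window` values
        self.k = k                           # how many standard deviations count as anomalous
        self.min_samples = min_samples       # don't judge until the baseline has some history

    def observe(self, value: float) -> bool:
        """Return True if `value` falls outside mean +/- k * stddev of the recent window."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            anomalous = abs(value - mu) > self.k * sigma
        # The new value becomes part of the baseline for future observations.
        self.samples.append(value)
        return anomalous


# Illustrative, made-up response times in ms: steady traffic, then one slow request.
baseline = RollingBaseline(window=60, k=3.0)
latencies = [100 + (i % 5) for i in range(30)] + [400]
flags = [baseline.observe(ms) for ms in latencies]
print(flags[-1])
```

The user only chooses how sensitive the check should be (`k`) rather than an absolute latency limit per service; the limit itself moves with the observed data.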
APM 2.0 is primarily application performance monitoring (as opposed to management). Because all of these tools are built around dashboards and reports, you need experts to understand the data: humans have to analyze it to find the problem and decide what to do. The degree of automation, that is, actual management of the application directly from the APM tool, is still very low.
So the interesting question is: How will APM evolve?
To find out where the future of APM lies, we looked at the two edges of IT: Web-Scale architectures and the Long Tail of applications.
According to Gartner, by 2017 more than 50% of global enterprises will be using Web-Scale architectures.
"Large cloud services providers such as Amazon, Google, Facebook, etc., are reinventing the way in which IT services can be delivered," said Cameron Haight, research vice president at Gartner. "Their capabilities go beyond scale in terms of sheer size to also include scale as it pertains to speed and agility. If enterprises want to keep pace, they need to emulate the architectures, processes and practices of these exemplary cloud providers."
Netflix is one of the most prominent Web-Scale businesses, having gone all-in on cloud and microservice architectures. A Netflix blog post says this about monitoring:
“And at our scale, humans cannot continuously monitor the status of all of our systems. To maintain high availability across such a complicated system, and to help us continuously improve the experience for our customers, it is critical for us to have exceptional tools coupled with intelligent analysis to proactively detect and communicate system faults and identify areas of improvement.”
In that post, Netflix explained that it needed to develop its own monitoring toolset based on real-time streaming technology. Since then, the company has open-sourced some of these technologies, such as Vector and Atlas.
Adrian Cockcroft, a former Cloud Architect at Netflix and a well-known authority in the cloud and monitoring space, has defined six rules for modern monitoring tools that reflect the change in architectures driven by Web-Scale IT:
- Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.
- Metric display latency needs to be less than the human attention span (~10s).
- Validate that your measurement system has enough accuracy and precision. Collect histograms of response time.
- Monitoring systems need to be more available and scalable than the system being monitored.
- Optimize for distributed, ephemeral, cloud-native, containerized microservices.
- Fit metrics to models to understand relationships.
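The third rule asks for histograms of response time rather than single averages. The following minimal Python sketch illustrates why, assuming fixed, roughly exponential bucket bounds in milliseconds; the class LatencyHistogram, the bucket bounds and the sample data are hypothetical and are not taken from Atlas, Vector or any other tool mentioned here.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds (an assumption, not from any specific tool).
BUCKET_BOUNDS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]


class LatencyHistogram:
    """Minimal sketch: count response times into fixed buckets instead of keeping only an average."""

    def __init__(self, bounds=BUCKET_BOUNDS_MS):
        self.bounds = list(bounds)
        # One extra overflow bucket for values above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0

    def record(self, latency_ms: float) -> None:
        # bisect_left returns the index of the first bound >= latency_ms,
        # or len(bounds) (the overflow bucket) if the value exceeds every bound.
        self.counts[bisect_left(self.bounds, latency_ms)] += 1
        self.total += 1

    def percentile_upper_bound(self, p: float) -> float:
        """Return the upper bound of the bucket containing the p-th percentile (0 < p <= 100)."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        rank = p / 100 * self.total
        cumulative = 0
        for i, count in enumerate(self.counts):
            cumulative += count
            if cumulative >= rank:
                return self.bounds[i] if i < len(self.bounds) else float("inf")
        return float("inf")


# Made-up illustrative data: 95 fast requests and 5 very slow ones.
hist = LatencyHistogram()
latencies = [8.0] * 95 + [1500.0] * 5
for ms in latencies:
    hist.record(ms)

average = sum(latencies) / len(latencies)
p50 = hist.percentile_upper_bound(50)
p99 = hist.percentile_upper_bound(99)
print(f"mean={average:.1f} ms, p50<={p50} ms, p99<={p99} ms")
```

Here the mean of about 82.6 ms describes neither group of requests, while the bucket-based p50 and p99 bounds show both the typical case and the slow tail. Fixed buckets trade some precision for constant memory per metric, which matters when many services each report their own histogram.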
Adrian also developed a simulator called Spigo that can simulate complex environments; the simulation data can be used to stress-test monitoring tools in terms of visualization, scale and intelligence.
Source: https://github.com/adrianco/spigo - A visualisation of a NetflixOSS architecture in multiple Amazon regions.
We think that new architecture trends, such as microservice and message-driven architectures, together with rising non-functional requirements for resilience, redundancy, availability and load, will greatly increase the number of applications that reach a “scale” and complexity similar to Netflix's. A small company that runs ten services today will eventually need to operate hundreds of container-based services.
Ben Horowitz has also written an interesting blog post on this topic, “Past and Future of System Management”, in which he gives further reasons why a new generation of monitoring tools is needed "for modern, massive, cloud-based architectures".
The Innovator’s Dilemma
The key to innovation can be found in Clayton Christensen's book “The Innovator’s Dilemma” about disruptive products:
“Empowering innovations transform complicated, costly products that previously had been available only to a few people, into simpler, cheaper products available to many. Empowering innovations create jobs for people who build, distribute, sell and service these products.”
I've done many APM implementations and performance troubleshooting engagements in my career, and the tools are still highly complex, especially because you have to understand the data and often combine it with more detailed raw data from the operating system or log files. At the end of the day, you have to be an expert to use these tools, and most companies do not have specialized performance experts.
While researching the long tail of APM, we recently talked to small and medium-sized businesses and found that many of them cannot afford APM tools because they are too expensive.
Based on this experience, I agree with Christensen's conclusion that disruption comes from making a product simpler and cheaper, and the APM market seems ready for this disruption.
So the next generation of APM must be a product that is built not for experts but for all the operators, developers and IT managers who need to ensure the health of their systems and applications. The key to the next-generation APM tool therefore lies in the Long Tail of APM. If you can address the millions of IT people and applications that are currently not monitored, you also hold the key to the most complex applications, as Christensen explains in his book. I will publish an article on the Long Tail of APM in the coming days to explain this market in more detail.
Machine driven analytics
Making APM easier than it is today can only be achieved by reducing the amount of human expertise needed to analyse monitoring data.
A recent TechCrunch article is titled “The Next Wave Of Enterprise Software Powered By Machine Learning”, and although it was not about APM, I believe that the future of APM will be driven by machine learning.
Or as Adrian says in his first rule: “Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.”
At Instana, we are working on such approaches and technologies with a highly skilled and talented team.
Source: Instana.com - Demo of 3D system landscape.
Our Alpha version is a first implementation of our technology, bringing a new level of intelligence to system monitoring for cloud- and container-based infrastructures. It is the foundation for our goal of making APM easy enough for the Long Tail to use, while still serving experts who need to understand Web-Scale architectures.
today and register for E-Mail updates to get information about our Beta programs that will be starting soon.