Detect certificate problems
Here is an example of a typical day as an SRE, taken from last Monday. After the morning handover with our team in Australia, developers started complaining that one of our test environments was not working. We monitor our test and preview environments with Instana, so I jumped into the UI and checked all services using the application perspective view.
Looking at the dashboard, it was clear that something had started breaking at around 9:30 am.
I checked “Top Services” and selected “Erroneous Calls” and immediately found the component that was causing the errors. In this case it was our “ui-client” component.
I used “Analyze Calls” and looked at the “Erroneous Calls” to identify the problem. In this instance I was especially interested in the Stack Traces and Errors shown in the details section.
Looking at the error in the Stack Trace was enough to find the root cause: a certificate that did not cover the hostname being called had been installed in one of the test environments.
Error: Hostname/IP does not match certificate's altnames
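This error means that the hostname the client connected to is not listed in the certificate's subject alternative names. As a rough illustration of that check, here is a minimal Python sketch. It is not how Instana or our "ui-client" does it: the function names `hostname_matches` and `check_endpoint` are made up for this post, and the matching rule is a simplified version of what real TLS libraries apply (exact match, or a single `*` standing in for the leftmost label only).

```python
import socket
import ssl
from typing import List, Optional


def hostname_matches(hostname: str, san_names: List[str]) -> bool:
    """Simplified altname check (hypothetical helper, not a full RFC 6125 matcher).

    A name matches if it is identical to the hostname, or if it is a wildcard
    such as *.example.com that covers exactly one leftmost label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in san_names:
        san_labels = name.lower().rstrip(".").split(".")
        if len(san_labels) != len(host_labels):
            continue  # a wildcard never spans more than one label
        if san_labels[0] == "*":
            # Refuse overly broad wildcards such as "*.com".
            if len(san_labels) > 2 and san_labels[1:] == host_labels[1:]:
                return True
        elif san_labels == host_labels:
            return True
    return False


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection and return the verification error, or None if it is fine.

    Relies on Python's default context, which verifies both the certificate
    chain and the hostname during the handshake.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message
```

Calling `check_endpoint` with the hostname of a test environment (for example a placeholder like `test.example.com`) returns `None` when the certificate is accepted and the verification message otherwise, so a small script like this could run against every environment after a certificate change. Network errors other than certificate verification failures are deliberately left to propagate.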
In the end, identifying the problem with Instana and fixing it took only a few minutes. Being able to see all Calls and Traces, and to drill down into details like Errors and Stack Traces, makes Instana the perfect monitoring solution for the complex environments we have to deal with every day. After we fixed the issue, I could start working on the tasks I had initially planned for the day 🙂
PS: Today we configured cert-manager to automatically update the certificates for all our Kubernetes test environments. But that's a story for another post.