Monitoring Kubernetes is an important part of a holistic management strategy, given the impact that orchestration can have on the performance and availability of application services.
What is Kubernetes?
Kubernetes (commonly abbreviated as k8s) is an open-source container-orchestration system for automating computer application deployment, scaling, and management. It aims to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts”. (Wikipedia)
Essentially, Kubernetes manages the available machine resources in the cluster, as well as the actual services and applications deployed into it. Below the visible layer, Kubernetes provides a virtual networking layer and presents the services with a fully isolated network and DNS service. In addition to these built-in services, k8s is equipped with multiple extension interfaces to implement automatic proxying, service discovery, health checks, redistribution, and more.
Kubernetes was originally developed by Google, and Kubernetes distributions are now provided by a multitude of service providers, such as Google itself (GKE – Google Kubernetes Engine), Amazon Web Services (EKS – Elastic Kubernetes Service), Microsoft Azure (AKS – Azure Kubernetes Service), and many more. Additionally, it is possible to run k8s completely on-premises on dedicated machines or inside a private cloud environment.
Kubernetes Environment Monitoring Best Practices
Monitoring a Kubernetes environment requires you to think about multiple aspects. First, health and performance metrics (KPIs) are important indicators of the overall state of the k8s environment. By themselves, however, they are not enough to understand the overall impact on application and request performance. Because k8s manages resources, an issue in the way it operates services can cause problems throughout the application landscape of all deployed services.
Kubernetes Metrics: Kubernetes provides a good number of meaningful metrics, which are commonly collected by a Prometheus monitoring server. They are also visible in the Kubernetes Dashboard, which is often deployed into the k8s cluster itself. The common metrics are built directly into the internal Kubernetes services or produced by dedicated metrics collectors, and they must be gathered by specific services inside the Kubernetes cluster; a popular option is the Prometheus Operator.
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of the objects (Deployments, Pods, Nodes, and so on). It generates these metrics from Kubernetes API objects without modification. As a result, in certain situations kube-state-metrics may not show exactly the same values as kubectl, because kubectl applies heuristics to the data. Prometheus then scrapes its metrics from the kube-state-metrics service.
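kube-state-metrics publishes its data in the Prometheus text exposition format, one sample per line. To show what that data looks like and how it can be used, here is a minimal Python sketch that parses a few such lines with the standard library and flags deployments with fewer available replicas than desired. The metric names `kube_deployment_spec_replicas` and `kube_deployment_status_replicas_available` are real kube-state-metrics metrics; the sample values, the `shop` namespace, the helper names, and the deliberately simplified parser (no escaped quotes, timestamps ignored) are assumptions for illustration. In practice Prometheus does the parsing and you would express the check as a query.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
# Simplified on purpose: escaped quotes inside label values are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def degraded_deployments(samples):
    """Return (namespace, deployment) pairs with fewer available than desired replicas."""
    def by_deployment(metric):
        return {
            (labels["namespace"], labels["deployment"]): value
            for name, labels, value in samples
            if name == metric
        }

    desired = by_deployment("kube_deployment_spec_replicas")
    available = by_deployment("kube_deployment_status_replicas_available")
    return sorted(key for key, want in desired.items() if available.get(key, 0.0) < want)


# Hypothetical sample output, shaped like kube-state-metrics' /metrics endpoint.
SAMPLE = """
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="shop",deployment="discount"} 3
kube_deployment_spec_replicas{namespace="shop",deployment="cart"} 2
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="shop",deployment="discount"} 2
kube_deployment_status_replicas_available{namespace="shop",deployment="cart"} 2
"""

print(degraded_deployments(parse_samples(SAMPLE)))  # [('shop', 'discount')]
```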
The Kubernetes Metrics Server provides the resource metrics used for autoscaling the Kubernetes environment. It collects resource metrics from Kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal and Vertical Pod Autoscalers. Its documentation advises against using Metrics Server to forward metrics to monitoring solutions, or as a source of monitoring solution metrics.
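To make the autoscaling use case concrete, the Kubernetes documentation describes the core Horizontal Pod Autoscaler rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no scaling while the ratio stays within a tolerance (0.1 by default) of 1.0. The sketch below implements only that rule; the function name is hypothetical, and the real controller's min/max bounds, stabilization windows, and handling of missing metrics are left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Core Horizontal Pod Autoscaler rule, as described in the Kubernetes docs:

        desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

    No scaling happens while the ratio stays within `tolerance` of 1.0.
    Min/max replica bounds, stabilization windows and missing-metric
    handling are deliberately left out of this sketch.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 90% CPU utilization against a 60% target.
print(desired_replicas(4, 90, 60))  # 6
```

With a 63% average instead, the ratio is 1.05, inside the tolerance, so the replica count stays at 4.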
Screenshot of the Kubernetes UI Dashboard taken from the Kubernetes documentation
Apart from standard performance metrics, k8s also provides resource statistics, such as CPU allocation (requested, used, capacity) and pod capacity, which provide essential insight into capacity planning and cluster scaling.
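CPU figures in Kubernetes are written as quantity strings such as `250m` (millicores) or `2` (cores), and metrics-server often reports usage in nanocores (`n`). The following sketch, with hypothetical helper names and node values, converts such strings and computes requested and used CPU as a share of node capacity, similar to the allocation figures shown in the Dashboard. It only handles the `n`, `u`, and `m` suffixes and plain numbers. Note that the scheduler itself compares requests against a node's allocatable CPU (capacity minus reserved resources) rather than raw capacity.

```python
from decimal import Decimal

# Kubernetes CPU quantities may carry these suffixes; metrics-server often
# reports usage in nanocores ("n"), while requests are usually written in
# millicores ("m") or whole cores.
CPU_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}


def parse_cpu(quantity):
    """Convert a CPU quantity string such as "250m" or "2" to cores."""
    suffix = quantity[-1]
    if suffix in CPU_SUFFIXES:
        return Decimal(quantity[:-1]) * CPU_SUFFIXES[suffix]
    return Decimal(quantity)


def cpu_allocation(requests, used, capacity):
    """Requested and used CPU as percentages of node capacity."""
    cap = parse_cpu(capacity)
    requested = sum(parse_cpu(q) for q in requests)
    return {
        "requested_pct": float(requested / cap * 100),
        "used_pct": float(parse_cpu(used) / cap * 100),
    }


# Hypothetical node: 4 cores, three pods requesting 250m, 500m and 1 core.
print(cpu_allocation(["250m", "500m", "1"], "1500m", "4"))
# {'requested_pct': 43.75, 'used_pct': 37.5}
```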
Deployment Metrics: Kubernetes Deployments define the desired state for your ReplicaSets and individual Pods. A large share of metrics revolves around deployment statistics of the services and applications under the control of the Kubernetes cluster. These metrics include health information per service but, more importantly, deployment changes, the number of pod restarts, and crash loops.
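Restart counts are exposed by kube-state-metrics as the counter `kube_pod_container_status_restarts_total`, labeled by namespace, pod, and container. A simple way to spot crash loops is to compare two snapshots of that counter and flag containers that restarted repeatedly in between. The sketch below assumes the snapshots are already available as dictionaries; the pod names, helper names, and the threshold of 3 are hypothetical.

```python
def restart_increase(previous, current):
    """Restart-count increase per container between two snapshots.

    Both arguments map (namespace, pod, container) to the value of the
    kube_pod_container_status_restarts_total counter at snapshot time.
    """
    increase = {}
    for key, count in current.items():
        before = previous.get(key, 0)  # container not seen before: start from 0
        # A counter that went down was reset; count the new value as the increase.
        increase[key] = count - before if count >= before else count
    return increase


def likely_crash_loops(previous, current, threshold=3):
    """Containers that restarted at least `threshold` times between snapshots."""
    return sorted(
        key for key, delta in restart_increase(previous, current).items()
        if delta >= threshold
    )


before = {("shop", "discount-7d9f", "app"): 2, ("shop", "cart-5c4b", "app"): 0}
after = {("shop", "discount-7d9f", "app"): 7, ("shop", "cart-5c4b", "app"): 1}
print(likely_crash_loops(before, after))  # [('shop', 'discount-7d9f', 'app')]
```

In a Prometheus setup, a comparable check is usually written as a query such as `increase(kube_pod_container_status_restarts_total[15m]) > 3`. PromQL's `increase()` extrapolates over the window, so its numbers will not match this snapshot difference exactly.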
Events: Kubernetes events provide insight into what is happening inside a cluster. Events are API objects, stored through the API server on the Kubernetes control plane (master). They can provide important information, such as which decisions the scheduler made or why some pods were evicted from a node. By default, events are removed one hour after they occur. Therefore, to keep events for a longer period of time (as you would want when performing root cause analysis), you must store them in a third-party solution.
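Because of the one-hour default retention, events have to be copied out of the cluster regularly. The sketch below shows the idea: it reads the JSON that `kubectl get events -o json` returns (an object with an `items` list of Event objects), stores each event keyed by its `metadata.uid` so repeated runs do not duplicate anything, and filters archived Warning events by reason. A plain dictionary stands in for the third-party storage, and the sample events and helper names are hypothetical; a production exporter would typically watch the API server instead of polling kubectl output.

```python
import json


def archive_events(raw_json, store):
    """Copy events from `kubectl get events -o json` output into `store`.

    `store` is keyed by metadata.uid, so calling this repeatedly (well within
    the one-hour default retention) never creates duplicates. Returns the
    number of events that were not stored before.
    """
    added = 0
    for event in json.loads(raw_json).get("items", []):
        uid = event["metadata"]["uid"]
        if uid not in store:
            added += 1
        store[uid] = event  # newer copies carry updated count / lastTimestamp
    return added


def warnings_with_reason(store, reason):
    """Archived Warning events with the given reason, e.g. "FailedScheduling"."""
    return [e for e in store.values() if e.get("type") == "Warning" and e.get("reason") == reason]


# Hypothetical, trimmed-down event list in the shape kubectl returns.
RAW = json.dumps({"items": [
    {"metadata": {"uid": "a1"}, "type": "Warning", "reason": "FailedScheduling",
     "involvedObject": {"kind": "Pod", "name": "discount-7d9f"}},
    {"metadata": {"uid": "b2"}, "type": "Normal", "reason": "Scheduled",
     "involvedObject": {"kind": "Pod", "name": "cart-5c4b"}},
]})

archive = {}
print(archive_events(RAW, archive), archive_events(RAW, archive))  # 2 0
```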
Monitoring deployment metrics such as pod restarts and crash loops is a best practice, since k8s manages service restarts and failures automatically and can thereby hide recurring problems. If the number of restarts rises, Kubernetes delays further restart attempts using an exponential back-off algorithm, or even stops retrying at all. Finding and fixing those issues quickly is important but can be difficult.
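The Kubernetes documentation describes the default restart back-off for a crashing container as an exponential delay of 10s, 20s, 40s, and so on, capped at five minutes, with the timer reset once the container has run for 10 minutes without problems. The sketch below (hypothetical function name; exact values can differ between Kubernetes versions) computes that schedule and shows why a crash-looping service can end up unavailable for minutes at a time.

```python
def crashloop_delays(restarts, initial=10, cap=300):
    """Seconds the kubelet waits before each of the first `restarts` restarts.

    Follows the default schedule described in the Kubernetes documentation:
    10s, 20s, 40s, ... capped at five minutes. The reset after 10 minutes
    of trouble-free running is not modeled here.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After seven failed starts, the container has spent 910 seconds, roughly 15 minutes, just waiting to be restarted.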
Kubernetes and Microservices Architectures
Accessing internal metrics using open-source (OSS) monitoring tools is useful, but a lot of work is left to the user when trying to understand the actual impact of the Kubernetes cluster on the running services. Furthermore, the infrastructure to collect, store, and analyze the metrics (often Prometheus) must be set up and managed. Prometheus is a capable tool, but a single Prometheus server was not designed to scale out for monitoring at large scale, so be prepared to manage multiple Prometheus instances over time.
Complicating matters further, correlation of k8s metrics with data from the microservices (such as their metrics and distributed traces) is noticeably missing when using the standard OSS tools. As a result, the user is left with a set of disconnected metrics, typically spread across disparate monitoring systems. Piecing together the jigsaw puzzle and trying to connect the dots during an outage, and then getting to the root cause, is an unnecessarily complicated and lengthy process when using OSS tools.
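To make the missing correlation concrete, here is a hypothetical sketch of the join a user would otherwise do by hand: map each slow trace span to the node its pod runs on, so that several slow services sharing one node stand out. The span shape (`service`, `pod`, `duration_ms`), the 500 ms threshold, and all names are assumptions; real tracing systems record pod identity in different ways, and the pod-to-node mapping would come from each Pod's `spec.nodeName`.

```python
from collections import defaultdict


def slow_services_by_node(spans, pod_to_node, threshold_ms):
    """Group services with slow spans by the node their pod was running on."""
    by_node = defaultdict(set)
    for span in spans:
        if span["duration_ms"] < threshold_ms:
            continue
        node = pod_to_node.get(span["pod"])
        if node is not None:  # unknown pod (e.g. already deleted): cannot correlate
            by_node[node].add(span["service"])
    return {node: sorted(services) for node, services in by_node.items()}


# Hypothetical data: spans from a tracer, pod placement from each Pod's spec.nodeName.
spans = [
    {"service": "discount", "pod": "discount-7d9f", "duration_ms": 900},
    {"service": "cart", "pod": "cart-5c4b", "duration_ms": 750},
    {"service": "shipping", "pod": "shipping-2e8a", "duration_ms": 40},
]
pod_to_node = {"discount-7d9f": "node-1", "cart-5c4b": "node-1", "shipping-2e8a": "node-2"}
print(slow_services_by_node(spans, pod_to_node, threshold_ms=500))
# {'node-1': ['cart', 'discount']}
```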
Real-Time Kubernetes Environment Monitoring Using Instana
As described in the Instana Kubernetes documentation, Instana automatically discovers and monitors Kubernetes:
- Clusters
- Nodes
- Namespaces
- Deployments
- Services
- Pods
Instana has fully automated the installation of the monitoring agent into Kubernetes. With a single Helm install command, the Instana agent is deployed as a DaemonSet into the Kubernetes cluster and starts collecting information right away. The agent is automatically added to all schedulable nodes in the cluster.
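A DaemonSet is what makes "one agent on every schedulable node" work: Kubernetes runs one copy of its Pod on each eligible node and adds a copy when a node joins. The following is not the Instana Helm chart; it is a generic sketch of the kind of object such a chart creates, a minimal `apps/v1` DaemonSet built as a Python dict and printed as JSON (which `kubectl apply -f` also accepts). The name, namespace, and image are hypothetical.

```python
import json


def daemonset_manifest(name, namespace, image):
    """Build a minimal apps/v1 DaemonSet manifest as a Python dict.

    The selector must match the Pod template's labels; Kubernetes then
    keeps one Pod from this template on every eligible node.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = daemonset_manifest("example-agent", "monitoring", "registry.example.com/agent:1.0")
print(json.dumps(manifest, indent=2))
```

Node-level monitoring agents commonly need more than this, for example tolerations so they also run on tainted nodes, plus host-level access; those settings are omitted here.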
Once the Instana agent is deployed, configuration is completely automatic. The process for deploying the Instana agent into the various Kubernetes distributions depends on the service provider. The Instana installation wizard provides a selection of k8s service providers to choose from, pre-populated with the required information.
Screenshot of Instana agent installation process for Microsoft AKS
It’s important to understand that Instana not only monitors, in real time, the metrics provided by Kubernetes itself (as discussed earlier), but also monitors all the services deployed into k8s, both internal ones and custom business services. This includes end-to-end distributed tracing of every request flowing through all services.
After the Instana agent starts, every container in the Kubernetes cluster is scanned for supported technologies, automatically set up to be monitored, and added to Instana. The single-agent implementation keeps the monitoring overhead extremely low and greatly simplifies the overall installation and maintenance process.
With Instana, all data, whether from Kubernetes, the services, sidecars, or even the physical hosts, is stitched together to provide a full, end-to-end view of the contextual dependencies and impact between the different components. This process takes only a few seconds; it doesn’t get much more real-time than that.
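The following is not Instana's implementation; it is a generic sketch of one common building block behind dependency maps: deriving service-to-service call edges from the parent/child relationships in distributed traces. Whenever a span's parent belongs to a different service, the parent's service called this one. The span fields (`span_id`, `parent_id`, `service`) and the service names are hypothetical.

```python
def service_dependencies(spans):
    """Derive (caller, callee) service edges from trace spans.

    Each span is a dict with "span_id", "parent_id" (None for the root span)
    and "service". An edge exists wherever a span's parent belongs to a
    different service; calls within one service are ignored.
    """
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges = set()
    for span in spans:
        parent_service = service_of.get(span["parent_id"])
        if parent_service is not None and parent_service != span["service"]:
            edges.add((parent_service, span["service"]))
    return sorted(edges)


trace = [
    {"span_id": 1, "parent_id": None, "service": "shop-frontend"},
    {"span_id": 2, "parent_id": 1, "service": "discount"},
    {"span_id": 3, "parent_id": 2, "service": "discount"},  # internal call
    {"span_id": 4, "parent_id": 1, "service": "cart"},
    {"span_id": 5, "parent_id": 4, "service": "redis"},
]
print(service_dependencies(trace))
# [('cart', 'redis'), ('shop-frontend', 'cart'), ('shop-frontend', 'discount')]
```

Joining these edges with the pod and node each span ran on is what lets infrastructure problems be traced to the services they affect.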
Screenshot from Instana showing aggregate details for a Kubernetes namespace
With Instana, there is no need to manually determine which performance issue in a service may be related to a resource contention issue in the cluster. Instana has all of the required context to understand the root cause of problems at any layer of the stack. For example, Instana can determine that “multiple services on the same Kubernetes node have performance issues at the same time due to over-commitment of CPU”.
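CPU over-commitment on a node can be illustrated with a simple, hypothetical heuristic (this is not how Instana determines root cause). The scheduler keeps the sum of CPU requests on a node within its allocatable CPU, but CPU limits may add up to more than that. When several pods on such a node get busy at once, they compete for the same cores. The sketch flags nodes whose summed CPU limits exceed allocatable CPU and lists the services running there. The pod data, helper names, and the flattened pod shape are assumptions, only the `m` suffix is parsed, and pods without CPU limits are not considered.

```python
from decimal import Decimal


def cores(quantity):
    """CPU quantity such as "1500m" or "2" -> cores (only the "m" suffix is handled)."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def overcommitted_nodes(pods, allocatable):
    """Map node -> services for nodes whose summed CPU limits exceed allocatable CPU."""
    limit_totals = {}
    services = {}
    for pod in pods:
        node = pod["node"]
        limit_totals[node] = limit_totals.get(node, Decimal(0)) + cores(pod["cpu_limit"])
        services.setdefault(node, set()).add(pod["service"])
    return {
        node: sorted(services[node])
        for node, total in limit_totals.items()
        if total > cores(allocatable[node])
    }


# Hypothetical placement: three pods on node-1 may burst to 4.5 cores on a 4-core node.
pods = [
    {"node": "node-1", "service": "discount", "cpu_limit": "2"},
    {"node": "node-1", "service": "cart", "cpu_limit": "1500m"},
    {"node": "node-1", "service": "ratings", "cpu_limit": "1"},
    {"node": "node-2", "service": "shipping", "cpu_limit": "1"},
]
print(overcommitted_nodes(pods, {"node-1": "4", "node-2": "4"}))
# {'node-1': ['cart', 'discount', 'ratings']}
```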
Screenshot from Instana showing all Kubernetes dependencies for a single business service “discount”
Screenshot from Instana showing all events at the Kubernetes cluster level
Just as with every other technology monitored by Instana, Kubernetes environment monitoring includes automatic and continuous discovery, dependency mapping, metrics monitoring, anomaly detection, and filter-based analytics across the whole system stack. See Instana Kubernetes monitoring in action in our interactive sandbox observability environment today.