Monitoring Kubernetes
Supported Versions
- 1.9.x to 1.18.x
Supported Managed Kubernetes
- Amazon Elastic Kubernetes Service (EKS)
- Azure Kubernetes Service (AKS)
- Google Kubernetes Engine (GKE)
- VMware Tanzu Kubernetes Grid (formerly known as Pivotal Container Service, or PKS): 1.5 and above
Supported Service Meshes
- Istio: 1.2.x to 1.7.x
Deprecations
- We have deprecated support for the extensions/v1beta1 and apps/v1beta2 API versions for DaemonSet, Deployment, and ReplicaSet in our Kubernetes Sensor, following the announcement from Kubernetes that these deprecated API versions will soon be removed in Kubernetes v1.16. Please follow the latest installation documentation for Kubernetes or OpenShift and update to our latest Helm chart, agent YAML or operator.
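To check whether your own manifests still reference these API versions, a search along the following lines may help. This is a minimal sketch: the function name find_deprecated_api_versions and the ./manifests path are hypothetical, and it only inspects local YAML files, not objects already stored in the cluster.

```shell
# Hypothetical helper: print every manifest line that still uses one of the
# API versions named above. $1 is a directory containing Kubernetes YAML files.
# Output format is grep's usual "<file>:<line number>:<matching line>".
find_deprecated_api_versions() {
  grep -rnE '^[[:space:]]*apiVersion:[[:space:]]*(extensions/v1beta1|apps/v1beta2)[[:space:]]*$' "$1"
}

# Example call (assumed path):
#   find_deprecated_api_versions ./manifests
```

Note that extensions/v1beta1 was also used by other kinds, such as Ingress, so check the kind of each match before changing it.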
Installing the Instana Agent in Kubernetes
The Agent Setup for Kubernetes describes how to install the Instana agent into your cluster.
The installation of Instana agents on VMware Tanzu Kubernetes Grid is fully automated by the Instana Microservices Application Monitoring for VMware Tanzu tile.
Accessing Kubernetes Information
Once the agent has been deployed to your cluster, the Kubernetes sensor will report detailed data about the cluster and the resources deployed into it.
Instana automatically discovers and monitors Kubernetes:
- Clusters
- Nodes
- Namespaces
- Deployments
- DaemonSets
- StatefulSets
- Services
- Pods
Kubernetes information is easily accessible and deeply integrated in all aspects of your application.
Kubernetes Menu Item
First and foremost, Kubernetes is a top-level item in the menu. This gives you direct access to your Kubernetes clusters and namespaces.
From Application Perspectives
Kubernetes information is also accessible from within all your application perspectives or services. If a service is running on a Kubernetes cluster, the respective context information is shown in the "Infrastructure" tab:
For containers, the pod and namespace are shown and directly linked. For hosts, the cluster and node are shown and directly linked.
From Infrastructure
In the Infrastructure map you will see Kubernetes information in the sidebar for either the host or the container you have selected.
You can use Dynamic Focus to filter the data. For example, you can search for a specific deployment in a cluster. Additionally, the keywords entity.kubernetes.cluster.distribution and entity.kubernetes.cluster.managedBy enable searching for a Kubernetes cluster by distribution and management layer. Supported values for entity.kubernetes.cluster.distribution are gke, eks, openshift, and kubernetes. Supported values for entity.kubernetes.cluster.managedBy are rancher and none.
Kubernetes Dashboards
Kubernetes dashboards present all information needed for a given Kubernetes entity. The context is always accessible via the context path at the top. In the following screenshot we are seeing a Namespace named "robot-shop" in a cluster called "will-k8s-cluster".
The different dashboards are always structured in the same way:
- Summary shows the most relevant information for a given entity. This starts with a status line which shows the current status and related information like age. In the next section, CPU, Memory, and Pod information are shown. This gives you information about the consumed resources, including the pods. Sections below, like "Top Deployments" and "Top Pods" in this screenshot, show potential hotspots which you might want to have a look at.
- Details shows detailed information like "labels", "annotation", and the "spec".
- Events shows all relevant Kubernetes events and links them to the respective dashboards.
- Related Entities like "Deployments", "K8s Services" and "Pods" are shown in the following tabs. What is shown depends on the entity you have selected.
CPU and Memory Usage
For Kubernetes pods, deployments, services, namespaces and nodes it is possible to see an aggregated view of the current CPU and Memory usage as it compares to the CPU and Memory limits and requests set for these resources.
If available, the usage information is calculated from data gathered from the container runtime that is executing the containers that make up the given resource.
Analyze Kubernetes Calls
Unbounded Analytics gives you powerful tools to slice and dice every single call in your Kubernetes cluster. If you click the "Analyze Calls" button on a Kubernetes dashboard, the appropriate filter and grouping are already set. In this case we are seeing all calls in the "robot-shop" namespace grouped by pods:
Linking Kubernetes Services and Logical Services
Single Kubernetes service to multiple logical services
Multiple logical services can be related to a single Kubernetes service when the service mapping rules match and calls are generated on that Kubernetes service. For example, a Kubernetes service with the label selector "service=my-service" may contain pods that have the additional labels "env=dev" and "env=staging". Combined with a custom service mapping configuration in Instana that uses the tags kubernetes.container.name and kubernetes.pod.label (key: env), this results in multiple logical services linked to that single Kubernetes service, all of which are displayed on the Kubernetes Service dashboard.
Single logical service to multiple Kubernetes services
Multiple Kubernetes services can be related to a single logical service when those Kubernetes services are destroyed and recreated over time. For example, if the Kubernetes service shop-service-a with generated calls is replaced over time with shop-service-b with generated calls, both services are displayed on the logical service dashboard when the selected time period overlaps the times at which the calls were generated.
Sensor Data Collection
Instana collects information about the Kubernetes Cluster, Nodes, Namespaces, Deployments, K8s Services, and Pods.
- Cluster
  - KPIs
    - Node Count
    - Pods Allocation (Allocated Pods / Pods Capacity ratio)
    - CPU Requests Allocation (CPU Requests / CPU Capacity ratio)
    - CPU Limits Allocation (CPU Limits / CPU Capacity ratio)
    - Memory Requests Allocation (Memory Requests / Memory Capacity ratio)
    - Memory Limits Allocation (Memory Limits / Memory Capacity ratio)
  - CPU Resources
    - CPU Requests (aggregated cpu requests of all running containers)
    - CPU Limits (aggregated cpu limits of all running containers)
    - CPU Capacity (aggregated cpu capacity of all nodes)
  - Memory Resources
    - Memory Requests (aggregated memory requests of all running containers)
    - Memory Limits (aggregated memory limits of all running containers)
    - Memory Capacity (aggregated memory capacity of all nodes)
  - Pods (aggregated on whole cluster)
    - Running Pods
    - Pending Pods
    - Allocated Pods
    - Pods Capacity
  - Replicas (aggregated from all deployments)
    - Available Replicas
    - Desired Replicas
  - Node list with KPIs
  - Deployment list with KPIs
  - Component Statuses
- Node
  - KPIs
    - Pods Allocation (Allocated Pods / Pods Capacity ratio)
    - CPU Requests Allocation (CPU Requests / CPU Capacity ratio)
    - CPU Limits Allocation (CPU Limits / CPU Capacity ratio)
    - Memory Requests Allocation (Memory Requests / Memory Capacity ratio)
    - Memory Limits Allocation (Memory Limits / Memory Capacity ratio)
  - CPU Resources
    - CPU Requests (aggregated cpu requests of all running containers on this node)
    - CPU Limits (aggregated cpu limits of all running containers on this node)
    - CPU Capacity
  - Memory Resources
    - Memory Requests (aggregated memory requests of all running containers on this node)
    - Memory Limits (aggregated memory limits of all running containers on this node)
    - Memory Capacity
  - Pods Allocation
    - Allocated Pods (running pods on this node)
    - Pods Capacity
  - Conditions
  - Labels
  - Pods list
- Namespace
  - KPIs
    - CPU Requests Allocation (CPU Requests / CPU Capacity ratio)
    - CPU Limits Allocation (CPU Limits / CPU Capacity ratio)
    - Memory Requests Allocation (Memory Requests / Memory Capacity ratio)
    - Memory Limits Allocation (Memory Limits / Memory Capacity ratio)
    - Pods Allocation (Allocated Pods / Pods Capacity ratio)
  - Status
  - Deployments list
  - Deployment configs list
- ResourceQuota
  - Hard & Used
    - CPU Requests
    - CPU Limits
    - Memory Requests
    - Memory Limits
    - Pods
- Deployment
  - Conditions
  - Labels
  - CPU Resources
    - CPU Requests (aggregated cpu requests of all running containers of this deployment)
    - CPU Limits (aggregated cpu limits of all running containers of this deployment)
  - Memory Resources
    - Memory Requests (aggregated memory requests of all running containers of this deployment)
    - Memory Limits (aggregated memory limits of all running containers of this deployment)
  - Pods
    - Available vs Desired Pods
    - Pending vs Unscheduled vs Unready Pods
  - Pending phase duration (in most cases can be interpreted as rollout duration)
- K8s Service
  - Type
  - Location
  - Cluster IP & External IP
  - CPU Requests, Limits
  - Memory Requests, Limits
  - Endpoints List
  - Ports List
- Pod
  - KPIs
    - Phase
    - Restarts (aggregated on all containers of this pod)
    - CPU Requests (aggregated on all containers of this pod)
    - CPU Limits (aggregated on all containers of this pod)
    - Memory Requests (aggregated on all containers of this pod)
    - Memory Limits (aggregated on all containers of this pod)
  - Conditions
  - Labels
  - Container list (State, Restarts)
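The allocation KPIs listed above are plain ratios of an aggregated request or limit to a capacity. The following sketch shows that arithmetic for CPU values. It is illustrative only and does not describe how the Instana sensor is implemented. The function names are hypothetical, and the conversion handles only CPU quantities written as cores ("2", "0.5") or millicores ("500m").

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Supports plain or decimal cores ("2", "0.5") and the "m" suffix ("500m").
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000 }' ;;
  esac
}

# Hypothetical helper: print (requests or limits) / capacity as a percentage
# with one decimal place. Both arguments must use the same unit.
allocation_percent() {
  awk -v r="$1" -v c="$2" 'BEGIN { if (c <= 0) exit 1; printf "%.1f\n", r * 100 / c }'
}

# Example: 3200m of CPU requests on a node with 4 cores of capacity.
#   allocation_percent 3200 "$(cpu_to_millicores 4)"
```

The same ratio applies to memory and pods, provided that both values are first expressed in the same unit.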
Health Rules
Built-in
The following built-in health rules trigger an issue for Kubernetes entities:
- Cluster
  - Kubernetes reports that a master component (api-server, scheduler, controller manager) is unhealthy. Note that due to a bug in Kubernetes, the health is not always reported reliably. We try to filter out these unreliable reports so that they do not cause an alert and only show up on the Cluster detail page.
- Node
  - Requested CPU is approaching max capacity (requested CPU / CPU capacity ratio is greater than 80%).
  - Requested Memory is approaching max capacity (requested memory / memory capacity ratio is greater than 80%).
  - Allocated pods are approaching maximum capacity (allocated pods / pods capacity ratio is greater than 80%). For a node, pods in the phases 'Running' and 'Unknown' are counted as allocated. See Kubernetes docs for details on node capacity.
  - Node reports a condition which is not ready for more than one minute. For a node, this applies to all conditions besides the Ready condition. See Kubernetes docs for details on all node conditions.
- Namespace
  - Requested CPU is approaching max capacity (requested CPU / CPU capacity ratio is greater than 80%).
  - Requested Memory is approaching max capacity (requested memory / memory capacity ratio is greater than 80%).
  - Allocated pods are approaching maximum capacity (allocated pods / pods capacity ratio is greater than 80%). For a namespace, pods in the phases 'Pending', 'Running', and 'Unknown' are counted as allocated. The namespace capacity values are based on ResourceQuotas, which can be set per Namespace. See Kubernetes docs for details.
- Deployment
  - Available replicas less than desired replicas.
- Pod
  - A pod is not ready for more than one minute, and the reason is not that it's completed. (PodCondition=Ready, Status=False, Reason != PodCompleted). See Kubernetes docs for details on all pod conditions.
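To see which pods currently match the condition part of the Pod rule above (Ready is False and the reason is not PodCompleted), you could query the cluster directly. This is a rough sketch with hypothetical function names. It does not check the one-minute duration, and it is not the logic that Instana itself runs.

```shell
# Hypothetical helper: print one line per pod in the form
# "<namespace>/<name><TAB><Ready status><TAB><Ready reason>".
list_pod_readiness() {
  kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.conditions[?(@.type=="Ready")].reason}{"\n"}{end}'
}

# Hypothetical helper: read the lines produced above on stdin and keep only
# pods whose Ready condition is False for a reason other than PodCompleted.
filter_unready_pods() {
  awk -F '\t' '$2 == "False" && $3 != "PodCompleted" { print $1 }'
}

# Example:
#   list_pod_readiness | filter_unready_pods
```

Splitting the query from the filter keeps the filter usable on saved output, for example when comparing against an issue that Instana raised earlier.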
Custom
In addition to the built-in rules, you can also create custom rules on metrics of a cluster, namespace, deployment, and pod. For example, if the threshold for node capacity warnings is too high, you can disable them and create a custom rule with a lower threshold. See Events & Incidents configuration for details.
Service Meshes
Istio
The default installation should work out of the box with Instana. However, if you deploy Istio with a default deny policy (mode: REGISTRY_ONLY), you need to enable Instana's service mesh by-pass to work effectively with this configuration. It can be enabled with the following agent configuration:
com.instana.container:
  serviceMesh:
    enableServiceMeshBypass: true
Debugging the Mesh By-pass
There are two steps that can be taken to debug the service mesh by-pass:
- Verify that it is enabled.
- Verify that the iptables rules are applied to the container.
Verify Enabled
To verify the service mesh by-pass is enabled you can check in the Instana Agent logs with the following command:
kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent -c instana-agent
If it is enabled, you should find a log line similar to the following, which indicates how many containers have been detected by the agent:
2020-03-06T21:32:08.269+00:00 | INFO | nstana-sensor-scheduler-thread-3 | rviceMeshSupport | com.instana.agent - 1.1.541 | Applying service mesh by-pass to 9 containers
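If the log is long, the relevant line can be extracted with a small filter. The sketch below assumes the log message has the same wording as the sample line above. The function name bypass_container_count is hypothetical.

```shell
# Hypothetical helper: read agent log lines on stdin and print the container
# count from the most recent "Applying service mesh by-pass" message.
# Prints nothing if no such message is present.
bypass_container_count() {
  sed -n 's/.*Applying service mesh by-pass to \([0-9][0-9]*\) containers.*/\1/p' | tail -n 1
}

# Example, reusing the kubectl command shown above:
#   kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent \
#     -c instana-agent --tail=-1 | bypass_container_count
```

The --tail=-1 flag is added because kubectl logs returns only the last 10 lines per pod by default when a label selector is used, so an older by-pass message could otherwise be missed.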
Verify iptables Rules
The easiest way to verify the iptables rules is to shell into the Instana agent and list the target container's iptables rules as follows:
kubectl -n instana-agent exec -it ${INSTANA_AGENT_POD} -c instana-agent -- /bin/bash
nsenter -n -t ${PID} iptables -t nat -L INSTANA_OUTPUT
If the chains have been applied, the command should produce output similar to the following:
Chain INSTANA_OUTPUT (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             10.36.3.1
ACCEPT     tcp  --  anywhere             169.254.123.1
ACCEPT     tcp  --  anywhere             <your_cluster_dns_name>
To support bi-directional communication between the Instana agent and your JVM processes, also check the following:
nsenter -n -t ${PID} iptables -t nat -L INSTANA_INBOUND
with a result similar to this:
Chain INSTANA_INBOUND (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  10.36.3.1            10.36.3.28
Depending on when the rules were applied it can take a few minutes for the process to be instrumented and data to be visible in Instana's dashboards.
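The commands above expect ${INSTANA_AGENT_POD} to name the agent pod that runs on the same node as the target container. One possible way to look it up is sketched below. It assumes the agent runs as one pod per node (as a DaemonSet would) in the instana-agent namespace, with the label used in the log command earlier. The function name is hypothetical, and the sketch does not cover how to determine ${PID}.

```shell
# Hypothetical helper: print the name of the Instana agent pod scheduled on
# the node given as $1. Fails if no agent pod is found on that node.
agent_pod_on_node() {
  kubectl get pods -n instana-agent \
    -l app.kubernetes.io/instance=instana-agent \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Example (node name is a placeholder):
#   INSTANA_AGENT_POD=$(agent_pod_on_node <node-name>)
```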
Notes
Using a GKE provided containerd node image
Instana does not currently support monitoring GKE-provided containerd-based node images (cos_containerd or ubuntu_containerd).
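To find out whether any of your nodes use a containerd runtime, you could inspect the runtime version that each node reports. This is a minimal sketch with hypothetical function names. It reads the container runtime from the node status and does not identify the GKE image type itself.

```shell
# Hypothetical helper: print "<node name><TAB><container runtime version>"
# for every node, for example "node-1<TAB>containerd://1.3.2".
list_node_runtimes() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}

# Hypothetical helper: read the lines produced above on stdin and keep only
# the names of nodes whose runtime version starts with "containerd://".
filter_containerd_nodes() {
  awk -F '\t' '$2 ~ /^containerd:\/\// { print $1 }'
}

# Example:
#   list_node_runtimes | filter_containerd_nodes
```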
Troubleshooting
Why am I not seeing any Kubernetes clusters or namespaces?
If there are no clusters or namespaces listed on the Kubernetes page, either no cluster is actively being monitored due to an agent not being installed, or no cluster was monitored during your selected timeframe.
Click Live to check for any clusters and namespaces in live mode, and if none are listed, see our Install Kubernetes section.
Missing ClusterRole permissions
Monitoring issue type: kubernetes_missing_permissions
The Instana Agent requires the appropriate ClusterRole permissions for specific resources to be able to monitor a Kubernetes cluster successfully. If these permissions are missing, there will be corresponding resources missing on the Instana Kubernetes dashboards. To resolve this issue, please install the latest version of the Instana Agent YAML, Helm chart or Operator. See our Kubernetes or OpenShift documentation for more information on the latest version of each installation method.
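Before reinstalling, you can get a rough picture of which permissions are missing by asking the API server on behalf of the agent's service account. The sketch below rests on several assumptions: the service account is named instana-agent in the instana-agent namespace, the resource list is taken from the entities that Instana discovers (see above) rather than from the actual ClusterRole, and only the list verb is checked. The function name is hypothetical.

```shell
# Hypothetical helper: report resources the agent's service account cannot
# list. $1 = namespace, $2 = service account name (both default to the
# assumed value "instana-agent"). Returns 1 if anything is missing.
check_agent_permissions() {
  sa="system:serviceaccount:${1:-instana-agent}:${2:-instana-agent}"
  missing=0
  for resource in nodes namespaces deployments daemonsets statefulsets services pods; do
    if ! kubectl auth can-i list "$resource" --as="$sa" --all-namespaces >/dev/null 2>&1; then
      echo "missing: list $resource"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_agent_permissions instana-agent instana-agent
```

Running this requires that your own user is allowed to impersonate service accounts. A clean result does not guarantee that every permission the agent needs is present.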