Monitoring CRI-O

Introduction

Instana automatically discovers and monitors CRI-O containers to give you real-time insight into metadata (labels), metrics, and any supported technologies running within each discovered container.

Besides monitoring the health of each container and receiving alerts about any issues, you can enable service discovery to make use of all of your container information.

Metrics collection

To view an overview of CPU and memory usage of your containers, activate the metric overview option on the infrastructure map. You can also use the Dynamic Focus feature to identify and isolate parts of your infrastructure within the context of your containers.

Instana monitors the infra container that runs in each pod. The infra container (also called the 'pause' container) holds the network namespace for the pod. Kubernetes creates pause containers to acquire the pod's IP address and to set up the network namespace for all other containers that join that pod.

By default, CRI-O metrics are collected every 10 seconds. This interval can be configured within the agent configuration file <agent_install_dir>/etc/instana/configuration.yml:

com.instana.plugin.crio:
  stats:
    interval: 10

On the CRI-O container dashboard, the configuration and performance metrics for the container are displayed.

Configuration data

Configuration   Description
Id              The container ID.
Name            The container name.
Image           The CRI-O image name.
IP              The container IP.
Created         The container created timestamp.

Performance metrics

The performance metrics are collected using the runc command.

CPU Total %

The total CPU usage as a percentage. The current measured KPI value is displayed.

Data point: value is collected from the total key returned in the cpu.usage object.
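
As a rough illustration of how a percentage could be derived from the total key, the following Python sketch compares two samples taken one collection interval apart. It assumes that cpu.usage.total is a cumulative counter in nanoseconds, as in the JSON that runc events --stats emits. The function name cpu_percent and the sample values are hypothetical, and the sensor might compute the KPI differently.

```python
def cpu_percent(prev_total_ns: int, curr_total_ns: int, interval_s: float) -> float:
    """Return CPU usage over the interval as a percentage of one core.

    Assumption: both totals are cumulative CPU time in nanoseconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Guard against counter resets (for example, after a container restart).
    delta_ns = max(curr_total_ns - prev_total_ns, 0)
    return delta_ns / (interval_s * 1e9) * 100.0


# Two hypothetical samples taken 10 seconds apart (the default interval).
prev = {"cpu": {"usage": {"total": 5_000_000_000}}}
curr = {"cpu": {"usage": {"total": 7_500_000_000}}}

usage = cpu_percent(
    prev["cpu"]["usage"]["total"],
    curr["cpu"]["usage"]["total"],
    interval_s=10,
)
print(f"{usage:.1f}%")
```

With this definition, a value above 100 means that the container used more than one core's worth of CPU time during the interval.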

Memory usage

The total memory usage. The current measured KPI value is displayed.

Data point: value is collected from the usage key returned in the memory.raw object.

CPU

The total, kernel, and user metrics are displayed on a graph over a selected time period.

Data points: values are collected from the total, kernel, and user keys returned in the cpu.usage object.

Throttling count and time values are displayed on a graph over a selected time period.

Data points: values are collected from the throttling.throttledPeriods and throttling.throttledTime keys returned in the cpu_stats object.

Memory

The usage, RSS, and cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the usage, max, and limit keys returned in the memory.usage object.

Active anonymous, active cache, inactive anonymous, and inactive cache metrics are displayed on a graph over a selected time period.

Data points: the values are collected from the active_anon, active_file, inactive_anon, and inactive_file keys returned in the memory.raw object.

Block IO

The read and write values are displayed on a graph over a selected time period.

Data points: values are collected from the ioServiceBytesRecursive.op.Read and ioServiceBytesRecursive.op.Write fields.
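
To make the block IO data points more concrete, the following Python sketch sums the read and write byte counters from a stats payload. It assumes the layout that runc events --stats emits, where ioServiceBytesRecursive is a list of per-device entries with op and value keys under a blkio object. The blkio parent key, the function name blkio_bytes, and the sample values are assumptions made for illustration, not details confirmed by the sensor.

```python
def blkio_bytes(stats: dict, op: str) -> int:
    """Sum the byte counters of all devices for one operation.

    Assumption: each entry looks like
    {"major": 8, "minor": 0, "op": "Read", "value": 4096}.
    """
    entries = (stats.get("blkio") or {}).get("ioServiceBytesRecursive") or []
    wanted = op.lower()
    # Compare case-insensitively in case the op names differ in casing.
    return sum(
        int(entry.get("value", 0))
        for entry in entries
        if str(entry.get("op", "")).lower() == wanted
    )


# Hypothetical sample with two block devices.
sample = {
    "blkio": {
        "ioServiceBytesRecursive": [
            {"major": 8, "minor": 0, "op": "Read", "value": 4096},
            {"major": 8, "minor": 0, "op": "Write", "value": 8192},
            {"major": 8, "minor": 16, "op": "Read", "value": 1024},
        ]
    }
}

read_bytes = blkio_bytes(sample, "Read")
write_bytes = blkio_bytes(sample, "Write")
print(read_bytes, write_bytes)
```

The counters in such a payload are cumulative, so a graph over time would typically plot the difference between consecutive samples.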

Health signatures

For each sensor, there is a curated knowledge base of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.

For information about the built-in event for the CRI-O sensor, see the Built-in events reference.