Monitoring Ceph

Supported versions

Currently supported versions are Luminous (12) and Kraken (11).

Configuration

To enable in-depth metric monitoring, the Agent requires the path to the Ceph executable. Configure it in <agent_install_dir>/etc/instana/configuration.yaml:

com.instana.plugin.ceph:
  ceph-executable-path: '' # default path is /usr/bin/ceph
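Before setting a custom path, it can help to confirm that the value points to an executable file on the host. The following is a minimal sketch, not part of the Agent: the helper names are hypothetical, and it assumes that an empty setting falls back to the default path given in the configuration comment above.

```python
import os

# Default taken from the comment in the configuration snippet above.
DEFAULT_CEPH_PATH = "/usr/bin/ceph"


def resolve_ceph_executable(configured_path: str) -> str:
    """Return the configured path, or the default when the setting is empty.

    Assumption: an empty value means "use the default".
    """
    return configured_path.strip() or DEFAULT_CEPH_PATH


def is_usable_executable(path: str) -> bool:
    """True if the path is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    path = resolve_ceph_executable("")
    state = "usable" if is_usable_executable(path) else "missing or not executable"
    print(f"{path}: {state}")
```

Run the check as the same user that runs the Agent, because execute permission is evaluated per user.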

Metrics collection

Configuration data

  • Fsid
  • Cluster name
  • Version
  • Overall cluster status
  • Pools

Performance metrics

Cluster

Metric              Description
Commit latency      Time taken to commit an operation to the journal (in milliseconds)
Apply latency       Time taken to flush an update to disk (in milliseconds)
All OSDs            Number of known storage daemons
Up OSDs             Number of storage daemons that are up (running)
In OSDs             Number of storage daemons that are in the cluster (participating in data placement)
Near full OSDs      Number of nearly full OSDs
Full OSDs           Number of full OSDs
All monitors        Number of monitor daemons
Healthy monitors    Number of healthy monitor daemons
Read bps            Bytes per second being read
Write bps           Bytes per second being written
Read ops            Read operations per second
Write ops           Write operations per second
Capacity usage      Overall cluster capacity usage
All pools           Number of pools
All objects         Number of objects
All pgs             Number of all placement groups
Active+Clean pgs    Number of active+clean placement groups
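To illustrate how some of the cluster metrics relate to Ceph's own output, the sketch below reads OSD counters from a status document such as the one printed by `ceph status --format json`, and computes a usage percentage. It is an illustration only, not the sensor's implementation. The field names (`num_osds`, `num_up_osds`, `num_in_osds`) and the nested `osdmap` layout are assumptions about the JSON shape and can differ between Ceph releases. The function names are hypothetical.

```python
import json


def summarize_osds(status: dict) -> dict:
    """Extract OSD counters from a parsed cluster status document.

    Assumption: counters live under "osdmap", possibly nested one level
    deeper under a second "osdmap" key, depending on the release.
    """
    osdmap = status.get("osdmap", {})
    osdmap = osdmap.get("osdmap", osdmap)  # unwrap the nested form if present
    return {
        "all_osds": osdmap.get("num_osds", 0),
        "up_osds": osdmap.get("num_up_osds", 0),  # daemons that are running
        "in_osds": osdmap.get("num_in_osds", 0),  # daemons holding data in the cluster
    }


def capacity_usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Used capacity as a percentage of total capacity (0.0 if total is unknown)."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


# Hand-written sample input, not captured from a real cluster.
sample = json.loads(
    '{"osdmap": {"osdmap": {"num_osds": 6, "num_up_osds": 5, "num_in_osds": 6}}}'
)
print(summarize_osds(sample))             # {'all_osds': 6, 'up_osds': 5, 'in_osds': 6}
print(capacity_usage_percent(250, 1000))  # 25.0
```

In the sample, one OSD is in but not up. That combination usually means a daemon has stopped while the cluster still expects it to hold data.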

Pool

Metric            Description
Capacity usage    Capacity usage of the given pool
All objects       Number of objects in the given pool
Read bytes        Bytes read from the given pool
Write bytes       Bytes written to the given pool
Read bps          Bytes per second being read from the given pool
Write bps         Bytes per second being written to the given pool
Read ops          Read operations per second for the given pool
Write ops         Write operations per second for the given pool
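The byte counters and the per-second rates are related: a rate can be derived from two successive samples of a counter. The sketch below shows that arithmetic under the assumption that the byte values grow monotonically. The source does not state that the sensor computes its rates this way, and the function name is hypothetical.

```python
from typing import Optional


def bytes_per_second(
    prev_bytes: int, prev_time: float, curr_bytes: int, curr_time: float
) -> Optional[float]:
    """Average byte rate between two samples of a growing counter.

    Returns None when no time has elapsed or when the counter went
    backwards (for example after a reset), since no rate can be
    derived from such a pair of samples.
    """
    elapsed = curr_time - prev_time
    delta = curr_bytes - prev_bytes
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


# 4000 bytes read over a 10-second interval.
print(bytes_per_second(1000, 10.0, 5000, 20.0))  # 400.0
```

Returning None instead of a negative or infinite value keeps a counter reset from showing up as a spike in a chart.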

Health Signatures

For each sensor, a curated knowledge base of health signatures is evaluated continuously against the incoming metrics and is used to raise issues or incidents, depending on user impact.

Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
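As a rough illustration of the custom-event idea, the sketch below evaluates a single threshold against one metric of one entity. It is a conceptual model only, not Instana's event engine or API. The `ThresholdRule` class, the `evaluate` function, and the metric key `near_full_osds` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    metric: str       # hypothetical metric key, e.g. "near_full_osds"
    threshold: float  # the rule fires when the value is strictly greater
    severity: str     # "issue" or "incident"


def evaluate(rule: ThresholdRule, metrics: dict) -> Optional[str]:
    """Return an event message if the rule's threshold is exceeded, else None."""
    value = metrics.get(rule.metric)
    if value is not None and value > rule.threshold:
        return f"{rule.severity}: {rule.metric}={value} exceeds {rule.threshold}"
    return None


rule = ThresholdRule(metric="near_full_osds", threshold=0, severity="issue")
print(evaluate(rule, {"near_full_osds": 2}))  # issue: near_full_osds=2 exceeds 0
print(evaluate(rule, {"near_full_osds": 0}))  # None
```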

For information about built-in events for the Ceph sensor, see the Built-in events reference.