Monitoring Elasticsearch
Configuration
Instana automatically monitors up to 1000 indices and collects the 5 most important metrics per index.
To enable in-depth index monitoring (about 20 metrics per index) for up to 200 indices, specify indicesRegex in the agent configuration file <agent_install_dir>/etc/instana/configuration.yaml:
com.instana.plugin.elasticsearch:
  enabled: true # true
  indicesRegex: '.*'
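To get a feel for which indices a given indicesRegex value would select, the following minimal sketch filters a list of index names against a pattern and applies the 200-index limit mentioned above. It is an illustration only: the function name, the sample index names, the whole-name matching, and the "first matches win" behavior at the limit are all assumptions, and the agent evaluates the pattern itself, so Python's regex dialect may differ from it in details.

```python
import re

# Limit taken from the text above: in-depth monitoring covers up to 200 indices.
IN_DEPTH_LIMIT = 200


def select_in_depth_indices(index_names, indices_regex, limit=IN_DEPTH_LIMIT):
    """Return the index names matching the pattern, capped at `limit`.

    Assumptions (not confirmed by the source): the whole index name must
    match, and the first `limit` matches in input order are kept.
    """
    pattern = re.compile(indices_regex)
    matches = [name for name in index_names if pattern.fullmatch(name)]
    return matches[:limit]


# Hypothetical index names, for illustration only.
names = ["logs-2024.01", "logs-2024.02", "metrics-cpu", ".kibana"]
selected = select_in_depth_indices(names, r"logs-.*")
print(selected)
```

A narrower pattern such as 'logs-.*' keeps in-depth monitoring focused on the indices that matter, whereas '.*' matches every index name.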
Metrics collection
Node-Level
Configuration data
- Version
- Cluster
- Health Status
- Node Name
- Node Type
- Node is Master
- Node is Master Eligible
- Transport
- Log Directory
- Shards
- Indices
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | Query latency is collected from NodeIndicesStats#SearchStats. | 1 second |
Number of Queries | Query count per second is collected from NodeIndicesStats#SearchStats. | 1 second |
Overall Documents | The total number of documents is collected from DocsStats#count. | 1 second |
Added Documents | The total number of indexing operations is collected from IndexingStats#indexCount. | 1 second |
Removed Documents | The number of delete operations executed is collected from IndexingStats#deleteCount. | 1 second |
Active Shards | The number of active shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Active Primary Shards | The number of active primary shards is collected from IndexRoutingTable#ShardRouting. | 1 second |
Refresh Count | The number of refreshes executed per second is collected from NodeIndicesStats#RefreshStats. | 1 second |
Refresh Time | The total time spent executing refreshes is collected from NodeIndicesStats#RefreshStats. | 1 second |
Flush Count | The number of flushes executed per second is collected from NodeIndicesStats#FlushStats. | 1 second |
Flush Time | The total time spent executing flushes is collected from NodeIndicesStats#FlushStats. | 1 second |
Indices metrics | Document count, deleted count, and size per index are collected from IndexStats#DocsStats. | 1 second |
Lucene Segments | The number of segments is collected from NodeIndicesStats#SegmentsStats#count. | 1 second |
Active Threads | Active threads for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#active. | 1 second |
Queued Threads | Queued threads for the Search, Index, Bulk, Merge, Flush, Get, Management, and Refresh thread pools are collected from ThreadPoolStats.Stats#queue. | 1 second |
Rejected Threads | Rejected threads for the Search, Index, Bulk, and Get thread pools are collected from ThreadPoolStats.Stats#rejected. | 1 second |
Sent Data | The size of TX packets sent by the node during internal cluster communication is collected from TransportStats#tx_size. | 1 second |
Received Data | The size of RX packets received by the node during internal cluster communication is collected from TransportStats#rx_size. | 1 second |
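The table above names internal Elasticsearch classes. To cross-check a few of these values outside Instana, similar counters are exposed by the Elasticsearch node stats REST API (GET /_nodes/stats). The sketch below reads some of them from an already-parsed response. The payload is a hand-written, trimmed stand-in with made-up values, the function name is hypothetical, and the mapping between the table's data points and the JSON fields is an assumption rather than a description of how the sensor works.

```python
# Hand-written, trimmed stand-in for a GET /_nodes/stats response.
# All values are made up for illustration.
sample_response = {
    "nodes": {
        "node-1": {
            "name": "es-node-1",
            "indices": {
                "docs": {"count": 1200, "deleted": 30},
                "indexing": {"index_total": 1500, "delete_total": 45},
                "search": {"query_total": 800, "query_time_in_millis": 4000},
                "refresh": {"total": 60, "total_time_in_millis": 900},
                "flush": {"total": 12, "total_time_in_millis": 300},
                "segments": {"count": 25},
            },
            "thread_pool": {"search": {"active": 2, "queue": 0, "rejected": 1}},
            "transport": {
                "tx_size_in_bytes": 52428800,
                "rx_size_in_bytes": 41943040,
            },
        }
    }
}


def summarize_node(node_stats):
    """Pick a few counters from one node's stats (assumed field mapping)."""
    indices = node_stats["indices"]
    search_pool = node_stats["thread_pool"]["search"]
    return {
        "overall_documents": indices["docs"]["count"],
        "added_documents": indices["indexing"]["index_total"],
        "removed_documents": indices["indexing"]["delete_total"],
        "lucene_segments": indices["segments"]["count"],
        "search_rejected": search_pool["rejected"],
        "sent_bytes": node_stats["transport"]["tx_size_in_bytes"],
        "received_bytes": node_stats["transport"]["rx_size_in_bytes"],
    }


summaries = {
    node_id: summarize_node(stats)
    for node_id, stats in sample_response["nodes"].items()
}
print(summaries["node-1"])
```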
Index metrics
Data point | Description | Granularity |
---|---|---|
Total Queries | The total number of query operations is collected from SearchStats.Stats#queryTotal. | 1 second |
Queries Current | The number of query operations currently running is collected from SearchStats.Stats#queryCurrent. | 1 second |
Fetches Total | The total number of fetch operations is collected from SearchStats.Stats#fetchCount. | 1 second |
Fetches Current | The number of fetch operations currently running is collected from SearchStats.Stats#fetchCurrent. | 1 second |
Query Time | The time in milliseconds spent performing query operations is collected from SearchStats.Stats#queryTimeInMillis. | 1 second |
Fetch Time | The time in milliseconds spent performing fetch operations is collected from SearchStats.Stats#fetchTimeInMillis. | 1 second |
Query Cache Memory | The total amount of memory used for the query cache is collected from QueryCacheStats#ramBytesUsed. | 1 second |
Query Cache Evictions | The number of query cache evictions is collected from QueryCacheStats#evictions. | 1 second |
Request Cache Memory | The total amount of memory used for the request cache is collected from RequestCacheStats#ramBytesUsed. | 1 second |
Request Cache Evictions | The number of request cache evictions is collected from RequestCacheStats#evictions. | 1 second |
Get Requests | The total number of Get requests is collected from GetStats#count. | 1 second |
Get Requests Time | The time in milliseconds spent on Get requests is collected from GetStats#timeInMillis. | 1 second |
Get Requests Failed | The number of failed Get requests is collected from GetStats#missingCount. | 1 second |
Get Requests Failed Time | The time in milliseconds spent on failed Get requests is collected from GetStats#missingTimeInMillis. | 1 second |
Indexing Operations Failed | The number of failed indexing operations is collected from IndexingStats#indexFailedCount. | 1 second |
Active Merges Count | The number of merges currently executing is collected from MergeStats#current. | 1 second |
Total Merges Size | The total size of merges executed is collected from MergeStats#totalSizeInBytes. | 1 second |
Total Merges Time | The total time spent executing merges is collected from MergeStats#totalTimeInMillis. | 1 second |
The index metrics listed above are collected only for indices whose names match the regular expression set in indicesRegex in the agent configuration.
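Several index metrics, such as Total Queries and Query Time, are running totals, so they are usually easier to interpret as a rate and an average over an interval. The following minimal sketch derives both from two consecutive samples. The function name, the dictionary keys, and the reset handling are illustrative assumptions and do not describe how Instana processes these values.

```python
def query_rate_and_latency(prev, curr, interval_seconds):
    """Derive queries/second and average query latency between two samples.

    `prev` and `curr` are dicts holding cumulative 'query_total' and
    'query_time_ms' values (hypothetical key names).
    Returns (rate, avg_latency_ms), or None if the sample is unusable.
    """
    delta_queries = curr["query_total"] - prev["query_total"]
    delta_time_ms = curr["query_time_ms"] - prev["query_time_ms"]
    if interval_seconds <= 0 or delta_queries < 0 or delta_time_ms < 0:
        # Bad interval or counter reset (for example after a node restart).
        return None
    rate = delta_queries / interval_seconds
    # Avoid dividing by zero when no query ran during the interval.
    avg_latency_ms = delta_time_ms / delta_queries if delta_queries else 0.0
    return rate, avg_latency_ms


# Made-up samples taken one second apart.
previous = {"query_total": 800, "query_time_ms": 4000}
current = {"query_total": 850, "query_time_ms": 4300}
result = query_rate_and_latency(previous, current, interval_seconds=1)
print(result)
```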
Health Signatures
For each sensor, there is a curated knowledgebase of health signatures that are evaluated continuously against the incoming metrics and are used to raise issues or incidents depending on user impact.
Built-in events trigger issues or incidents based on failing health signatures on entities, and custom events trigger issues or incidents based on the thresholds of an individual metric of any given entity.
For information about built-in events for the Elasticsearch Node, see the Built-in events reference.
Cluster-Level
Configuration data
- Name
- Health Status
- Nodes, Masters
Performance metrics
Data point | Description | Granularity |
---|---|---|
Query Latency | Query latency is calculated as the maximum query latency across all nodes. | 1 second |
Number of Queries | Query count is calculated as the sum of the query counts of all nodes. | 1 second |
Overall Documents | Total documents is calculated as the sum of the overall documents of all nodes. | 1 second |
Added Documents | Added documents is calculated as the sum of the added documents of all nodes. | 1 second |
Removed Documents | Removed documents is calculated as the sum of the removed documents of all nodes. | 1 second |
Indices | The number of indices. | 1 second |
Shards | Active, Active Primary, Initializing, Relocating, and Unassigned shard counts are collected from ClusterHealth. | 1 second |
Cluster State size | The size of the ClusterState. | 1 second |
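The aggregation rules in the table above (maximum for latency, sum for the counters) can be sketched as follows. The function name, the dictionary keys, and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
SUMMED_KEYS = (
    "number_of_queries",
    "overall_documents",
    "added_documents",
    "removed_documents",
)


def aggregate_cluster(node_metrics):
    """Combine per-node metrics into cluster-level values.

    Query latency is the maximum over all nodes; the other data points
    are sums over all nodes, as described in the table above.
    """
    cluster = {
        "query_latency": max(
            (node["query_latency"] for node in node_metrics), default=0
        )
    }
    for key in SUMMED_KEYS:
        cluster[key] = sum(node[key] for node in node_metrics)
    return cluster


# Made-up per-node values for a two-node cluster.
nodes = [
    {"query_latency": 12, "number_of_queries": 40, "overall_documents": 1000,
     "added_documents": 5, "removed_documents": 1},
    {"query_latency": 30, "number_of_queries": 25, "overall_documents": 700,
     "added_documents": 3, "removed_documents": 0},
]
cluster_metrics = aggregate_cluster(nodes)
print(cluster_metrics)
```

Taking the maximum for latency means a single slow node is visible at the cluster level instead of being averaged away.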
Health Signatures
For information about built-in events for the Elasticsearch Cluster, see the Built-in events reference.