ClickHouse Monitoring for Health and Performance with Instana

February 6, 2020

Built-in ClickHouse Monitoring

ClickHouse has powerful built-in monitoring capabilities, accessible through its system tables (system.trace_log, system.metrics, system.query_log, etc.). They are great for diving deep into issues, but when ClickHouse is just one of many things you have to watch, a fully managed and automated monitoring solution such as Instana is a better fit. Instana’s support for ClickHouse comes from first-hand use: Instana’s developers and operations teams use ClickHouse to power Instana, and Instana to monitor ClickHouse.
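As a rough illustration of reading those system tables directly, the sketch below sends a query to the ClickHouse HTTP interface and parses the result. The server address (localhost on port 8123), the choice of metrics, and the helper names `run_query` and `parse_json_each_row` are assumptions made for this example; none of them come from Instana.

```python
import json
import urllib.request

# Assumption: a ClickHouse server exposes its HTTP interface at this address.
CLICKHOUSE_URL = "http://localhost:8123/"

# Current values of a few server-wide gauges from system.metrics.
METRICS_SQL = (
    "SELECT metric, value FROM system.metrics "
    "WHERE metric IN ('Query', 'Merge', 'TCPConnection') "
    "FORMAT JSONEachRow"
)


def parse_json_each_row(body: str) -> list[dict]:
    """Parse a JSONEachRow response: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def run_query(sql: str, url: str = CLICKHOUSE_URL) -> list[dict]:
    """POST a query to the ClickHouse HTTP interface and return the parsed rows."""
    request = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_json_each_row(response.read().decode("utf-8"))
```

Calling `run_query(METRICS_SQL)` against a reachable server returns one dictionary per metric row. Polling like this works for a single server, but it has to be scripted, scheduled, and stored by hand.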

Let’s see how Instana exposes ClickHouse’s built-in monitoring data visually and over time, puts it into context, provides additional insights across your stack, and does it all with very little effort.

ClickHouse Infrastructure monitoring

To get Instana running, all you need to do is install a single agent per operating system instance. Once the Instana agent is installed on each host where a ClickHouse server is running, Instana automatically discovers the hosts and the ClickHouse servers on them, and provides real-time information about the health and performance of both (CPU, memory, and IO for the hosts). That makes it a great tool for operations teams: it helps them keep clusters healthy, upgrade to newer versions smoothly, and get capacity planning right.

Each ClickHouse server gets a dedicated dashboard where you have access to most of the ClickHouse metrics and other information like the number of active parts per table or the running queries. Metrics can also easily be compared across multiple servers.

ClickHouse Monitoring KPI Metric Comparison

Application monitoring based on distributed tracing

Relying on logs or infrastructure metrics alone does not give you the full picture, because ClickHouse servers do not live on their own. Queries to your ClickHouse cluster may come from multiple services, sometimes managed by different teams. Typical problems such as failed queries, slow queries, N+1 queries, and overly frequent inserts can only be fixed once the service causing the issue is identified. This is why Instana captures the SQL statement, latency, potential error, receiving host, and caller information of every single query to ClickHouse.
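To make the N+1 case concrete, here is a minimal sketch of how captured query records could be scanned for it. The `QueryCall` record, its field names, and the threshold of ten repetitions are all hypothetical; this is a simple heuristic for illustration and not a description of how Instana detects the pattern.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCall:
    trace_id: str   # hypothetical: identifies one end-to-end request
    caller: str     # hypothetical: name of the calling service
    statement: str  # SQL text as captured


def normalize(statement: str) -> str:
    """Replace literals with '?' so queries differing only in values match."""
    text = re.sub(r"'[^']*'", "?", statement)  # string literals
    text = re.sub(r"\b\d+\b", "?", text)       # standalone numeric literals
    return re.sub(r"\s+", " ", text).strip().lower()


def find_n_plus_one(calls, threshold=10):
    """Return (caller, normalized statement, count) for every statement shape
    that one caller repeats at least `threshold` times within a single trace."""
    counts = Counter(
        (c.trace_id, c.caller, normalize(c.statement)) for c in calls
    )
    return [
        (caller, statement, n)
        for (_trace_id, caller, statement), n in counts.items()
        if n >= threshold
    ]
```

The caller name in each result is what makes the finding actionable: it points at the service whose loop should be replaced by a single batched query.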

ClickHouse Monitoring Distributed Tracing

Instana provides this level of insight because it comes with automatic tracing (no code changes required) across distributed systems composed of the services you built (e.g. Java, Python, Node.js, etc.), databases, and messaging systems. Thanks to Instana End User Monitoring, it can even trace a ClickHouse query all the way back to the user or page that initiated it from a web or mobile interface.

Service Dependency Map

Not only does Instana give you access to all of the individual transactions that ever touched your ClickHouse cluster, it also aggregates them to form higher level concepts that everyone is familiar with: applications, services, and endpoints. Their corresponding dashboards make it easy to spot trends and outliers at a glance. With ClickHouse you get one service representing your ClickHouse cluster, and as many endpoints as there are tables. On each, you’ll find typical performance indicators such as call count, error rate, and latency, but also top queries and error messages.
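The aggregation described here can be sketched in a few lines. The `Call` record and its fields are hypothetical stand-ins for captured query data, and the mean is used only to keep the example short; this shows the general idea of rolling individual calls up into per-endpoint indicators, not Instana’s implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Call:
    table: str         # hypothetical: the table (endpoint) the query touched
    latency_ms: float  # hypothetical: observed latency of the call
    error: bool        # hypothetical: whether the call failed


def summarize_by_endpoint(calls):
    """Group calls by table and compute call count, error rate, and mean latency."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.table].append(call)

    summary = {}
    for table, items in grouped.items():
        summary[table] = {
            "call_count": len(items),
            "error_rate": sum(c.error for c in items) / len(items),
            "mean_latency_ms": mean(c.latency_ms for c in items),
        }
    return summary
```

For example, two calls to a table named `events` with latencies of 10 ms and 30 ms, one of them failing, yield a call count of 2, an error rate of 0.5, and a mean latency of 20 ms for that endpoint.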

ClickHouse Monitoring Summary Dashboard

From these dashboards and charts it’s easy to jump to the ClickHouse queries of interest, which can then be further filtered or grouped by all kinds of query properties, including information specific to your domain: tenant, timeframe, known query name, etc.

ClickHouse Monitoring Analytics Errors

Automatic issue detection and alerting

Dashboards are useful when looking for the root cause of a known problem (e.g. one reported by your users) or when trying to improve reliability or response times in general. The rest of the time, however, it’s best to let Instana do the work and detect issues as they arise: a disk about to run out of space, load that is too high, a sudden drop in the number of requests, an error rate or latency that is too high, etc.

To prevent alert fatigue, you are not alerted on every single issue. Instead, Instana runs a root cause analysis for you by correlating events into incidents (e.g. sudden high CPU usage observed on a ClickHouse server is correlated with a change to the ClickHouse setting max_threads). Incidents are then reported within the product or sent to third parties like PagerDuty or Opsgenie.
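One very simple way to picture event correlation is time-window grouping per entity, sketched below. The `Event` record, the five-minute window, and the grouping rule are assumptions chosen for illustration. Instana’s actual root cause analysis is not described in this article and is certainly more sophisticated than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    entity: str       # hypothetical: e.g. the ClickHouse server concerned
    description: str


def group_into_incidents(events, window_seconds=300.0):
    """Group events on the same entity into one incident when each event
    follows the previous one within `window_seconds`."""
    incidents = []
    open_incident = {}  # entity -> most recent incident (list of events)
    for event in sorted(events, key=lambda e: e.timestamp):
        current = open_incident.get(event.entity)
        if current and event.timestamp - current[-1].timestamp <= window_seconds:
            current.append(event)
        else:
            current = [event]
            incidents.append(current)
            open_incident[event.entity] = current
    return incidents
```

With this rule, a settings change followed two minutes later by a CPU spike on the same server ends up in one incident, so a single notification carries both the symptom and its likely cause.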

Instana comes with built-in knowledge and rules for all kinds of technologies, including ClickHouse, but they can be extended based on your SLAs and on what you’ve learned from your own experience operating ClickHouse. For example, you could create an issue every time the latency of some set of queries exceeds a certain threshold, or when insert queries are being throttled.
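A custom latency rule of this kind boils down to comparing a summary of recent latencies with a threshold. The sketch below uses a nearest-rank percentile; the function names, the 95th percentile default, and the idea of passing a plain list of samples are assumptions for illustration and do not reflect Instana’s rule configuration.

```python
import math


def percentile(values, percent):
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def latency_rule_violated(latencies_ms, threshold_ms, percent=95):
    """Return True when the chosen latency percentile exceeds the threshold."""
    return percentile(latencies_ms, percent) > threshold_ms
```

Using a high percentile rather than the mean keeps the rule sensitive to slow outliers, which are usually what an SLA is written to bound.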

ClickHouse Monitoring Expert Health Rules

How to get ClickHouse monitoring

Instana provides great monitoring capabilities for ClickHouse and more. Insights are readily available and actionable whether you are a developer building services on top of ClickHouse or an operator in charge of running a ClickHouse cluster. If you are interested in trying it out, it’s easy: start a free trial, install the Instana agent on your machines, and watch your cluster appear on the map!

ClickHouse Monitoring Infrastructure Map
