Instana Blog

Date: February 6, 2020

ClickHouse Monitoring with Instana

Category: Featured, Product

ClickHouse has powerful built-in monitoring capabilities, accessible through its system tables (system.trace_log, system.metrics, system.query_log, etc.). They are great when diving deep into issues, but when ClickHouse is just one of the many things you have to watch, it’s best to use a fully managed and automated monitoring solution such as Instana. It is no wonder that Instana provides excellent support for ClickHouse: Instana’s developers and operations teams use ClickHouse to power Instana, and Instana to monitor ClickHouse.
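As a rough illustration of what reading these system tables by hand looks like, the Python sketch below builds a request for ClickHouse’s HTTP interface and parses the tab-separated result of a query against system.metrics. The host name, the default HTTP port 8123, and the helper names are assumptions made for this example; none of it is part of Instana.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the server exposes the ClickHouse HTTP interface on port 8123.
METRICS_QUERY = (
    "SELECT metric, value FROM system.metrics ORDER BY metric FORMAT TabSeparated"
)


def build_query_url(host, query, port=8123):
    """Return the HTTP-interface URL that runs `query` on `host`."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"http://{host}:{port}/?{urlencode({'query': query}, quote_via=quote)}"


def parse_metrics(tsv):
    """Parse 'metric<TAB>value' lines into a {metric: value} dict."""
    metrics = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.split("\t")
        metrics[name] = int(value)
    return metrics


def fetch_metrics(host):
    """Fetch current values from system.metrics (needs a reachable server)."""
    with urlopen(build_query_url(host, METRICS_QUERY), timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_metrics("localhost") against a running server would return a point-in-time snapshot; keeping history, charting it, and comparing servers is the part a monitoring product takes over.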

Let’s see how Instana exposes ClickHouse’s built-in monitoring data visually and over time, puts it into context, provides additional insights across your stack, and does it all with very little effort.

Infrastructure monitoring

To get Instana running, all you need to do is install a single agent per operating system instance. Once the Instana agent is installed on each of the hosts where the ClickHouse servers are running, Instana automatically discovers the hosts and the ClickHouse servers running on them, and provides real-time information about the health and performance of both (CPU, memory, and IO for the hosts). It’s therefore a great tool for operations teams, as it helps them keep clusters healthy, upgrade to newer versions smoothly, and get capacity planning right.

Each ClickHouse server gets a dedicated dashboard where you have access to most of the ClickHouse metrics and other information like the number of active parts per table or the running queries. Metrics can also easily be compared across multiple servers.

ClickHouse Monitoring KPI Metric Comparison

Application monitoring based on distributed tracing

Relying on logs or infrastructure metrics alone does not give you the full picture, because ClickHouse servers do not live on their own. Queries to your ClickHouse cluster may come from multiple services, sometimes managed by different teams. Typical problems such as failed queries, slow queries, N+1 queries, and an excessive insert rate can only be fixed once the service causing the issue is identified. This is why Instana captures the SQL statement, latency, potential error, receiving host, and caller information of every single query to ClickHouse.
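To make the N+1 case concrete: if each captured call carries its SQL statement and the trace it belongs to, a statement shape that repeats many times within a single trace points at the calling service. The sketch below only illustrates that idea and is not Instana’s implementation; the Call record, its field names, and the threshold are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    trace_id: str  # hypothetical: the request this call belongs to
    caller: str    # hypothetical: name of the calling service
    sql: str       # SQL statement sent to ClickHouse


def normalize(sql):
    """Replace string and numeric literals with '?' so similar statements group."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # standalone integer literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def find_repeated_statements(calls, threshold=10):
    """Return (trace_id, caller, statement shape, count) tuples for every
    statement shape issued at least `threshold` times within one trace."""
    counts = Counter((c.trace_id, c.caller, normalize(c.sql)) for c in calls)
    return [
        (trace_id, caller, shape, n)
        for (trace_id, caller, shape), n in counts.items()
        if n >= threshold
    ]
```

A trace in which one service issues the same lookup once per item, instead of a single batched query, would show up here with a high count for one statement shape.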

ClickHouse Monitoring Distributed Tracing

Instana provides this level of insight because it comes with automatic tracing (no code changes required) across your distributed systems composed of services you built (e.g. Java, Python, Node.js, etc.), databases, and messaging systems. It can even trace a ClickHouse query all the way back to the user or page that initiated it from a web or mobile interface, thanks to Instana End User Monitoring capabilities.

ClickHouse Monitoring Service Dependency Map

Not only does Instana give you access to all of the individual transactions that ever touched your ClickHouse cluster, it also aggregates them to form higher level concepts that everyone is familiar with: applications, services, and endpoints. Their corresponding dashboards make it easy to spot trends and outliers at a glance. With ClickHouse you get one service representing your ClickHouse cluster, and as many endpoints as there are tables. On each, you’ll find typical performance indicators such as call count, error rate, and latency, but also top queries and error messages.
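The aggregation described above can be pictured as a group-by over individual calls. The following sketch is a simplified illustration with hypothetical field names (table, latency_ms, error); it is not how Instana computes its dashboards.

```python
from collections import defaultdict


def summarize_by_endpoint(calls):
    """Group calls by table and compute call count, error rate, and mean latency."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call["table"]].append(call)

    summary = {}
    for table, items in grouped.items():
        errors = sum(1 for c in items if c["error"])
        summary[table] = {
            "calls": len(items),
            "error_rate": errors / len(items),
            "mean_latency_ms": sum(c["latency_ms"] for c in items) / len(items),
        }
    return summary
```

Each key in the result corresponds to one endpoint in the sense used above: one ClickHouse table with its own call count, error rate, and latency.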

ClickHouse Monitoring Summary Dashboard

From these dashboards and charts it’s easy to jump to the ClickHouse queries of interest, which can then be further filtered or grouped by all kinds of query properties, including information specific to your domain: tenant, timeframe, known query name, etc.

ClickHouse Monitoring Analytics Errors

Automatic issue detection and alerting

Dashboards are useful when looking for the root cause of a known problem (e.g. one reported by your users) or when trying to improve reliability or response times in general. The rest of the time, however, it’s best to let Instana do the work for you and detect issues as they arise: a disk about to run out of space, load that is too high, a sudden drop in the number of requests, an error rate or latency that is too high, etc.

To prevent alert fatigue, Instana does not alert you on every single issue. Instead, it runs a root cause analysis for you, correlating events into incidents (e.g. sudden high CPU usage observed on a ClickHouse server is correlated with a change to the ClickHouse setting max_threads). Incidents are then reported within the product or sent to third-party services like PagerDuty or Opsgenie.
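The post does not describe how Instana’s root cause analysis works internally, so the sketch below is deliberately naive: it groups events on the same host that occur close together in time into one incident. The event fields (host, ts, kind) and the 60-second window are assumptions chosen purely for illustration.

```python
def correlate(events, window_s=60):
    """Group events per host into incidents: an event joins the host's latest
    incident if it occurs within `window_s` seconds of that incident's last event."""
    incidents = []
    latest_by_host = {}  # host -> most recent incident (a list of events)
    for event in sorted(events, key=lambda e: e["ts"]):
        current = latest_by_host.get(event["host"])
        if current and event["ts"] - current[-1]["ts"] <= window_s:
            current.append(event)
        else:
            current = [event]
            latest_by_host[event["host"]] = current
            incidents.append(current)
    return incidents
```

With this heuristic, a setting change followed 30 seconds later by a CPU spike on the same server lands in one incident, so only one notification would be needed instead of two.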

Instana comes with built-in knowledge and rules for all kinds of technologies including ClickHouse, but they can be extended based on your SLAs and things you’ve learned from your own experience operating ClickHouse. For example, you could create an issue every time the latency of some set of queries exceeds a certain threshold or when insert queries are being throttled.
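Conceptually, a custom rule of this kind is a condition evaluated over recent measurements. The sketch below shows one possible shape for a latency rule; the function name, the use of the mean, and the minimum sample count are assumptions for illustration and do not reflect Instana’s rule format.

```python
def latency_rule_violated(latencies_ms, threshold_ms, min_samples=5):
    """Return True when the mean latency of the window exceeds `threshold_ms`.

    Windows with fewer than `min_samples` measurements never fire, so a
    single slow query does not create an issue on its own.
    """
    if len(latencies_ms) < min_samples:
        return False
    return sum(latencies_ms) / len(latencies_ms) > threshold_ms
```

Requiring a minimum number of samples is one simple way to keep such a rule from firing on isolated outliers.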

ClickHouse Monitoring Expert Health Rules

Conclusion

Instana provides great monitoring capabilities for ClickHouse and more. Insights are readily available and actionable whether you are a developer building services on top of ClickHouse or an operator in charge of running a ClickHouse cluster. If you are interested in trying it out, it’s easy: start a free trial, install the Instana agent on your machines, and watch your cluster magically appear on the map!

ClickHouse Monitoring Infrastructure Map

About Instana

As the leading provider of Automatic Application Performance Monitoring (APM) solutions for microservices, Instana has developed the automatic monitoring and AI-based analysis DevOps needs to manage the performance of modern applications. Instana is the only APM solution that automatically discovers, maps and visualizes microservice applications without continuous additional engineering. Customers using Instana achieve operational excellence and deliver better software faster. Visit https://www.instana.com to learn more.