Lessons Learned Shipping Instana Self-Hosted On Kubernetes

At Instana, we recently improved the installation process for our self-hosted customers: Instana’s self-hosted platform now uses a fully Docker-based installation process. In a previous blog post, Lessons Learned From Dockerizing Our Application, we discussed how containerizing our product led to an enterprise-ready self-hosted platform that is scalable and continuously upgradable with minimal effort.

This initial single-host, dockerized approach has proven quite useful for the vast majority of our self-hosted customer base. We are now building upon that success to further improve scalability for our biggest customers running extremely large infrastructures. To do so, we are building on what our SaaS team learned while moving our SaaS infrastructure to Kubernetes (K8s). With those learnings in hand, it is time to deliver an even more scalable, Kubernetes-based option for our customers, fulfilling the scalability demands a single box could not fully handle.

Instana strives to never add complexity unless the benefits greatly outweigh it. Based on our experience, K8s is not a silver bullet that automatically makes your life easier. That said, K8s does solve many problems by providing an abstraction layer, and while it introduces some complexity, the benefits are more than worth it. By extending our product to be operable on K8s, we provide the following benefits to our self-hosted customers:

  • K8s makes scaling easier by abstracting away the underlying hardware behind a standard API
  • K8s already has a high adoption rate in the market, and especially within Instana’s customer base
  • Customers consolidating all their workloads on K8s already understand how it works, minimizing the impact of the added complexity
  • Customers can manage Instana the same way they operate their own applications
  • Instana can work on more advanced operational use cases, such as guided or automated scale-out

The rest of this article looks at our reasons for going beyond dockerization and orchestrating our application with Kubernetes, why we decided on the K8s operator model, and some key things we learned about moving to K8s along the way.

Why make our self-hosted application run on K8s now?

After the successful adoption of our dockerized single-host installation, we projected the challenges we might face in the future and identified two clear trends: K8s is seeing widespread adoption, and our customer base is growing, with increasing amounts of data that need to be processed. We are well aware that making this step brings in some complexity. Since our product is hosted by the customer, we cannot take cloud APIs for storage and inbound traffic for granted, and we need to be prepared for a wide variety of environments. But Kubernetes brings a standardized API that we can use to abstract away a lot of work, while adopting proven architectural patterns like the operator to increase automation.

Why we chose the K8s operator pattern deployment model

There are many different ways to deploy applications to Kubernetes. For smaller applications, DevOps teams typically go with something like handcrafted and templated YAML files that are then handled by the CI/CD infrastructure.

Helm was the next evolutionary step. It was, and for many still is, the de facto way of providing an easy installation path for various applications, ranging from OSS databases to commercial products. But even Helm doesn’t solve the problems a complex distributed application like Instana faces when deploying to Kubernetes.
Let’s look at a few numbers. Instana consists of:

  • 6 different databases
  • Around 30 application components, each scaled individually
  • 3 ingress endpoints
  • A web UI and API

All of this ships on a 4-week release cycle. Component updates and database migrations have to be coordinated to minimize downtime, and configurations have to be updated on the fly while taking into account the different scaling and deployment requirements of each component.

We all know that a well-behaved distributed application should just recover from whatever is thrown at it. While Instana can handle all kinds of failures, ranging from dying nodes to network outages and overload scenarios, we also know that the best failure is the one that is mitigated so fast that no one ever notices.

Taking this into account, along with our mission to deliver a mostly hands-off, self-maintaining system, we see that a more active component is required.

Enter the operator pattern. It provides several benefits to the way Instana delivers its self-hosted application. These include:

  • Flexibility in how workloads are deployed and operated on Kubernetes
  • Mapping an organization’s structure onto a single environment, e.g. company divisions onto tenant units
  • Minimal downtime during maintenance
  • Platform independence through the Kubernetes abstraction
  • Adoption of as much experience as possible from our SaaS infrastructure

Using the operator framework will also allow auto-scaling functionality that exactly matches the needs of Instana’s components.
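To illustrate the pattern, an operator watches custom resources that declare the desired state and continuously reconciles the cluster toward it. The resource kinds and fields below are hypothetical, purely for illustration, and not Instana’s actual CRD schema:

```yaml
# Hypothetical custom resources illustrating the operator pattern;
# apiVersion, kinds, and fields are examples, not Instana's real CRDs.
apiVersion: example.instana.io/v1
kind: Core
metadata:
  name: instana-core
spec:
  profile: medium              # sizing profile the operator translates into replicas/resources
  baseDomain: instana.rocks
  agentIngress: fullapm.instana.rocks
---
apiVersion: example.instana.io/v1
kind: Unit
metadata:
  name: unit1
spec:
  tenantName: tenant1          # maps a company division to a tenant unit
```

The operator watches these objects and creates, scales, and upgrades the underlying Deployments and Services accordingly, encoding operational knowledge that would otherwise live in runbooks.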

Four lessons about moving to K8s

Accessing K8s external databases from inside the cluster across namespaces

As mentioned above, there are a lot of databases in Instana, and there are many different ways customers run them:

  • On a dedicated cluster
  • On a single box
  • Via an operator in the same K8s cluster

We had to find a way to account for all of these options, since customers will run their databases however they please. For Instana, we decided to embrace Kubernetes fully and treat databases as just another set of services. So, instead of expecting databases to be fully configured for the services, we wanted to rely on something we could look up via DNS. There are a few things to know before diving into this.

Most databases support resolving cluster members via DNS. A single DNS name can resolve to multiple IP addresses, and Kubernetes additionally publishes SRV records for the named ports of a service. A DNS client can resolve all of those addresses and use them. In Java this is done with a simple call to InetAddress.getAllByName(<name>), and that is what most Java-based databases do out of the box (and yes, it’s not the nicest way to do this, since it ignores TTLs, but that’s how they do it). So a service in K8s will have its dedicated DNS name resolving to all associated IP addresses. Pretty convenient.
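As a quick sketch of that lookup, here we use localhost as a stand-in for a headless service name such as cassandra-spans.<namespace>.svc.cluster.local (the class name and service name are illustrative, not from our codebase):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsLookup {
    // Resolve every address behind a single DNS name, the way most
    // Java-based databases discover cluster members out of the box.
    public static String[] resolveAll(String name) throws UnknownHostException {
        InetAddress[] addrs = InetAddress.getAllByName(name);
        String[] ips = new String[addrs.length];
        for (int i = 0; i < addrs.length; i++) {
            ips[i] = addrs[i].getHostAddress();
        }
        return ips;
    }

    public static void main(String[] args) throws Exception {
        // Inside a cluster you would pass the headless service name here.
        for (String ip : resolveAll("localhost")) {
            System.out.println(ip);
        }
    }
}
```

Against a headless service, each pod (or, as shown below, each explicitly defined endpoint) contributes one address to the returned array.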

Defining a service that accesses other pods running in K8s is straightforward, and there are plenty of examples out there. The case we wanted to support, databases external to the K8s cluster, wasn’t really well explained anywhere, and it took a while to figure out.

The trick is to define a Service with clusterIP: None and to define the Endpoints explicitly:

apiVersion: v1
kind: Service
metadata:
  name: cassandra-spans
spec:
  clusterIP: None
  ports:
    - name: "tcp"
      protocol: "TCP"
      port: 9042
      targetPort: 9042
---
apiVersion: v1
kind: Endpoints
metadata:
  name: cassandra-spans
subsets:
  - addresses:
      - ip:   # first database node IP
      - ip:   # second database node IP
      - ip:   # third database node IP
    ports:
      - port: 9042
        name: "tcp"

Calling InetAddress.getAllByName(<name>) will then return the unordered list of the three configured IP addresses.

This already hints at a problem with the approach: K8s doesn’t follow the SRV spec completely, which requires entries to be ordered by their priority. For most databases this doesn’t matter, but in other scenarios you might be interested in that information, or you might have to rely on a guaranteed order of entries. There is work being done on this issue, so we expect to see progress in the near future.

Another downside of this approach is that Endpoints only accept IP addresses; host names are simply not allowed.
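If all you have is a host name, a Service of type ExternalName is the standard Kubernetes alternative. Note that it works differently from the Endpoints approach above: it returns a DNS CNAME record, so it maps the in-cluster name to a single external host rather than to a set of endpoint IPs. The external host name below is a placeholder:

```yaml
# Alternative when only a host name is available: ExternalName
# returns a CNAME instead of A records, so it points at one
# external host rather than a set of endpoint IPs.
apiVersion: v1
kind: Service
metadata:
  name: cassandra-spans
spec:
  type: ExternalName
  externalName: cassandra.example.com   # placeholder external host
```

This keeps the same in-cluster DNS name for consumers, but the multi-address cluster-discovery behavior described above is lost.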

Persistent volumes must be readable and writable from many pods

In cloud environments we use cheap object stores to persist the details of traces ingested into Instana. This approach has proven to be the most efficient, but it is not always applicable in private data centers. While customers do tend to have some S3-API-compliant storage on premises, the limits are usually so restrictive that it’s not usable for our high-volume use case. For the single-host dockerized installation we use local disks to store the data and have achieved very good results with that approach.

In the Kubernetes environment we face new challenges. First, we cannot simply write to the local disk, because we do not want to bind a component to a single node, nor force a second component to be collocated just to read the data. Second, as we prepare to scale out the components, we need a single place to store the data. We therefore offer multiple configurable options, including NFS storage, which we abstract as a persistent volume and attach to through a persistent volume claim (for reference, you can check our template). Real customer environments brought further challenges: in one Rancher environment where we supported a customer rolling out the new deployment model, the NFS PVC was automatically provisioned by Rancher with a unique path each time a PVC was created. That of course does not fit our use case, as we need to rely on the same path to find the persisted data.
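A minimal sketch of the shape of this setup, assuming a statically provisioned NFS volume (the names, server, path, and sizes below are placeholders, not our actual template):

```yaml
# Hypothetical NFS-backed PV plus a claim bound to it by name;
# names, server, path, and sizes are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spans-volume
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany                       # readable and writable from many pods
  persistentVolumeReclaimPolicy: Retain   # keep the data if the claim is deleted
  nfs:
    server:           # placeholder, your NFS server address
    path: /instana/spans
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spans-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # disable dynamic provisioning
  volumeName: spans-volume    # bind to the fixed PV so the path stays stable
  resources:
    requests:
      storage: 500Gi
```

Binding the claim to a named, statically provisioned volume is what avoids the Rancher problem described above: the path stays the same across claim re-creations.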


Ingress is hard outside the cloud

The only scenario where the ingress configuration for a K8s cluster is currently simple is in the cloud: the cloud provider takes over the heavy lifting, automatically creating a load balancer, dynamically assigning IP addresses, and wiring all the bits together. With customers running K8s in a private datacenter, we face a lot of different implementations and scenarios. In one scenario we exposed NodePorts and configured a DNS load balancer to distribute connection attempts from the agents to the backend. This led to multiple challenges: in scaled-out environments we run multiple instances of our component called the acceptor, which holds the connections to the agents and forwards the data to the subsequent processing layers. With the NodePort configuration, we are no longer able to let these components float freely in the K8s cluster; we are forced to pin them to specific nodes so that traffic from the outside can always reach its destination. In another scenario, the customer used HAProxy HTTP load balancers with TLS termination at the edge, whereas we rely on encrypted traffic and expect termination to happen in our components. All of these scenarios require us to get deeply involved in individual customer setups, showing where K8s fails to provide a sufficient abstraction layer for delivering distributed software products without headaches.
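For illustration, a NodePort service of the kind described above might look like this (the service name, labels, and port numbers are hypothetical, not Instana’s actual manifests):

```yaml
# Hypothetical NodePort service for the agent-facing acceptor;
# names and port numbers are examples only.
apiVersion: v1
kind: Service
metadata:
  name: acceptor
spec:
  type: NodePort
  selector:
    app: acceptor
  ports:
    - port: 8600        # in-cluster service port
      targetPort: 8600  # container port on the acceptor pods
      nodePort: 30600   # exposed on every node; a DNS LB spreads agents across nodes
```

The nodePort is opened on every node, which is why traffic routing works best when the backing pods are pinned to known nodes, exactly the constraint described above.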

Configuration management on K8s is hard

Putting all options and dimensions together, we face a pretty complex system. At the same time, we want it to be as easy as possible to install and operate. For operations, as previously mentioned, we decided to rely on the operator model and to codify the expertise that our SREs acquired running Instana at high scale. This is especially true for the zero-day experience, where it is crucial to reduce the time to value.

For the beta launch we created a GitHub repo, https://github.com/instana/self-hosted-k8s/, with predefined Kustomize templates. Kustomize is a templating feature built into K8s for generating configurations, but hand-editing the YAML files brings a lot of pitfalls and potential for misconfiguration. What we learned pretty quickly is that the barrier was still too high: the user needs to be familiar with the Instana architecture just to get started. Looking at the adoption of our radically simplified settings file from the single-host installation, we decided to take a similar approach for the Kubernetes deployment going forward:

  1. A Helm chart for deploying the operator along with any other prerequisites
    1. One-liner to quickly get started
    2. Delivers zero-day utility functions
  2. A simple settings.hcl from which the full configuration is generated
    1. Already known to our customers
    2. Better experience than writing raw YAML

An example settings.hcl:
download_key = "dl_key_1234"
sales_key    = "sales_key_1234"
base_domain  = "instana.rocks"
agent_ingress = "fullapm.instana.rocks"
core_name    = "wow"

databases "cassandra_service" {
  database  = "cassandra"
  namespace = "cassandra_namespace"
  schemas   = ["schema1", "schema2"]
}

email {
  smtp {
    from      = "[email protected]"
    host      = ""
    port      = 0
    user      = ""
    password  = ""
    use_ssl   = false
    start_tls = false
  }
}

profile = "medium"

spans {
  persistent_volume {
    volume_name   = "volumina"
    storage_class = "classica"
  }
}

units "unit1" {
  namespace         = "unit_namespace"
  tenant_name       = "tenant1"
  initial_agent_key = "supersecret_key"
}

Should you move your workloads to K8s?

There should be no illusions: K8s will add complexity to your tech stack. In cloud environments the barrier is lower, as you can rely on cost-efficient managed services like Google GKE or Amazon EKS. But K8s also provides a standardized abstraction and, as the wide adoption in the market clearly shows, it is the way forward.

Planning to Virtually Visit Kubecon?

To learn more about the innovative ways Instana works with Docker, drop in on our virtual session, Monitoring in a Microservices World, at Dockercon on May 28th at 1pm EST. Fabian Stäber, Senior Engineering Manager, will discuss the paradigm shift in software engineering away from static monolithic applications towards dynamic, distributed, horizontally scalable architectures, and how Docker is one of the key technologies enabling this development.
