How Instana Uses AWS EKS

December 13, 2021


Instana is the first and only Enterprise Observability solution designed specifically for the challenges of managing microservices and distributed, cloud-native applications. Our SaaS platform has to process and store large amounts of telemetry data from our customers. Each day we process about a petabyte of ingress data.

Instana processes a ton of data.

In this article we will show how we use AWS EKS to run all ingress and processing components for our SaaS platform.

High level architecture

At a high level, we have two types of regions: GlobalRegion and MultiRegion. The GlobalRegion runs all global components used for licensing, authentication and accounting. The MultiRegions run all processing components that drive our SaaS product. MultiRegions are spread across multiple continents and cloud providers to offer the best service and latency for our customers. The networks for all MultiRegions are completely isolated from one another and have their own VPC configuration. Each region has its own datastore clusters and handles the processing of a subset of our customers.

Once a region and its clusters reach a certain size, we spin up a new region and deploy all new customers to that region. This allows us to minimize the blast radius in case of failures and thus reduces risk for us as well as for our customers. Being able to easily create a complete new region while our customer base grows was one reason for using managed EKS clusters in our AWS regions.


Note: Click on any screenshot for a full-sized view.

Instana’s high-level architecture, including two types of regions, GlobalRegion and MultiRegion.


Let us take a look at our GlobalRegion. It is our smallest Kubernetes cluster and runs cross-functional components to manage licenses, authentication, authorization and accounting across our customer base.


All of our high-level ingress and processing happens in our MultiRegions. Each of these regions runs about 2,000-3,000 processes. Most of our components are written in Java using the Dropwizard framework, or in JavaScript / Node.js. We spread our processing components across three nodegroups:

  • Acceptor NodeGroup for all ingress traffic
  • Core NodeGroup for shared processing components
  • Tenant Unit (TU) NodeGroup for tenant unit specific processing components

Using Kubernetes node selectors, we can easily separate our components and group them together by resource requirements.
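As an illustration, a workload can be pinned to one of these nodegroups via a `nodeSelector` matching the nodegroup's labels. This is a minimal sketch; the `instanaGroup: tenantUnit` label matches the nodegroup config shown later in this article, while the component name and image are illustrative:

```yaml
# Sketch: scheduling a Deployment onto the TenantUnit nodegroup only.
# The instanaGroup label matches the nodegroup labels in this article;
# the component name and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tu-processor
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tu-processor
  template:
    metadata:
      labels:
        app: tu-processor
    spec:
      nodeSelector:
        instanaGroup: tenantUnit   # only schedule onto TU nodes
      containers:
        - name: tu-processor
          image: example/tu-processor:latest
```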

Here is what a typical MultiRegion looks like. We label each region with a color, since this gives us a unique identifier that everyone inside Instana understands. Using the AWS region names would have limited us to a single MultiRegion per AWS region (e.g. one for us-west-2), which is a constraint we did not want.

Each MultiRegion has its own VPC with public and private subnets spread across three availability zones. The EKS nodegroups are only configured to use the private subnets so none of our components are directly accessible from the internet.
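In an eksctl config, this network layout can be sketched roughly as follows. The CIDR and NAT settings here are illustrative, not Instana's actual values; the key part is `privateNetworking: true`, which keeps nodes out of the public subnets:

```yaml
# Sketch of the VPC-related parts of an eksctl config.
# CIDR and NAT gateway settings are illustrative placeholders.
vpc:
  cidr: 10.32.0.0/16
  nat:
    gateway: Single          # NAT gateway so private nodes can still reach the internet
nodeGroups:
  - name: private-core-0
    privateNetworking: true  # nodes are placed only in private subnets
```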

We label each MultiRegion with a color.

Acceptor NodeGroup

Instana supports infrastructure monitoring, end-user monitoring, distributed tracing and serverless monitoring. Therefore we operate several ingress endpoints that are accessible via AWS TCP or HTTPS load balancers. All of the ingress components run in a dedicated nodegroup, the Acceptor NodeGroup, which has a custom resource profile that best matches its workload.
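A sketch of how such an ingress component might be exposed through an AWS NLB is shown below. The `aws-load-balancer-type` annotation is the standard in-tree AWS annotation; the service name and ports are illustrative, not Instana's actual configuration:

```yaml
# Sketch: exposing an ingress component via an AWS Network Load Balancer.
# Name and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: trace-acceptor
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: trace-acceptor
  ports:
    - port: 443
      targetPort: 8443
```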

Core NodeGroup

Besides the ingress components we have a pool of shared components that are used for different purposes. Some of these components process, transform and redistribute the ingress data. Other components store that data, in various formats, across several datastores that we use. And others serve our user interface.

TenantUnit (TU) NodeGroup

Last but not least, we have a dedicated nodegroup for tenant unit components. In this nodegroup, we run all the heavy-duty processing that comes with 1s metric resolution and tracing every call.

As an example, here is a snippet of what the tenant unit nodegroup configuration looks like. By default we enable the autoScaler, externalDNS and certManager addon policies for this nodegroup, which are covered in the next section. If you are interested, you will find a full EKS cluster config in the appendix.

- name: private-tenantunit-0
  instanceType: r5.4xlarge
  privateNetworking: true
  labels:
    instanaGroup: tenantUnit
    vendor: instana
    zone: private
  minSize: 1
  maxSize: 100
  iam:
    withAddonPolicies:
      autoScaler: true
      externalDNS: true
      certManager: true

Creating an EKS cluster via eksctl

Now that we have an understanding of the overall architecture, let’s take a look at how we approach our cluster setup. Following the infrastructure-as-code paradigm, we use eksctl to create and maintain all our EKS clusters. The configuration is stored in a YAML file and can be used to manage the complete lifecycle of a cluster. We started with EKS Kubernetes version 1.15 and have used eksctl from the start, performing every upgrade (1.15 to 1.16, 1.16 to 1.17, and so on) via eksctl as well. At the moment we are on version 1.19 and will be upgrading to 1.20 in the upcoming weeks.

Install eksctl

To install eksctl run the following commands and make sure you have at least version 0.33.0:

> brew tap weaveworks/tap
> brew install weaveworks/tap/eksctl
> source <(eksctl completion bash)
> eksctl version

Create KMS key

Before creating the EKS Kubernetes cluster, you must create a KMS (Key Management Service) key. The KMS key is used to encrypt and decrypt the Kubernetes secrets stored in etcd in the managed EKS cluster.
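Once the key exists, it is referenced from the eksctl cluster config via the `secretsEncryption` section. This is a minimal sketch; the ARN below is a placeholder you would replace with your own key's ARN:

```yaml
# Sketch: wiring the KMS key into the eksctl config so Kubernetes
# secrets are encrypted at rest in etcd. The ARN is a placeholder.
secretsEncryption:
  keyARN: arn:aws:kms:us-west-2:111122223333:key/REPLACE-WITH-YOUR-KEY-ID
```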

Create the key in AWS Console and follow the wizard to create a symmetric key and define key usage permissions:

Create the KMS key in AWS Console and follow the wizard to define permissions.


Create EKS cluster from config.yaml

In the appendix of this post you can find a full example cluster config.yaml file that you can use to create an EKS cluster that matches the architecture diagram above.

> eksctl create cluster --config-file eks-pink-config.yaml

Once you execute the above command, eksctl will trigger CloudFormation to create all required resources.


Set up cluster-wide services

For all of our Kubernetes clusters we use a set of cluster-wide services that make our life in the SRE team easier when maintaining the cluster. Here are a few examples of services that are important to us:

  • external-dns to configure DNS entries in Route53
  • cluster-autoscaler to automatically add or shrink the nodegroups when new customers are deployed / undeployed or core components are scaled out
  • kube-dns-autoscaler to scale coredns pods to match the overall cluster size
  • instana-agent for full infrastructure and Kubernetes monitoring, and distributed tracing


ExternalDNS regularly synchronizes exposed Kubernetes Services and Ingresses with DNS providers. It supports a wide range of standard DNS providers like AWS Route53. Like KubeDNS, it retrieves a list of resources (services, ingresses, etc.) from the Kubernetes API to determine a desired list of DNS records. Unlike KubeDNS however, it is not a DNS server itself, but merely configures other DNS providers accordingly. In our case this is AWS Route 53.
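For example, a Service can request a Route53 record via external-dns's standard hostname annotation. The annotation key below is external-dns's documented annotation; the service name, domain and ports are illustrative:

```yaml
# Sketch: asking external-dns to create a Route53 record for a Service.
# The domain, name and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: trace-acceptor
  annotations:
    external-dns.alpha.kubernetes.io/hostname: ingress-pink.example.com
spec:
  type: LoadBalancer
  selector:
    app: trace-acceptor
  ports:
    - port: 443
      targetPort: 8443
```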

external-dns allows us to automatically generate Route53 entries for our Kubernetes services.

An example config to install external-dns is in the appendix:

> kubectl apply -f external-dns-pink.yaml

You can check the logs for external-dns using:

> kubectl logs -f -l app=external-dns

Check the logs for external-dns.


The Kubernetes Cluster Autoscaler automatically adjusts the number of EC2 nodes in your cluster. It will launch new nodes into a node group when there are not enough resources left to launch a pod and makes sure to move pods and remove nodes when there are too many under-utilized resources.
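In auto-discovery mode, the cluster-autoscaler finds the auto-scaling groups it may resize via tags on the ASGs. With eksctl, setting `iam.withAddonPolicies.autoScaler: true` on a nodegroup grants the required IAM permissions; tags along the following lines are then applied per nodegroup. The cluster name `eks-pink` here is illustrative:

```yaml
# Sketch: per-nodegroup tags that cluster-autoscaler's auto-discovery
# mode looks for. The cluster name is a placeholder.
tags:
  k8s.io/cluster-autoscaler/enabled: "true"
  k8s.io/cluster-autoscaler/eks-pink: "owned"
```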

Install cluster-autoscaler:

> kubectl apply -f cluster-autoscaler-autodiscover.yaml

You can check the logs for the autoscaler using:

> kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler

Check the logs for the autoscaler.


The Kubernetes cluster-proportional-autoscaler automatically adjusts the number of coredns pods in our cluster as the cluster grows or shrinks.
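Its scaling behavior is driven by a ConfigMap; the sketch below uses the autoscaler's `linear` mode, where the replica count grows with the cluster's cores and nodes. The specific numbers are illustrative, not Instana's production values:

```yaml
# Sketch: a cluster-proportional-autoscaler ConfigMap using linear mode.
# Values are illustrative placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {"coresPerReplica":256,"nodesPerReplica":16,"min":2,"preventSinglePointFailure":true}
```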

Install kube-dns-autoscaler:

> kubectl apply -f kube-dns-autoscaler.yaml

You can check logs:

> kubectl -n kube-system logs -f deployment.apps/kube-dns-autoscaler

Instana Agent

Finally, as a monitoring company ourselves, the natural choice is to eat our own dog food and use Instana to monitor Instana. There are several ways to install the instana-agent on a Kubernetes cluster. We recommend installing the agent using the Helm chart, a YAML file (DaemonSet), or the K8s Operator.

Currently, we deploy the agent as a DaemonSet on all our EKS clusters.
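For reference, a Helm-based install is typically driven by a small values file along these lines. The value keys are assumptions to be checked against the current instana-agent chart documentation, and the agent key, endpoint and cluster name are placeholders:

```yaml
# Sketch: values for an instana-agent Helm install.
# Keys are assumptions; agent key, endpoint and names are placeholders.
agent:
  key: REPLACE-WITH-AGENT-KEY
  endpointHost: ingress.example.instana.io
  endpointPort: 443
cluster:
  name: eks-pink
zone:
  name: eks-pink
```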

A few minutes after deploying the Instana agents across the K8s cluster, we get full insight into all the components running in the cluster, as well as the K8s cluster itself. This includes infrastructure and component metrics, distributed traces, auto-profiling and alerting on built-in events.

In a few minutes we get full insight into all the components running in the cluster, as well as the K8s cluster.

Here is a screenshot of the EKS cluster dashboard for our test environments, which is used by developers. This dashboard gathers all high-level metrics for the cluster and is the starting point to dig into further metrics for nodes, namespaces, deployments, daemonsets, statefulsets, services, pods and infrastructure.

Here is the EKS cluster dashboard for our test environments.

This is a map of all running containers grouped by namespace. There are several useful views available that help investigate pod distribution and resource usages across the whole EKS Kubernetes cluster.

This is a map of all running containers grouped by namespace.


So far we are happy with our decision to use managed EKS clusters for our AWS regions. With managed EKS we do not have to spend time operating and maintaining the core cluster infrastructure ourselves, which leaves more time to focus on improving our platform and product. If there is one improvement we could wish for, it would be faster provisioning: it can sometimes take a while until a cluster is fully up and running.

We have not had any production problems, and we hope it stays that way. Take a guided tour through our Play With environment to learn more about how Instana works.


This appendix includes a couple of stripped-down config files we use to set up our EKS Kubernetes clusters. Make sure to use the latest versions, since the tools used in this article move fast and update their APIs regularly.

eksctl cluster eks-pink-config.yaml for pink MultiRegion


