Instana Blog

Date: September 26, 2019

Canary Testing with Istio

What is Canary Testing?

Miners took a caged canary down the mine with them to detect carbon monoxide and other noxious gasses. When the canary stopped singing, it was time for the miners to leave before they also stopped breathing. This kept the miners safe, but it did not always end well for the canary. Today, electronic sensors have taken the place of the canaries.

The same idea of sacrificial testing can also be used for evaluating new features or fixes in microservice applications. Anyone who has been involved with software development knows that there’s nothing like giving code to users to make it break. With canary testing, only a very small percentage of requests are served by the new code. If the code does not work as expected, no canaries die and only a tiny number of users are impacted.

What is Istio?

Istio is a service mesh for Kubernetes. So what’s a service mesh?

A service mesh provides discovery, load balancing, failure recovery, metrics and monitoring, A/B testing, canary testing, rate limiting, access control, and end-to-end authentication. Phew, that’s a lot! This article will not cover all use cases of Istio but will instead focus on canary testing; the same techniques can also be used for A/B testing.

Why use a service mesh at all? You can do canary testing using just the capabilities of Kubernetes: a Service is mapped to Pods via labels.

apiVersion: v1
kind: Service
metadata:
  name: cart
spec:
  …
  selector:
    service: cart

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cart
  labels:
    service: cart
spec:
  replicas: 9
  …
  template:
    …
    spec:
      containers:
      - image: robotshop/rs-cart:1.0.0

The selector maps the Service to any Pods with a matching label, including Pods created by a second Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cart-test
  labels:
    service: cart
spec:
  replicas: 1
  …
  template:
    …
    spec:
      containers:
      - image: robotshop/rs-cart:2.0.0

Notice that this new Deployment uses a different version of the container image and a different number of replicas. The kube-proxy process running on every node provides load balancing across all matching Pods, so this example results in 10% of requests going to version 2.0.0 and 90% to the original version 1.0.0. So far, so good. However, if the new release has problems, it could result in a 10% loss of revenue, which is too high. To reduce the exposure to a more acceptable 0.1%, the replica ratio would have to be 999:1, which requires a large number of extra Pods, all consuming resources. This is not an efficient use of resources, and it is where Istio, a service mesh, comes to the rescue.
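The replica arithmetic above can be sketched in a few lines. Since kube-proxy balances roughly evenly across all matching Pods, the canary’s share of traffic is simply its share of replicas (the function name here is just for illustration):

```python
def canary_exposure(prod_replicas: int, canary_replicas: int) -> float:
    """Fraction of requests expected to hit the canary Pods,
    assuming kube-proxy balances evenly across all matching Pods."""
    return canary_replicas / (prod_replicas + canary_replicas)

# 9 production replicas plus 1 canary replica -> 10% exposure.
print(canary_exposure(9, 1))    # 0.1
# Capping exposure at 0.1% requires 999 production replicas per canary Pod.
print(canary_exposure(999, 1))  # 0.001
```

This is why driving exposure down with plain Kubernetes gets expensive: the only knob is the replica count itself.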

Environment

The environment used for the screenshots is a Kubernetes cluster with 3 nodes of type n1-standard-4. Do not select the “Enable Istio (beta)” checkbox when creating the cluster; instead, install Istio with Helm following the Istio documentation.

Istio uses a sidecar container running Envoy on each Pod to manage the traffic. This sidecar can be automatically injected by Istio when the Pod is created. Create the Namespace for Stan’s Robot Shop and enable automatic sidecar injection:

$ kubectl create ns robot-shop
$ kubectl label ns robot-shop istio-injection=enabled
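Equivalently, the injection label can be declared on the Namespace manifest itself. A sketch, assuming the standard istio-injection label used above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: robot-shop
  labels:
    istio-injection: enabled
```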

Finally, deploy Stan’s Robot Shop into the Namespace, then install the Instana agent using its Helm chart.

By default, the “web” Service of Stan’s Robot Shop is configured with a type of LoadBalancer. This should be changed to ClusterIP when running with Istio, because all traffic should go via Istio’s Ingress Gateway. Create the Istio Gateway and VirtualService for Stan’s Robot Shop:

$ kubectl apply -f K8s/Istio/gateway.yaml

Now the shop front is available via the Istio Ingress Gateway.
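The contents of gateway.yaml are not shown in this post. A minimal sketch of what such a definition might contain, binding the ingress gateway to the web Service (the resource names, hosts, and port number here are assumptions, not the actual file):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: robotshop-gateway
spec:
  selector:
    istio: ingressgateway   # Istio's default ingress gateway deployment
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: robotshop-web
spec:
  hosts:
  - "*"
  gateways:
  - robotshop-gateway
  http:
  - route:
    - destination:
        host: web            # the ClusterIP "web" Service
        port:
          number: 8080       # assumed listen port of the web Service
```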

Having a Canary

With the load generation script from Stan’s Robot Shop providing some traffic, the experimentation can start. Initially a new Deployment for the new version of the payment service is created, without any extra Istio configuration.

 

[Screenshot: errors after change]

[Screenshot: container comparison]

The new Deployment (flagged on the timeline) is matched by the Service labels, and Kubernetes automatically starts load balancing across both the new and old Deployments. Instana traces every request and provides 1-second metric granularity with only a few seconds of latency, giving immediate feedback that the new Deployment is causing problems. The replica ratio between the Deployments is 1:1, resulting in up to 33% of requests getting errors. If this were a real application, that could be a 33% loss of revenue. Fortunately, thanks to Instana’s immediate feedback, the decision to roll back the new Deployment can be made quickly, before too much damage is done to the bank balance.

Istio provides much finer control over the level of exposure given to a new canary test, greatly reducing the risk while still providing a means to test a new release in production. First the Deployments need some extra labels.

…
kind: Deployment
metadata:
  name: payment-fix
  labels:
    service: payment
    stage: test

…
kind: Deployment
metadata:
  name: payment
  labels:
    service: payment
    stage: prod

The extra “stage” label is then used for matching in an Istio DestinationRule, which also references the payment Service; this creates the subsets.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: canary-test
spec:
  host: payment.robot-shop.svc.cluster.local
  subsets:
  - name: production
    labels:
      stage: prod
  - name: canary
    labels:
      stage: test

Finally, the Istio VirtualService uses the DestinationRule subsets and provides fine-grained control over the distribution of requests:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: robotshop-canary
spec:
  hosts:
  - payment.robot-shop.svc.cluster.local
  http:
  - route:
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: production
      weight: 99
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: canary
      weight: 1

Using the “weight” field, requests are distributed across the subsets defined in the DestinationRule. The example above routes 1% of requests to the canary (stage: test) subset.

Istio In Control with Instana Watching

 

[Screenshot: errors after canary deployment]

[Screenshot: container comparison]

Using the VirtualService definition above with just 1 replica of each Deployment, no resources have been wasted and only 1% of requests were exposed to any problems with the new release. With the prompt feedback that Instana provides as changes are deployed, the high error rate of the canary Deployment is immediately visible, enabling a quick decision to roll back that Deployment. Because Instana traces every request, even if the canary Deployment was only live for a few minutes, all the erroneous requests are captured for post-mortem analysis.

After a successful canary test, i.e. the deployment under test worked well, the Istio configuration is updated to route all requests to the new version. This can be done in one fell swoop or incrementally, double-checking with Instana at each incremental step.
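The final cut-over can be expressed by updating the weights in the VirtualService shown earlier (a full shift is sketched here; an incremental step would simply use intermediate weights such as 50/50):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: robotshop-canary
spec:
  hosts:
  - payment.robot-shop.svc.cluster.local
  http:
  - route:
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: production
      weight: 0
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: canary
      weight: 100
```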

Using Istio to limit the number of requests exposed to new functionality, alongside Instana providing immediate feedback, significantly reduces the risk of deploying into production, increasing the speed of your CI/CD cycle while maintaining service quality.

