What is Canary Testing?
Miners took a caged canary down the mine with them to detect carbon monoxide and other noxious gases. When the canary stopped singing, it was time for the miners to leave before they also stopped breathing. This kept the miners safe, but it did not always end well for the canary. Today, electronics have taken the place of the canaries.
The same idea of sacrificial testing can also be used for evaluating new features or fixes in microservice applications. Anyone who has been involved with software development knows that there’s nothing like giving code to users to make it break. With canary testing, a very small percentage of requests are served by the new code; if the code does not work as expected, no canaries die and only a tiny number of users are impacted.
What is Istio?
Istio is a service mesh for Kubernetes. So what’s a service mesh?
A service mesh provides discovery, load balancing, failure recovery, metrics and monitoring, A/B testing, canary testing, rate limiting, access control, and end-to-end authentication. Phew, that’s a lot! This article will not cover all use cases of Istio but instead focus on canary testing; the same techniques can also be used for A/B testing.
Why use a service mesh at all? You can do canary testing using just the built-in capabilities of Kubernetes. A Service is mapped to Pods via labels:
apiVersion: v1
kind: Service
metadata:
  name: cart
spec:
  …
  selector:
    service: cart
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cart
  labels:
    service: cart
spec:
  replicas: 9
  …
  template:
    …
    spec:
      containers:
      - image: robotshop/rs-cart:1.0.0
The selector maps the Service to any Pods with a matching label. Now suppose a new Deployment is created:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cart-test
  labels:
    service: cart
spec:
  replicas: 1
  …
  template:
    …
    spec:
      containers:
      - image: robotshop/rs-cart:2.0.0
Notice that this new Deployment uses a different version of the container image and a different number of replicas. The kube-proxy process running on every node load balances across all matching Pods, so this example sends 10% of requests to version 2.0.0 and 90% to the original version 1.0.0. So far so good. However, if the new release has problems, it could result in a loss of 10% of revenue, which is too high. To reduce the risk exposure to a more acceptable 0.1%, the replica ratio would have to be 999:1, which means running a large number of extra Pods, all consuming resources. That is not an efficient use of resources; this is where Istio, a service mesh, comes to the rescue.
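The replica arithmetic above is worth spelling out; this is plain shell arithmetic, nothing Kubernetes-specific, just illustrating why 0.1% exposure forces a 999:1 replica ratio:

```shell
# With plain Kubernetes load balancing, the canary's traffic share is
# simply canary_replicas / total_replicas.
canary_replicas=1
production_replicas=999
exposure=$(awk -v c="$canary_replicas" -v p="$production_replicas" \
  'BEGIN { printf "%.1f", c / (c + p) * 100 }')
echo "Canary receives ${exposure}% of requests"
# prints: Canary receives 0.1% of requests
```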
The environment used for the screenshots consists of:
- Kubernetes on Google Cloud Platform 1.13.7-gke.24
- Helm 2.14.1
- Istio 1.2.5
- Stan’s Robot Shop
- An Instana account (sign up for a free trial)
Create a Kubernetes cluster with 3 nodes of type n1-standard-4. Do not select the “Enable Istio (beta)” checkbox; instead, install Istio with Helm following the Istio documentation.
Istio uses a sidecar container running Envoy on each Pod to manage the traffic. This sidecar can be automatically injected by Istio when the Pod is created. Create the Namespace for Stan’s Robot Shop and enable automatic sidecar injection:
$ kubectl create ns robot-shop
$ kubectl label ns robot-shop istio-injection=enabled
Finally, deploy Stan’s Robot Shop into the Namespace, then install the Instana agent using its Helm chart.
By default, the “web” Service of Stan’s Robot Shop is configured with a type of LoadBalancer. This should be changed to ClusterIP when running with Istio, because all traffic should go via Istio’s Ingress Gateway. Create the Istio Gateway and VirtualService for Stan’s Robot Shop:
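The contents of gateway.yaml are not shown above; as a rough sketch of what such a file typically contains (the port numbers and resource names here are illustrative assumptions, not the actual file from the Robot Shop repository):

```yaml
# Illustrative sketch only; see K8s/Istio/gateway.yaml in the Robot Shop
# repository for the authoritative version. Port 8080 is an assumption.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: robot-shop
  namespace: robot-shop
spec:
  selector:
    istio: ingressgateway   # use Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: robot-shop
  namespace: robot-shop
spec:
  hosts:
  - "*"
  gateways:
  - robot-shop
  http:
  - route:
    - destination:
        host: web
        port:
          number: 8080
```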
$ kubectl apply -f K8s/Istio/gateway.yaml
Now the shop front is available via the Istio Ingress Gateway.
Having a Canary
With the load generation script from Stan’s Robot Shop providing some traffic, the experimentation can start. Initially a new Deployment for the new version of the payment service is created, without any extra Istio configuration.
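Such a Deployment might look like the following sketch, mirroring the cart example earlier; the image name, tag, and container name are illustrative assumptions:

```yaml
# Hypothetical canary Deployment for the payment service; the image
# reference is an assumption for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-fix
  labels:
    service: payment
spec:
  replicas: 1
  selector:
    matchLabels:
      service: payment
  template:
    metadata:
      labels:
        service: payment
    spec:
      containers:
      - name: payment
        image: robotshop/rs-payment:2.0.0
```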
The new Deployment (flagged on the timeline) is matched by the Service’s labels, and Kubernetes automatically starts load balancing across both the new and old Deployments. Instana traces every request and provides one-second metric granularity with only a few seconds of latency, giving immediate feedback that the new Deployment is causing problems. The replica ratio for the Deployments is 1:1, resulting in up to 33% of requests getting errors. If this were a real application, that could be a 33% loss of revenue. Fortunately, thanks to Instana’s immediate feedback, the decision to roll back the new deployment can be made quickly, before too much damage is done to the bank balance.
Istio provides much finer control over the level of exposure given to a new canary test, greatly reducing the risk while still providing a means to test a new release in production. First, the Deployments need some extra labels:
…
kind: Deployment
metadata:
  name: payment-fix
  labels:
    service: payment
    stage: test
…
kind: Deployment
metadata:
  name: payment
  labels:
    service: payment
    stage: prod
The extra label “stage” is then used to match in an Istio DestinationRule, which also references the payment Service; this creates the subsets:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: canary-test
spec:
  host: payment.robot-shop.svc.cluster.local
  subsets:
  - name: production
    labels:
      stage: prod
  - name: canary
    labels:
      stage: test
Finally, the Istio VirtualService uses the DestinationRule subsets and provides fine control over the distribution of requests:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: robotshop-canary
spec:
  hosts:
  - payment.robot-shop.svc.cluster.local
  http:
  - route:
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: production
      weight: 99
    - destination:
        host: payment.robot-shop.svc.cluster.local
        subset: canary
      weight: 1
Using the “weight” field, requests are distributed across the subsets defined in the DestinationRule. The example above routes 1% of requests to the canary (stage: test) Deployment subset.
Istio In Control with Instana Watching
Using the VirtualService definition above with just one replica of each Deployment, no resources are wasted and only 1% of requests are exposed to any problems with the new release. With the prompt feedback that Instana provides as changes are deployed, the high error rate of the canary deployment is immediately visible, enabling a quick decision to roll back that deployment. Because Instana traces every request, even if the canary deployment was only live for a few minutes, all those erroneous requests will be captured for post-mortem analysis.
After a successful canary test, where the deployment under test worked well, the Istio configuration is updated to route all requests to the new version. This can be done in one fell swoop or incrementally, double checking with Instana on each incremental step.
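An incremental step is just a matter of editing the weights in the VirtualService shown earlier; for example, moving the canary from 1% to 10% of traffic (the 90/10 split here is an illustrative choice, not a prescribed schedule):

```yaml
# Only the weights change between steps: 99/1 -> 90/10 -> ... -> 0/100.
http:
- route:
  - destination:
      host: payment.robot-shop.svc.cluster.local
      subset: production
    weight: 90
  - destination:
      host: payment.robot-shop.svc.cluster.local
      subset: canary
    weight: 10
```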
Using Istio to limit the number of requests exposed to new functionality, alongside Instana providing immediate feedback, significantly reduces the risk of deploying into production, increasing the speed of your CI/CD cycle while maintaining service quality.