What is Envoy Proxy?
“Originally built at Lyft, Envoy is a high performance C++ distributed proxy designed for single services and applications, as well as a communication bus and “universal data plane” designed for large microservice “service mesh” architectures.” – https://www.envoyproxy.io
Essentially, Envoy was built to solve major problems that arise when you transition to microservices. Managing the network that interconnects all microservices and properly routing every request through this highly dynamic architecture is difficult at best. I’m not talking about the layer that connects the individual hosts; I’m referring to the virtual network layer sitting on top of the physical network, which needs to adjust automatically to microservices that can change at any given moment. This virtual layer needs to handle primary request routing, automatic retries, circuit breaking, global rate limiting, request shadowing, zone local load balancing, and more.
To quote the Envoy Proxy website again… “Envoy runs alongside every application and abstracts the network by providing common features in a platform-agnostic manner.”
How to Monitor Envoy Proxy
Monitoring Envoy Proxy should be thought about in two distinctly different ways. First, metrics and KPIs are important indicators of the overall health and performance of Envoy, but on their own they are not enough to fully understand what impact Envoy has on the requests flowing through it. Second, and arguably more important, distributed tracing of each request through Envoy shows exactly what impact Envoy has on that request.
Envoy Metrics: “Envoy outputs numerous statistics which depend on how the server is configured. They can be seen locally via the GET /stats command and are typically sent to a statsd cluster. The statistics that are output are documented in the relevant sections of the configuration guide. Some of the more important statistics that will almost always be used can be found in the following sections: HTTP connection manager, Upstream cluster”
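To make the statistics output concrete, here is a minimal Python sketch that parses the plain-text body returned by GET /stats into a dictionary. It assumes the usual "name: value" line format and skips histogram lines, whose values are not plain integers. The admin address http://localhost:9901 and the helper names (parse_envoy_stats, fetch_envoy_stats, select) are assumptions made for illustration, as are the values in the sample text.

```python
from urllib.request import urlopen

# Illustrative sample of /stats output; the values are made up.
SAMPLE = """cluster.backend.upstream_rq_total: 120
cluster.backend.upstream_rq_5xx: 3
http.ingress_http.downstream_rq_total: 125
cluster.backend.upstream_rq_time: P0(nan,1) P50(nan,12)
"""


def parse_envoy_stats(text):
    """Parse "name: value" lines into a dict of integer stats.

    Lines whose value is not a plain integer (for example histogram
    summaries) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "name: value" line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # histogram or other non-integer value
    return stats


def fetch_envoy_stats(admin_url="http://localhost:9901"):
    """Fetch and parse stats from a running Envoy admin interface.

    admin_url is an assumption; use the address configured in the
    admin section of your own bootstrap file.
    """
    with urlopen(admin_url + "/stats", timeout=5) as resp:
        return parse_envoy_stats(resp.read().decode("utf-8"))


def select(stats, prefix, suffix=""):
    """Return only the stats whose name has the given prefix and suffix."""
    return {
        name: value
        for name, value in stats.items()
        if name.startswith(prefix) and name.endswith(suffix)
    }
```

Calling select(parse_envoy_stats(SAMPLE), "cluster.backend.") narrows the output to one upstream cluster, which is usually the first step before charting or alerting on a handful of counters.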
If you look at the list of metrics in the links above, you will see that they are quite extensive. This is both good and bad. Having access to a large number of metrics suggests that you will have the data to find the root cause when problems arise, but this is a false sense of security, as I have experienced personally while sifting through charts for hours on end during an incident. It’s not fun, and often the answers are not in the charts.
Distributed Tracing: Envoy provides “deep observability of L7 traffic, native support for distributed tracing” through its usage of OpenTracing APIs. So does that mean that Envoy provides everything I need to automatically get distributed traces of all of my requests? No.
It’s great that Envoy provides OpenTracing compatibility out of the box, but there is a lot of work left for the end user to create and manage the environment that will collect, store, and analyze the trace data. Also, the default tracing is not automatically connected to all of the downstream service calls or to any of the metrics and KPIs we discussed earlier. Without full correlation, we are left with disconnected pieces of a jigsaw puzzle that we must connect manually during an outage. This wastes valuable time and may even prevent us from uncovering the root cause.
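One reason the default traces stay disconnected is that Envoy relies on each application to copy trace context headers from the inbound request onto its outbound calls. The Python sketch below illustrates that step under stated assumptions: the header list follows the names documented by Envoy for request IDs, OpenTracing span context, and B3 propagation, and the function name propagate_trace_headers is hypothetical. A specific tracer may use different headers.

```python
# Trace context headers named in the Envoy documentation. A vendor
# tracer may rely on additional or different headers (assumption).
TRACE_HEADERS = (
    "x-request-id",
    "x-ot-span-context",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def propagate_trace_headers(incoming):
    """Return the trace headers to copy onto outbound requests.

    Header names are matched case-insensitively and returned in
    lowercase; headers unrelated to tracing are dropped.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

A service would merge the returned dictionary into the headers of every request it makes while handling the inbound call, so that spans recorded downstream share the same trace identifiers.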
How to Monitor Envoy Based Microservices with Instana
Instana automatically collects the Envoy OpenTracing data and correlates it to all downstream trace data generated by Instana Agents or other OpenTracing services.
We have a ready-made demo of Envoy with Docker Compose for you to try. The only requirements are a Docker Compose setup and an Instana tenant. (If you do not have an Instana tenant, you can get a trial one in a few minutes, no strings attached.)
The setup, as you’ll see in the demo referenced above, is simple:
- Add the snippet below to the YAML file you pass to Envoy on startup as bootstrap configuration:
    tracing:
      http:
        name: envoy.dynamic.ot
        config:
          library: /usr/local/lib/libinstana_sensor.so
          config:
            service: envoy-gateway
Only one Instana agent is required per Docker host. Every container on that host sends its monitoring data through that single agent. This keeps overhead extremely low and greatly simplifies the overall deployment of Instana.
The benefit of using Instana is that each distributed trace is stitched together by Instana for a full, end-to-end view of every request passing through Envoy (Figure 1). With Instana there is no need to manually determine which Envoy traces belong with which service traces. The Instana backend server takes care of that automatically, and the result is available for every request within a few seconds of request completion. There is no sampling of any kind, so you will always have complete data to identify the root cause of any problematic request.
Figure 1: Instana screenshot showing Envoy and microservices in a single distributed trace
Potentially more important than each individual request are the aggregate dashboards that are provided for each Envoy Proxy service (Figure 2). These dashboards analyze all of the trace data over the selected time period and show trends in health and performance. It’s easy to identify problems with the Envoy Proxy service at a glance using the various charts contained within each dashboard.
Figure 2: Envoy service dashboard from Instana showing aggregate trace data from Envoy
Just as with every other technology monitored by Instana, Envoy monitoring includes automatic and continuous discovery, dependency mapping, metric monitoring, distributed tracing, anomaly detection, and filter-based analytics across the complete trace data set. You will know everything that Envoy Proxy is doing and its impact on user requests at all times. If you want deep visibility into your Envoy Proxies combined with distributed tracing of your microservices, then you should sign up for a free trial of Instana and see for yourself.