What is NGINX Proxy?
“NGINX [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server, originally written by Igor Sysoev.” – http://nginx.org/en/
NGINX was originally built as an HTTP server designed to handle a massive number of requests, and it has been used by companies like Yandex, Mail.ru, Rambler, Dropbox, Netflix, WordPress.com, and FastMail.FM. According to a study conducted by Netcraft, NGINX served or proxied 26.13% of the busiest websites in February 2019.
The proxy capabilities of NGINX are designed to solve major problems that arise when you transition to microservices. Managing the network that interconnects all of your microservices, and properly routing every request through this highly dynamic architecture, is difficult at best. I’m not talking about the layer that connects the individual hosts; I’m referring to the virtual network layer that sits on top of the physical network and needs to adjust automatically to microservices that can change at any given moment. This virtual layer needs to handle primary request routing, automatic retries, circuit breaking, global rate limiting, request shadowing, zone-local load balancing, and more.
There are two distinct versions of NGINX: the open source version and the commercial version, named NGINX Plus. NGINX Plus has the following features designed for microservices:
- Service discovery – Automatically detect new services with DNS service discovery integration
- End-to-end encryption – Mitigate security risks with secure communications between microservices and SSL offloading
- Layer 7 routing – Easily direct clients to the correct service using URI-based content routing
- Load balancing – Safely scale out your app using advanced load balancing with active health checks
- Caching – Improve performance while reducing load on your app with flexible content caching
- Flexible deployments – Run in containers, on your existing hardware, or in the cloud with flexible software
How to Monitor NGINX Proxy
Monitoring NGINX Proxy should be approached in two distinctly different ways. First, metrics and KPIs are important indicators of the overall health and performance of NGINX, but on their own they are not enough to completely understand what impact NGINX has on the requests flowing through it. Second, and arguably more important, distributed tracing of each request through NGINX shows exactly what impact NGINX had on that request.
NGINX Metrics: The NGINX website has a very lengthy section on how to collect and view NGINX metrics, and the list of available metrics is quite extensive. This is both good and bad. Having access to a large number of metrics gives a sense that you will have the data to figure out the root cause when problems arise, but this is a false sense of security, as I have experienced personally while sifting through charts for hours on end during an incident. It’s not fun, and often the answers are not in the charts.
NGINX also offers a paid (for anything over 5 servers) metric monitoring solution called NGINX Amplify. This solution is a valuable step up from basic metric monitoring, but it’s missing a critical capability: tracing.
Distributed Tracing: NGINX offers a module that creates OpenTracing-compliant data. This is useful, but it requires a monitoring backend that supports OpenTracing, like Jaeger or Instana. So does that mean that NGINX provides everything I need to automatically get distributed traces of all of my requests? No.
It’s great that NGINX provides OpenTracing compatibility out of the box, but there is a lot of work left for the end user to create an environment that will collect, store, and analyze the trace data. Also, the default tracing is not automatically connected to all of the downstream service calls or to any of the metrics and KPIs we discussed earlier. Without full correlation, we are left with disconnected pieces of a jigsaw puzzle that we must try to connect manually during an outage. This wastes valuable time and may even prevent us from uncovering the root cause.
How to Monitor NGINX Based Microservices with Instana
Instana automatically collects NGINX metrics and the NGINX OpenTracing data, and correlates both with all downstream trace data generated by Instana Agents or other OpenTracing services. With Instana, collecting metrics is really easy. All you need to do is install the Instana Agent on the host or container and enable the proper metrics module, depending on which version of NGINX you use.
To enable NGINX metric monitoring, you need to enable the ngx_http_stub_status_module.
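To give a feel for what this module exposes: once a location in the NGINX configuration contains the stub_status directive, requesting that location returns a small plain-text page of counters. The Python sketch below parses text in that documented format. The sample values and the function name parse_stub_status are illustrative assumptions, and this is not how the Instana Agent itself is implemented.

```python
import re

# Illustrative sample in the plain-text format served by stub_status.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse a stub_status page into a dict of integer counters."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    # The three cumulative counters sit alone on their own line.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")

    accepts, handled, requests = (int(g) for g in totals.groups())
    reading, writing, waiting = (int(g) for g in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Accepted but not handled connections were dropped.
        "dropped": accepts - handled,
    }


stats = parse_stub_status(SAMPLE)
```

The accepts, handled, and requests values are cumulative counters, so a monitoring tool would normally compare two successive readings to derive rates.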
To enable NGINX Plus metric monitoring, you need to enable the ngx_http_api_module.
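The API module serves its data as JSON rather than plain text. As a rough sketch, the Python below turns two snapshots of the connections endpoint into per-second rates. The field names (accepted, dropped, active, idle) follow my reading of the NGINX Plus API reference and should be checked against your version; the snapshot values and the function name connection_rates are made up for illustration.

```python
import json


def connection_rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Derive per-second rates from two cumulative connection snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        # accepted and dropped are cumulative, so take the difference.
        "accepted_per_s": (curr["accepted"] - prev["accepted"]) / interval_s,
        "dropped_per_s": (curr["dropped"] - prev["dropped"]) / interval_s,
        # active and idle are point-in-time gauges, so report the latest value.
        "active": curr["active"],
        "idle": curr["idle"],
    }


# Hypothetical snapshots taken 60 seconds apart.
prev = json.loads('{"accepted": 1000, "dropped": 0, "active": 5, "idle": 20}')
curr = json.loads('{"accepted": 1600, "dropped": 3, "active": 7, "idle": 18}')
rates = connection_rates(prev, curr, interval_s=60.0)
```

A non-zero dropped rate is usually the first thing worth alerting on, since it means NGINX accepted connections it could not serve.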
That’s everything required for Instana to begin NGINX metrics collection and analysis. It takes maybe 5 minutes total time.
For NGINX tracing, we have a ready-made demo with Docker Compose for you. The only requirements are a Docker Compose setup and an Instana tenant. (If you do not have an Instana tenant, you can get a trial one in a few minutes, no strings attached.)
Only one Instana agent is required per Docker host. Every container on that host sends its monitoring data through that single agent. This keeps overhead extremely low and greatly simplifies the overall deployment of Instana. The benefit of using Instana is that each distributed trace is stitched together by Instana for a full, end-to-end view of every request passing through NGINX (Figure 1). With Instana there is no need to manually determine which NGINX traces belong with which service traces. That is taken care of automatically by the Instana backend server, and the result is available for every request within a few seconds of request completion. There is no sampling of any kind, ever, so you will always have complete data to identify the root cause of any problematic request.
Figure 1: Instana screenshot showing NGINX and microservices in a single distributed trace
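To make "stitching" concrete, here is a toy Python sketch of the general idea: spans reported by different services share a trace ID and point at their parent span, so they can be grouped and ordered into a single call path. The Span fields, service names, and helper functions are hypothetical, chosen for illustration only; this is not Instana's implementation or data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per unit of work
    parent_id: Optional[str]  # None for the entry span
    service: str


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Bucket spans by trace ID, regardless of arrival order."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def call_path(spans: List[Span]) -> List[str]:
    """Return service names from the root span downwards, depth-first."""
    known_ids = {s.span_id for s in spans}
    children: Dict[str, List[Span]] = defaultdict(list)
    roots: List[Span] = []
    for s in spans:
        if s.parent_id is None or s.parent_id not in known_ids:
            roots.append(s)
        else:
            children[s.parent_id].append(s)

    order: List[str] = []
    stack = list(reversed(roots))
    while stack:
        s = stack.pop()
        order.append(s.service)
        stack.extend(reversed(children[s.span_id]))
    return order


# Spans arrive out of order and from two different requests.
spans = [
    Span("t1", "b", "a", "cart-service"),
    Span("t1", "a", None, "nginx"),
    Span("t2", "x", None, "nginx"),
    Span("t1", "c", "b", "database"),
]
traces = group_by_trace(spans)
path = call_path(traces["t1"])
```

Doing this by hand across separate tools during an incident is exactly the manual correlation work that a backend doing it automatically removes.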
In addition to distributed tracing, all of the NGINX metrics are automatically analyzed and correlated to the NGINX service components (Figure 2). If any requests slow down passing through the NGINX proxy, you will definitely know it and you will also have all supporting data (like metric KPIs) instantly available to help you understand the root cause.
Figure 2: NGINX metrics correlated to the process, container, and host
Potentially more important than each individual request are the aggregate dashboards that are provided for each NGINX Proxy service (Figure 3). These dashboards analyze all of the trace data over the selected time period and show trends in health and performance. It’s easy to identify problems with the NGINX Proxy service at a glance using the various charts contained within each dashboard.
Figure 3: NGINX service dashboard from Instana showing aggregate trace data from NGINX
Just as with every other technology monitored by Instana, NGINX monitoring includes automatic and continuous discovery, dependency mapping, metric monitoring, distributed tracing, anomaly detection, and filter-based analytics across the complete trace data set. You will know what NGINX Proxy is doing, and its impact on user requests, at all times. If you want deep visibility into your NGINX proxies combined with distributed tracing of your microservices, sign up for a free trial of Instana and see for yourself.