How Developers Can Tame Microservices Sprawl


Introduction

Microservice architectures have the simultaneous challenge of information hiding and information discovery. Irrelevant services are hidden to make the system understandable by focusing on what is important and ignoring everything else. However, the system is constantly changing so you need to see inter-service dependencies to understand what is relevant. Isn’t this an information hiding puzzle? Application Perspectives (AP) is a capability that solves this puzzle because it enables you to dynamically scope the visibility to “just the right” size to meet your needs, such as:

  • by zone or cluster;
  • by technology;
  • by business transaction or user journey;
  • by deployment engine;
  • by version or release;
  • or any combination.

This “divide and conquer” algorithm shrinks microservice sprawl into a suitably scoped perspective using automatic discovery, automatic instrumentation, automatic tracing, and automatic dashboard creation. The first article in this series introduced Application Perspectives. This second article builds on that to explain how an AP is defined.

The Call as a Building Block

As discussed in the prior article, there are four components to an Application Perspective: the Application Perspective (AP) itself, services, endpoints, and calls. The most important of these is the call because it is the building block to form the other components: one or more call(s) form an endpoint; endpoint(s) form a service; and services form an AP. As expected, a call represents the message sent from a source to a destination. That call object is annotated with meta-data (i.e., tags) that is used while matching a query; if the call meta-data matches the query then the call and its endpoint(s) and service belong in the AP.
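The roll-up described above (calls form endpoints, endpoints form services) can be pictured with a minimal sketch. The `Call` class, its field names, and the endpoint names are hypothetical and chosen only for illustration; this is not Instana's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One message from a source to a destination, annotated with tags."""
    service: str      # destination service name (hypothetical field)
    endpoint: str     # destination endpoint name (hypothetical field)
    tags: tuple = ()  # ((tag_name, tag_value), ...)


def group_calls(calls):
    """Roll calls up into {service: {endpoint: [calls]}}."""
    services = defaultdict(lambda: defaultdict(list))
    for call in calls:
        services[call.service][call.endpoint].append(call)
    return services


# Hypothetical calls: two hit the same endpoint, so they share it.
calls = [
    Call("shipping", "GET /api/shipping/cities"),
    Call("shipping", "GET /api/shipping/cities"),
    Call("shipping", "POST /api/shipping/confirm"),
    Call("cart", "GET /api/cart"),
]
services = group_calls(calls)
```

Here the `shipping` service ends up with two endpoints, one of which is backed by two calls, which mirrors the "one or more calls form an endpoint" relationship.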

Application Perspectives Developer Overview

How are calls found? Instana’s AutoTrace instrumentation records distributed traces automatically, including the sent and received request information used to construct the call. The trace spans are further processed into one or more calls. Additional infrastructure meta-data is added to the call from both the source process and the destination process of a request.

How are endpoints formed? An endpoint is an operation exposed by a service (e.g., GET http://stansrobots.com/api/payment/{user-id}). It is formed by applying pattern matching techniques to one, or more, calls. An endpoint is automatically assigned a commonsense name according to predetermined rules. Of course, custom names for endpoints can be specified.

How are services formed? An Instana service is what you would expect it to be when talking about a microservice architecture: it is code that is deployed with endpoints. It is rare that you need to remember the details that a service is a logical grouping of related endpoints and calls. It is constructed using an aggregation technique called service mapping. Services are intuitively named using a set of well documented service mapping rules. Of course, a user can create a custom service mapping rule.

Defining an AP by Choosing Tags

An AP is defined using a declarative query that matches call meta-data called tags. An AP is automatically kept up to date because the query is continuously evaluated for each incoming trace. So, if a new container receives a request that is captured in a trace, that request is converted to a call, the call’s tags are matched against the AP query, and both the endpoint and its service are automatically included in that AP when there is a match. For example, the tag/value pair technology='springbootApplicationContainer' defines an AP that includes all Spring Boot applications, where each application is a service. If a new Spring Boot application ‘Bar’ is deployed, it is modeled as a new service ‘Bar’, which is automatically added to the AP when Bar receives its first request. So selecting the right tags is key to building a meaningful AP query.
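The continuous evaluation described above can be sketched as follows. This is an illustrative model only: the `ApplicationPerspective` class, the dictionary shape of a call, the service names ‘Foo’ and ‘Legacy’, and the `tomcat` tag value are assumptions made for the example, not Instana's API.

```python
def matches(query, tags):
    """True when every tag/value pair in the query is present on the call."""
    return all(tags.get(name) == value for name, value in query.items())


class ApplicationPerspective:
    """Hypothetical model of an AP: a query plus the services it has matched."""

    def __init__(self, query):
        self.query = query
        self.services = {}  # service name -> set of endpoint names

    def observe(self, call):
        """Evaluate the query against one incoming call."""
        if matches(self.query, call["tags"]):
            self.services.setdefault(call["service"], set()).add(call["endpoint"])


ap = ApplicationPerspective({"technology": "springbootApplicationContainer"})
ap.observe({"service": "Foo", "endpoint": "GET /foo",
            "tags": {"technology": "springbootApplicationContainer"}})
ap.observe({"service": "Legacy", "endpoint": "GET /old",
            "tags": {"technology": "tomcat"}})
# A newly deployed Spring Boot service joins the AP on its first request.
ap.observe({"service": "Bar", "endpoint": "GET /bar",
            "tags": {"technology": "springbootApplicationContainer"}})
```

Nothing had to be reconfigured for ‘Bar’ to appear: the query stayed the same and the membership followed the traffic.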

Tag selection is aided by intellisense capabilities for tag names and tag values. Tags follow a naming convention like the dot convention of a Java package or a domain name. For example, the first part of the name is the most general (e.g., ‘call’), then another part to narrow the scope some more (e.g., call.http), followed by the actual item (e.g., call.http.header). Finding a tag is relatively easy because you can provide any part of the tag name and a list of candidate tags is recommended for you to pick from (e.g., enter ‘http’ or ‘header’). After selecting the tag name, a list of candidate values is shown to select from. It is quite popular to add user defined tag/value pairs using the Instana SDK and this same intellisense works for them.
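The "enter any part of the tag name" behavior can be approximated with a simple substring search over dotted tag names. The tag list below only contains tags mentioned in this article, and `suggest_tags` is a hypothetical helper rather than how the product implements its suggestions.

```python
# Tags mentioned in this article; a real installation offers many more.
KNOWN_TAGS = [
    "call.http.header",
    "call.http.params",
    "call.type",
    "call.tag",
    "service.name",
    "agent.zone",
]


def suggest_tags(fragment, tags=KNOWN_TAGS):
    """Return tags whose dotted name contains the fragment, case-insensitively."""
    needle = fragment.lower()
    return sorted(tag for tag in tags if needle in tag.lower())


http_tags = suggest_tags("http")
header_tags = suggest_tags("header")
```

Typing `http` narrows the candidates to the two `call.http.*` tags, while `header` narrows them to `call.http.header` alone.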

When choosing a tag, you may need to specify which entity it applies to. There are three possibilities. The first is that the tag is independent of source and destination because it is an attribute of the call itself (e.g., call.http.params); this is shown with the combined source/destination icon. Alternatively, a tag can apply to either the source or the destination, each shown with its own icon, with the default being the destination. An example where the distinction matters is service.name, because you must identify whether it names the source service or the destination service.

Using the Same AP for Monitoring or Troubleshooting

Each AP has two principal views that address two different use cases. The first is an opaque, monitoring view for a group of services or even just one service. This opaque view is helpful for monitoring whether clients are experiencing issues. The second view is an end-to-end flow of the calls, which is discovered at run time.

These two views are controlled using two options: the Downstream Services option and the Dashboard View option. Downstream Services determines what additional data is collected and is usually specified when defining the AP. Dashboard View determines the default dashboard view. It can be changed at any time using a menu, so it is easy to switch from the monitoring view to the end-to-end view and back.

Monitoring for Client Problems
If you are interested in monitoring for client problems, then set Downstream Services to “No downstream services” and set Dashboard View to “Inbound Calls”. Then, the following AP query

AP Declarative Query 1

results in the following (unsurprising) Dependency Graph with the specified, single service called shipping.

Application Perspective Shipping Service

The automatically constructed dashboard Summary tab shows the metrics for that single service. These metrics correspond to a client’s experience for its shipping requests.

Application Perspective Edge Service Dashboard

Troubleshooting a Client Problem using the End-to-End Flow
If you are interested in diagnosing a client problem, then set Downstream Services to “All downstream services”. Downstream Services is helpful because it automatically includes all the services you are interested in. Technically, it transitively finds and includes all services that are downstream of the services that originally matched the AP query. This is best explained by a medical analogy. An angiogram is a medical procedure where a contrast dye is injected into the bloodstream and flows through the arteries while an x-ray records the flow to check for blockages. By analogy, Downstream Services detects the entire end-to-end request flow(s) by injecting a software dye (the distributed trace context) that marks all the services involved, whether they are specified by the query or discovered by the injected software dye.
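The transitive expansion can be thought of as a breadth-first walk over the call graph observed in the traces. The sketch below assumes the traces have already been reduced to (source, destination) pairs; the function name and all service names other than shipping and cities are hypothetical.

```python
from collections import deque


def all_downstream(matched, calls):
    """Return the matched services plus everything reachable downstream of them."""
    graph = {}
    for source, destination in calls:
        graph.setdefault(source, set()).add(destination)

    included = set(matched)
    queue = deque(matched)
    while queue:
        service = queue.popleft()
        for downstream in graph.get(service, ()):
            if downstream not in included:
                included.add(downstream)
                queue.append(downstream)
    return included


# Hypothetical call edges (source -> destination) extracted from traces.
calls = [
    ("web", "shipping"),
    ("shipping", "cities"),
    ("shipping", "cart"),
    ("cart", "redis"),
    ("web", "ratings"),
]
scope = all_downstream({"shipping"}, calls)
```

Only services downstream of the matched set are added: `web` calls shipping but sits upstream, and `ratings` is never reached from shipping, so both stay out of scope.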

The Dashboard View is set to “All Calls”. The “All Calls” option expands the dashboard metrics to include all the calls, endpoints, and services found in the distributed traces. With these settings, the AP is no longer just the lone shipping service. It is now the set of services that shipping calls, the services that those services depend upon, and so on. This is shown below, where three additional services complete the distributed trace.

Application Perspective Dependent Service Map

The dashboard Summary metrics include these additional services and aggregate them together. This is seen below where the Total Calls count increased from 44K calls in the “Inbound” view to 172K calls for the “All Calls” view because the downstream calls are included.


Application Perspective End To End Service Dashboard

The troubleshooting process proceeds using the “divide and conquer” algorithm — drilling down from the summary tab to more detailed tabs, like the services table in the Services tab; the Dependency Graph that highlights services with issues; the error or log tab; etc. This often leads you to a distributed trace timeline (shown below) to identify the root cause in the code so that you can quickly fix the problem.

Distributed Trace Call Span Details

A View for Insightful Development

There is a third view available if you want to understand what is happening with a few services and their associated datastores. This is helpful when you need something in between the opaque and end-to-end views, such as when you are developing code for services with their own databases. For this third view, set the Dashboard View to “All Calls”. Also set Downstream Services to “Immediate downstream database and messaging services”, which needs a little explanation: the AP query determines the core set of services that match it, and this core set is then expanded to include the database and messaging services that the core set directly interacts with. An example is shown below, where the ‘cities’ (database) service is automatically added to the AP because it is directly used by the ‘shipping’ service.
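In contrast to a transitive walk, this option expands the core set by a single hop and only toward datastores and messaging systems. A minimal sketch, assuming each service has a known type and that calls are available as (source, destination) pairs; the helper name, the type labels, and the ‘cart’ and ‘redis’ services are hypothetical.

```python
def immediate_db_and_messaging(core, calls, service_types):
    """Add database/messaging services called directly by the core set."""
    expanded = set(core)
    for source, destination in calls:
        is_datastore = service_types.get(destination) in {"database", "messaging"}
        if source in core and is_datastore:
            expanded.add(destination)
    return expanded


# Hypothetical call edges and service types.
calls = [("shipping", "cities"), ("shipping", "cart"), ("cart", "redis")]
service_types = {
    "shipping": "http",
    "cart": "http",
    "cities": "database",
    "redis": "database",
}
scope = immediate_db_and_messaging({"shipping"}, calls, service_types)
```

The `cart` service is skipped because it is not a datastore, and `redis` is skipped because shipping does not call it directly.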

Dependent Services View

Now you can build your own personalized development dashboard to highlight errors or find key distributed traces. The first step is to install the Instana agent in your development environment with its AGENT_ZONE set. After you have sent some requests, confirm that monitoring is configured properly by checking that your services appear in Instana’s Services List. Then create your personal development Application Perspective by: (i) forming an AP query using the service.name and agent.zone tags, and (ii) setting Downstream Services to “Immediate downstream database and messaging services”. More details about how to do this follow further below. You now have a dashboard, analytics running in the background, and distributed traces to review.

Example APs for Developers

Hiding infrastructure details and treating services as logical entities improves the signal-to-noise ratio. This section provides several example APs to do just that.

Specifying a Group of Services
A team of people may be responsible for several related services or web sites. In this case, the services are well known, fairly static, and the primary focus. It is quite easy to construct an AP query for those services.

The AP query below does exactly that and highlights some of the nuances of AP construction. First, when the Boolean expression is evaluated, AND expressions take priority over OR expressions. This can be viewed as implicitly adding brackets around the AND expressions so they are evaluated first, resulting in the effective query: (service name == cart AND type == HTTP) OR (service name == shipping) OR (service name == ratings AND type == HTTP).

Secondly, it may be required to specify both the service name and service type together. A service is a logical entity which has its name automatically created so there can be rare instances of a service name clash — additional information is needed to remove the clash. In the example below, there is an HTTP cart service which needs to be differentiated from a database service that has the same name. That is why the call.type is set to ‘HTTP’.

AP Declarative Query 3
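The precedence rule for this query can be sketched as a small evaluator that splits a flat list of terms into AND-groups separated by OR. This illustrates the rule only and is not Instana's query engine; the `evaluate` function and the `DATABASE` call type value are assumptions for the example.

```python
def evaluate(terms, operators, call):
    """Evaluate term0 op0 term1 op1 ... with AND binding tighter than OR."""
    # Split the flat expression into AND-groups separated by OR.
    groups = [[terms[0]]]
    for operator, term in zip(operators, terms[1:]):
        if operator == "AND":
            groups[-1].append(term)
        else:  # "OR" starts a new group
            groups.append([term])
    # The query matches if any group has all of its terms satisfied.
    return any(
        all(call.get(tag) == value for tag, value in group)
        for group in groups
    )


terms = [
    ("service.name", "cart"), ("call.type", "HTTP"),
    ("service.name", "shipping"),
    ("service.name", "ratings"), ("call.type", "HTTP"),
]
operators = ["AND", "OR", "OR", "AND"]

http_cart = evaluate(terms, operators, {"service.name": "cart", "call.type": "HTTP"})
db_cart = evaluate(terms, operators, {"service.name": "cart", "call.type": "DATABASE"})
shipping = evaluate(terms, operators, {"service.name": "shipping", "call.type": "DATABASE"})
```

The HTTP cart service matches, the database service that shares the name ‘cart’ does not, and shipping matches regardless of call type because its group has no type condition.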

Kubernetes Application by Version or Release
A Kubernetes application can have an AP created for it which is scoped by version. The example AP below is for the Kubernetes application named “MyApp” for version 1.0.1. The standard Kubernetes labels are used.

AP Declarative Query 2
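As a rough illustration of scoping by release, the sketch below filters pods on the Kubernetes recommended labels `app.kubernetes.io/name` and `app.kubernetes.io/version`. Treating these two as "the standard labels" is an assumption, since the article does not spell them out, and the pod names and the `in_release` helper are hypothetical. The corresponding Instana tag names are not shown here.

```python
NAME_LABEL = "app.kubernetes.io/name"
VERSION_LABEL = "app.kubernetes.io/version"


def in_release(labels, name, version):
    """True when a pod's labels identify the given application and version."""
    return labels.get(NAME_LABEL) == name and labels.get(VERSION_LABEL) == version


# Hypothetical pods and their labels.
pods = {
    "myapp-a": {NAME_LABEL: "MyApp", VERSION_LABEL: "1.0.1"},
    "myapp-b": {NAME_LABEL: "MyApp", VERSION_LABEL: "1.0.0"},
    "other-a": {NAME_LABEL: "OtherApp", VERSION_LABEL: "1.0.1"},
}
selected = sorted(pod for pod, labels in pods.items()
                  if in_release(labels, "MyApp", "1.0.1"))
```

Both conditions must hold, so the older MyApp pod and the unrelated application at the same version number are excluded.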

Monitoring a Technology by an SME
An AP can be constructed for a Community of Practice. This could be technology based, such as all database administrators monitoring all databases from one dashboard. The query below creates an AP to monitor database calls for MongoDB. Similar APs can be created for other technology types, such as Kubernetes, RabbitMQ, Elasticsearch, AWS Lambda, Jenkins, MySQL, etc.

AP Declarative Query 4

Picking an Environment
When the same service is running in different environments, such as PRODUCTION or STAGING, it may be desirable to create different APs for them. This can be done in several ways but the most common approach is to use the Instana data collection agent’s information. This is because different environments typically run on separate infrastructure which is often captured by the Instana agent. In this case, the environment is specified by the agent’s INSTANA_ZONE environment variable or its configuration file using the com.instana.plugin.generic.hardware label. This data is available as the agent.zone tag. So, an example AP query using the agent zone to scope the perspective to the production environment is shown below.

AP Declarative Query 5

Adding Your Own Custom Tags
As previously mentioned, custom tags can also be added by the developer using the Instana SDK which is available for many languages. Two short examples for PHP and Python are:

PHP
// Annotate the entry span of the current trace with two custom tags.
$entrySpan = \Instana\Tracer::getEntrySpan();
$entrySpan->annotate('account', 'Universal Sprockets');
$entrySpan->annotate('customer', '12345678');

PYTHON
import opentracing
opentracing.global_tracer().active_span.set_tag('account', 'Universal Sprockets')
opentracing.global_tracer().active_span.set_tag('customer', '12345678')

Both implementations create two new custom tags: one for the customer account and one for the customer ID. It is assumed that the account and ID information is available in an internal database and added via the SDK to the distributed trace. The AP query uses the call.tag tag: you provide the key (e.g., account) and then select a value to complete the query.

AP Configuration Example

In this example, the call.tag.account key, along with the value “Universal Sprockets”, defines an AP which auto-generates a dashboard for the Universal Sprockets customer.

AP Declarative Query 6

It is also possible to create an AP for an individual user in this fashion. This can be used to resolve intermittent problems for a specific customer which, in this example, has the ID ‘12345678’. In this situation, a temporary AP is defined for that specific customer, which captures all the traces related to that customer so the intermittent issue can be investigated. This type of information is useful to the support and development teams.
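The effect of the two custom-tag queries above can be sketched as a filter over calls. The dictionary layout, the trace IDs, the `custom_tag` helper, and the ‘Acme’ account are hypothetical; the sketch assumes only that user-defined tags are addressed as call.tag followed by the key, as described in this section.

```python
def custom_tag(call, key):
    """Look up a user-defined tag, addressed in queries as call.tag.<key>."""
    return call.get("call.tag", {}).get(key)


# Hypothetical calls; the last one carries no custom tags at all.
calls = [
    {"trace_id": "t1", "call.tag": {"account": "Universal Sprockets", "customer": "12345678"}},
    {"trace_id": "t2", "call.tag": {"account": "Universal Sprockets", "customer": "87654321"}},
    {"trace_id": "t3", "call.tag": {"account": "Acme", "customer": "11111111"}},
    {"trace_id": "t4"},
]

# AP for the whole account versus a temporary AP for one customer.
account_traces = [c["trace_id"] for c in calls
                  if custom_tag(c, "account") == "Universal Sprockets"]
customer_traces = [c["trace_id"] for c in calls
                   if custom_tag(c, "customer") == "12345678"]
```

The account-level AP collects every trace for the account, while the customer-level AP narrows to the single customer under investigation.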

Summary

The Application Perspective concept is unique to Instana and is a key enabler for cutting through the noise of any application environment. The uses, structure, and definition of an Application Perspective have been explained so you are equipped to construct an AP for your purposes. As shown, several example APs for use by developers can be helpful for dynamically shrinking the scope to focus the team. The next article in the series will discuss how APs can mix service and infrastructure information to define a mixed monitoring scope using logical and physical meta-data. New types of services are presented too. This is useful information for developers, SREs, DevOps, and IT operations.
