Scaling Microservices – Understanding and Implementing Cache

August 20, 2019


Caching has been around for decades, because accessing data quickly and efficiently is critical when building application services. It is also a mandatory ingredient of scalable microservice applications. In this post, we'll review three approaches to caching in modern cloud-native applications.

In many use cases, a cache is a relatively small data store built on fast, expensive technology. The primary benefit of a cache is reducing the time it takes to access frequently queried data that lives on slower, cheaper, larger data stores. Modern applications can store cached data in a multitude of ways, but let's briefly cover the two most popular approaches:

  1. Storing small amounts of frequently accessed data in local or shared memory (RAM)
  2. Storing larger amounts of data on fast or local disks (SSD)

Caching is considered an optimization along the z-axis of the AKF scaling cube, and while not exclusive to that axis, it is one of the most popular approaches to scaling along it.

Cache Storage

Let's consider the first method, caching in memory. This approach was once fairly straightforward: systems were large, and most parts of a computer program executed on a single machine, so its many threads and processes could easily share RAM and cache frequently accessed data locally. As companies have transitioned to the cloud, microservices, and serverless, caching has become more of a challenge, because service and function replicas may not be running on the same host, let alone in the same data center.

Systems engineers have adapted to this. Technologies such as Hazelcast offer a shared caching model that transcends local restrictions through a secure networking model. Others, such as Redis, offer a hash-based lookup system that can run in RAM as well as on fast disks (SSDs) for tiered caching.
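Before reaching for a shared cache, it's worth seeing how simple the in-process version is. Here is a minimal sketch using the standard library's `functools.lru_cache`; the `lookup_user` function is a hypothetical stand-in for a slow database query:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # keep up to 1024 results in process-local RAM
def lookup_user(user_id: int) -> str:
    # Stand-in for a slow database query; the data is illustrative.
    return f"user-{user_id}"

lookup_user(42)                  # first call: computed and stored (a miss)
lookup_user(42)                  # second call: served from RAM (a hit)
print(lookup_user.cache_info())  # hits=1, misses=1
```

Note that this cache lives inside a single process's RAM; sharing it across service replicas on different hosts is exactly the problem that Hazelcast and Redis address.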

The second storage medium to consider when caching is local or shared SSD systems, which are typically faster than older magnetic or tape media. These systems are usually deployed when the content is much larger than a RAM-based system can economically store. Typically, large images or video media are cached using these systems.

Cache Warming

The entire premise of caching is that disk look-ups are slow, especially for large databases holding orders of magnitude more data than can economically be stored in fast memory (RAM). For example, a stock trading company may keep the most recent transactions in RAM, a process called "cache warming". The engineers at this company know that their customers will be accessing this data frequently, so they push the latest transactions into the cache system as they occur. This is more proactive than waiting for a user to request data before storing it in the cache, which is the most popular method of caching; we'll discuss that next.
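Write-time warming can be sketched in a few lines. This is a minimal illustration, with a plain dict standing in for a RAM-backed cache and a hypothetical `save_to_database` for the slow, durable store:

```python
cache: dict = {}  # stand-in for a RAM-backed cache (e.g. Redis)

def save_to_database(txn_id: str, txn: dict) -> None:
    pass  # hypothetical slow, durable write

def record_transaction(txn_id: str, txn: dict) -> None:
    save_to_database(txn_id, txn)  # write to the durable store
    cache[txn_id] = txn            # warm the cache as the write happens

record_transaction("t-1001", {"symbol": "ACME", "qty": 100})
assert "t-1001" in cache  # the next read is already a cache hit
```

The key point is that the cache is populated at write time, so the very first read is already fast.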

Cache Hit or Miss?

Most cache implementations are a variation of this approach: data is accessed through normal methods, whether that is a database, a storage bucket, or another system such as an API. The caching system sits as an intermediary, intercepting responses and storing them in memory, where they can be served again with much lower latency than from their slower counterparts. When a request comes in, the caching system checks whether it already holds the appropriate response. If it does not, that is called a cache miss; the request is passed along to the slower system to be fulfilled, and the response is stored in memory. When that stored data is accessed again, it is called a cache hit.
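The hit-or-miss flow described above is often called the cache-aside pattern. A minimal sketch, with a plain dict standing in for the cache and a hypothetical `fetch_from_origin` for the slower system:

```python
cache: dict = {}

def fetch_from_origin(key: str) -> str:
    # Stand-in for the slower system (database, storage bucket, or API).
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:                    # cache hit: serve from memory
        return cache[key]
    value = fetch_from_origin(key)      # cache miss: fall through to origin
    cache[key] = value                  # store for future requests
    return value

get("profile:42")  # miss: fetched from the origin, then cached
get("profile:42")  # hit: served straight from the cache
```

Real implementations add expiry (TTLs) and eviction on top of this core loop, but the hit/miss logic stays the same.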

When deciding what to cache, and when, it's important to take into consideration the latency of various actions on computer hardware and networks. There are some great resources available on the interwebz to help inform these decisions, but as a general rule of thumb you'll want to cache anything that is accessed frequently, does not change often, and lives on media slower than RAM.

Measuring Cache Effectiveness and Costs

Caching is a good idea when hit rates are high and the cost of slow responses – unhappy customers and lost revenue – outweighs the overhead cost of running the cache. Paying close attention to these costs used to be incredibly important, because RAM was very expensive a decade or two ago, and the industry had complex formulas to determine budgets.

Today, RAM is cheap. Use a cache whenever you've determined that cache look-ups and updates are fast or near-zero cost; the performance benefit is then a function of the cache hit ratio. Let's assume an average lookup time of 10 seconds and a hit ratio of 50%. With instant cache look-ups, the average would fall by 50 percent, to 5 seconds. Even in less-than-ideal scenarios (i.e., the real world) we're going to see a huge performance increase by implementing a cache.

Where art thou Cache?

Before we wrap up this post, we're going to cover cache placement. There are several other important caching topics that could each fill an entire blog post; if persistence, replacement algorithms, invalidation, or sizing are of interest, I highly recommend picking up the book The Practice of Cloud System Administration: DevOps and SRE Practices to learn more.

(Figure: cache locations)

When implementing caching systems, you'll want to consider where to place the caching components. Each approach has its own benefits along with its own downsides. Let's analyze each approach, its use cases, and its pros and cons.

Client-side caching

This approach can be observed in most web browsers and has the benefit of reducing network latency and remote storage I/O.

Pros:

  • Reduces load times for end users, which makes users happy
  • Utilizes local end-user storage for the cache, reducing costs for operators

Cons:

  • Cache size is limited; browsers and clients store finite amounts of data
  • Cache invalidation is hard; browsers and clients may not honor the server's request for invalidation. This can lead to outdated images and scripts being served, which in turn leads to a poor user experience
  • Once the cache is invalidated, surges in request volume may occur, which must be handled by the content provider
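In practice, client-side caching is usually driven by standard HTTP response headers such as Cache-Control and ETag, which tell the browser how long to keep a response and how to revalidate it cheaply. A minimal sketch of a server composing those headers; the helper name and values are illustrative:

```python
def cache_headers(max_age_seconds: int, etag: str) -> dict:
    """Build response headers that control browser-side caching."""
    return {
        # How long (in seconds) the client may reuse the response.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Lets the client revalidate with If-None-Match instead of re-downloading.
        "ETag": etag,
    }

headers = cache_headers(3600, '"v1-abc123"')
print(headers["Cache-Control"])  # public, max-age=3600
```

The max-age trade-off mirrors the cons above: a long lifetime reduces load but makes stale content harder to purge, since the server cannot force every client to drop its copy.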

CDN – “Cache-in-the-middle”

This approach is used by web service providers who have larger payloads or must exert better control and geographic distribution over content. Many of the benefits of local caching apply here without some of the drawbacks.

Pros:

  • CDNs deploy large distributed fleets that are better suited to handle surges in demand and are resilient against denial-of-service attacks
  • Reliable cache invalidation mechanisms
  • Handles large documents, video, and audio files
  • Chances are they're more reliable than you are

Cons:

  • Requires engineering expertise to deploy and manage
  • Slower than a local cache (network I/O)
  • Costs money

Server-side caching

We've covered this extensively, but this method is preferred for keeping transactional lookups fast and secure.

Pros:

  • By far the most reliable and fastest method of caching available
  • Useful for high-volume transactions that must be kept secure
  • Highest degree of control over invalidation

Cons:

  • Caches become ineffective across data centers (use a CDN)
  • Heavy reliance may cause major downstream issues if cache systems suddenly become unavailable
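That last con is worth guarding against in code: if the cache tier goes down, it's usually better to degrade to the origin than to fail the request outright. A minimal sketch, with hypothetical `cache_get` and `fetch_from_origin` stand-ins (here `cache_get` simulates an unreachable cache):

```python
def cache_get(key: str) -> str:
    # Stand-in that simulates an unavailable cache tier.
    raise ConnectionError("cache unreachable")

def fetch_from_origin(key: str) -> str:
    # Stand-in for the slower authoritative store.
    return f"value-for-{key}"

def get_with_fallback(key: str) -> str:
    try:
        return cache_get(key)       # fast path: serve from the cache tier
    except ConnectionError:
        pass                        # cache is down: don't fail the request
    return fetch_from_origin(key)   # slower, but keeps the service available

print(get_with_fallback("k"))  # value-for-k (served despite the cache outage)
```

Production systems typically add timeouts and circuit breakers around the cache call so a slow cache can't stall every request, but the fallback principle is the same.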


As we wrap this post up, it's important to mention that not all cache systems need to be RAM-based. Large files may be cached to disk rather than read from cold storage in another data center. Even our CPUs use high-speed caches called L1 and L2, which reduce the burden on RAM because they are built directly into the CPU die.

It's important to understand that without caching, many of the other scaling approaches will not be nearly as successful for services that look up and read data, because of the overhead generated by I/O operations. Before you even begin to consider scaling out your services, you should make sure that you have a solid caching solution in place.

In the next post, we'll talk about how to monitor your microservice applications and identify performance bottlenecks and opportunities for scaling and caching. The goal of observability in general is to get real feedback on your scaling efforts – because without monitoring and visibility you won't know what to scale, nor will you have a benchmark to determine whether your efforts were successful.


Instana, an IBM company, provides an Enterprise Observability Platform with automated application monitoring capabilities to businesses operating complex, modern, cloud-native applications no matter where they reside – on-premises or in public and private clouds, including mobile devices or IBM Z.

Control hybrid modern applications with Instana’s AI-powered discovery of deep contextual dependencies inside hybrid applications. Instana also gives visibility into development pipelines to help enable closed-loop DevOps automation.

This provides the actionable feedback clients need as they optimize application performance, enable innovation, and mitigate risk, helping Dev+Ops add value and efficiency to software delivery pipelines while meeting their service and business level objectives.
