Scaling Microservices: Advanced Approaches with the AKF Scaling Cube

June 21, 2019

Scaling Up: Advanced Approaches

Once you’ve optimized your services to use the resources available to a single process or node, you must consider how best to approach your scaling efforts. None of the methods is trivial, and we will not delve into the endless discussion of which is best. The approach you ultimately take will depend on your current architecture, constraints, and resource availability.

We will be distilling three different approaches based on the AKF scaling cube: duplicate your entire system (horizontal replication); decompose your system into individual functions, services, or resources (service and functional splits); and split your system into individual pieces (lookup or formulaic splits).

The AKF scaling cube was conceptualized by Abbott, Keeven, and Fisher, and it visualizes these categories as the x-, y-, and z-axes.

x: Horizontal Duplication

We’ll cover the x-axis first, or horizontal duplication, which is the replication we’ve touched on briefly in the previous blog posts. This method of scaling involves creating many replicas of an individual service to provide additional processing power. An example of this method of scaling is creating additional workers to complete a task or job.

This method of scaling is reduced in effectiveness by complex data or transactional systems. It works well as long as every request can be completed independently on any replica. Modern cloud native architecture and twelve-factor app design encourage building services in this manner, not only for the scalability aspects but for the reliability of this approach as well.
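The independence requirement can be illustrated with a minimal Python sketch. It assumes a stateless handler and a simple round-robin dispatcher; the worker names, the `handle` function, and the `RoundRobinPool` class are hypothetical and only serve the illustration.

```python
from itertools import cycle


def handle(request: dict) -> dict:
    """Stateless handler: the result depends only on the request itself."""
    return {"id": request["id"], "total": sum(request["items"])}


class RoundRobinPool:
    """Hands each incoming request to the next replica in turn."""

    def __init__(self, replica_names):
        # cycle() repeats the replica names endlessly, in order.
        self._replicas = cycle(replica_names)

    def dispatch(self, request: dict) -> tuple:
        replica = next(self._replicas)
        # Any replica can serve any request because handle() keeps no state.
        return replica, handle(request)


# Hypothetical replica names, for illustration only.
pool = RoundRobinPool(["worker-1", "worker-2", "worker-3"])
requests = [{"id": i, "items": [i, i + 1]} for i in range(4)]
assignments = [pool.dispatch(r) for r in requests]
```

Because the handler holds no state between calls, adding capacity here is a matter of adding another name to the pool. A handler that relied on local session data or in-process counters would not replicate this cleanly.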

The CAP theorem tells us that a distributed application cannot guarantee consistency, availability, and partition tolerance all at once. In practice, when a network partition occurs, we must give up either consistency or availability. If our applications must remain consistent when writing data, then a write cannot be acknowledged until the replicas involved agree on it, which costs latency and availability. In most cases, you will want to rely upon external systems to handle data persistence because they have built solutions which handle these issues pragmatically.

y: Functional or Service Splits

With this approach, we’re allocating specific resources and capacity to individual functions or “domains” so that each has resources dedicated to its task or service. The initial approach to this type of split in a traditional SOA is to make sure our database, web server, and application servers are provisioned on their own dedicated systems. This method of scaling has its own limitations, which is why we are now in the microservice era. Nevertheless, the same principles apply: a database server is going to consume hardware resources in a much different manner than a web application server.

With service splits, we are taking a more orthogonal approach to our architecture. A straightforward example of this is to separate the transactional portion of a web application from the reporting functionality. This split allows us to keep resources dedicated to serving live users while a separate process or service handles analytics and reporting.

Similar to the benefits gained by splitting our web application server and the database, this type of scaling also gives us the benefit of allocating resources that are more suited for the workload. For instance, a transaction system may require substantially more CPU and network I/O than the reporting sub-system which requires additional disk I/O. By segregating these systems, we can tailor the hardware to the task and ultimately save money by not allocating unnecessary resources.

This method of scaling is what you’ll focus most of your time on unless you start moving into huge web applications that handle upwards of several million transactions per day or heavy data analytics. Here are just a few of the approaches you may encounter when building services using this approach:

  • Splitting by functionality, with each “function” given its own dedicated resources (web server, db server, cache server, etc.)
  • Splitting by service or “domain”, with each service on its own pool of resources
  • Splitting by transaction type
  • Splitting by user (or tenant)
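As a rough sketch of a split by transaction type, the Python snippet below routes each request to a dedicated pool. The pool names and request types are hypothetical and stand in for whatever your own architecture defines.

```python
# Hypothetical mapping of transaction types to dedicated resource pools.
POOLS = {
    "checkout": "transactions-pool",  # live, user-facing traffic
    "report": "reporting-pool",       # analytics and reporting work
}

# Assumed fallback for request types that have no dedicated pool.
DEFAULT_POOL = "web-pool"


def route(request_type: str) -> str:
    """Return the pool that should serve this type of request."""
    return POOLS.get(request_type, DEFAULT_POOL)


live_target = route("checkout")
report_target = route("report")
other_target = route("profile")
```

Keeping the mapping in one place means each pool can be sized and tuned on its own, which is the point of the split.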

z: Lookup-Oriented Splits

A lookup-oriented split divides a system by segmenting its data into chunks, and each segment is then given dedicated resources. Z-axis splits become necessary when the data sets handled by a service grow too large for a single instance. This method of scaling is often referred to as “sharding”.

A common method of performing this type of split is to divide a table by its auto-incrementing id field. This can be done programmatically by assigning each record to a particular database shard. You can read more about this approach in Pinterest’s engineering blog post on how they scale their database. This approach suits very large systems which must support many terabytes of transactional data. There are far more pragmatic approaches for less data-intensive applications.
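To make the idea concrete, here is a minimal Python sketch of a formulaic split on a numeric id. It uses a plain modulo rule, which is an assumption chosen for brevity and not the scheme described in the Pinterest post; the shard names are hypothetical.

```python
# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(record_id: int) -> str:
    """Pick a shard from the record's numeric id using a modulo rule."""
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    # The same id always maps to the same shard while the shard count is fixed.
    return SHARDS[record_id % len(SHARDS)]


first = shard_for(4)
second = shard_for(10)
```

One caveat with a bare modulo rule is that changing the number of shards remaps most ids, so existing records would have to be moved. That cost is part of why this kind of split calls for careful planning up front.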

More practical approaches may include segmenting your data by date, customer, or region. Each approach has its advantages, for example:

  • Date: Each year could have its own database/machine, with non-current years being allocated fewer resources and optimized for read-only access.
  • Customer: Each customer is different, and one may require more resources than another. By allocating a database per customer, we can assign capacity in a more structured manner.
  • Region: Regional databases ensure access remains available in the event of an outage in another data center, and they allow capacity to be allocated based on each region’s usage. For instance, we may reduce capacity during the evening in the US while ramping up capacity in APAC.
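The customer and date splits above can be sketched as simple lookups. In the Python snippet below, the directory contents, database names, and helper functions are all hypothetical and meant only to show the shape of a lookup-oriented split.

```python
from datetime import date

# Hypothetical directory: customers with dedicated databases.
CUSTOMER_DIRECTORY = {
    "acme": "db-large-1",    # heavy customer on larger hardware
    "globex": "db-small-1",  # light customer on smaller hardware
}

# Assumed shared database for customers without a dedicated entry.
DEFAULT_DB = "db-shared"


def database_for_customer(customer: str) -> str:
    """Look up the database assigned to a customer."""
    return CUSTOMER_DIRECTORY.get(customer, DEFAULT_DB)


def database_for_date(day: date, current_year: int) -> str:
    """Send current-year data to a read-write store, older years to read-only."""
    if day.year == current_year:
        return f"db-{day.year}-readwrite"
    return f"db-{day.year}-readonly"


acme_db = database_for_customer("acme")
archive_db = database_for_date(date(2017, 3, 1), current_year=2019)
current_db = database_for_date(date(2019, 6, 21), current_year=2019)
```

A lookup table like this is easy to change per customer, at the price of keeping the directory itself available and up to date.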

Regardless of the method you choose, splitting your data will require significant refactoring of your applications. Since this approach requires significant amounts of time and engineering resources, great care should be taken in deciding which approach you ultimately take. Since scaling on the z-axis is considered the most difficult, it is typically only done when the x- and y-axes have been exhausted.


Scaling applications, whether they be microservices or monoliths, can be achieved through practical approaches to architecture, design, and implementation. Each approach requires significant investment in research and development. There is no magic bullet, but there is a method which can be used to great effect without a tremendous amount of refactoring or rewriting in the event your services weren’t designed with scaling in mind. That approach will be covered in depth in our next article about caching. Stay tuned.
