Taking an Initial Approach to Scaling
Deciding when and how to approach scaling your service can be a daunting task, but it doesn’t have to be. In order to scale effectively, engineers must have a solid understanding of the system’s behavior, data to support their hypotheses, and a method to test the results. We’ll talk more about measuring performance later in this series, but for now it’s important to understand that the most effective scaling strategies involve “Just In Time” optimizations.
It’s critical to have an environment and culture that allow for production-level testing via canary-style deployments, or synthetic testing infrastructure that lets test engineers develop load tests which closely emulate production workloads. If neither is available in your organization, you’re effectively shooting in the dark when it comes to scaling and optimization.
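Even a simple synthetic load test gives you real latency data to reason from. The sketch below drives concurrent requests against a stand-in handler and reports latency percentiles; `handle_request`, the concurrency level, and the simulated 1 ms of work are all hypothetical placeholders for your actual service call.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stand-in for a call to the service under test (hypothetical)."""
    start = time.perf_counter()
    time.sleep(0.001)  # simulate ~1 ms of service work
    return time.perf_counter() - start

def run_load_test(concurrency: int, requests: int) -> dict:
    """Drive `requests` calls with `concurrency` workers and report latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: handle_request(), range(requests)))
    return {
        "p50": statistics.median(latencies),
        "p99": latencies[int(len(latencies) * 0.99) - 1],
    }

if __name__ == "__main__":
    print(run_load_test(concurrency=8, requests=200))
```

A real harness would add ramp-up, sustained duration, and error-rate tracking, but even this closed-loop version surfaces how latency shifts as concurrency grows.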
In the example below we walk through four distinct phases of scaling and the problems associated with each. Later in the series we’ll discuss each in detail, but this is a good overview of the types of problems you’ll encounter when scaling distributed systems.
- In Service B we’ve experienced thread pool exhaustion because the combined traffic from A and C has overwhelmed the service. The natural response is to increase the thread count, but this quickly causes issues.
- Once we’ve added more threads, this creates additional load and connection exhaustion on the database. The initial response might be to utilize connection pooling, but eventually even this will not be enough, and we would then scale the database vertically to accommodate the load.
- Eventually we need to scale Service B further by adding more replicas. This creates even more load on the database, potentially to the point where a single instance can no longer sustain it.
- We may implement sharding, or utilize a distributed database such as Cassandra or MongoDB. Either way, we’ve now implemented a solution which should enable significant horizontal scaling.
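The final phase above, sharding, can be sketched in a few lines: route each key to one of N database shards with a stable hash. The `ShardRouter` class and the `db-shard-*` names are hypothetical; production systems typically use consistent hashing so that adding a shard doesn’t remap most keys.

```python
import hashlib

class ShardRouter:
    """Route each key to one of N shards by stable hashing (a sketch)."""

    def __init__(self, shard_names: list):
        self.shard_names = shard_names

    def shard_for(self, key: str) -> str:
        # SHA-256 is stable across processes, unlike Python's builtin hash(),
        # so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode()).hexdigest()
        index = int(digest, 16) % len(self.shard_names)
        return self.shard_names[index]

router = ShardRouter(["db-shard-0", "db-shard-1", "db-shard-2"])
assert router.shard_for("user:42") == router.shard_for("user:42")
```

Note the trade-off: a plain modulo scheme like this forces a large re-shuffle of keys whenever the shard count changes, which is exactly the re-sharding pain that systems like Cassandra solve with consistent hashing.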
Speculation Bad … Data Good
If we attempt to optimize by speculation, without a clear understanding of the usage patterns causing performance bottlenecks, we run the risk of wasting our time solving problems which will never come to fruition. Distributed systems are simply too large and complex to rely on speculation when solving performance issues at scale.
The most important aspect of distributed systems is understanding that these systems will always have a constraint or bottleneck. That fact is only troubling when you don’t understand what those constraints are. If you are empowered to identify problems, perform analysis, form hypotheses, take measurements, test your theories, and analyze the results, you’re in a great position to solve these complex issues.
Can’t we just make the server bigger?
The most straightforward method of scaling up is adding more CPU and/or memory. However, this approach is bounded by the physical limitations of computer hardware. It’s further limited by software design: CPU scale isn’t a factor of clock speed but rather of compute units (CPU cores). This strategy also becomes very costly and ineffective when auto-scaling based on periodic demand.
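A quick way to see why adding cores alone hits a wall is Amdahl’s law: any serial fraction of your code caps the speedup no matter how much hardware you buy. The sketch below (the 95% parallel fraction is an illustrative assumption, not a measurement) shows that 64 cores yield only about a 15x speedup for such a workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    The serial fraction (1 - p) bounds the achievable speedup
    regardless of how many cores are added.
    """
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# A workload that is 95% parallelizable (assumed for illustration):
print(amdahl_speedup(0.95, 64))   # ≈ 15.4x, not 64x
print(amdahl_speedup(0.95, 1024)) # still capped well below 20x
```

This is why vertical scaling tends to deliver diminishing returns: past a point, you pay linearly for cores while the serial portions of the code prevent linear gains.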
Optimize First, Scale Later
The initial approach to any scaling effort should always be optimization. Only once more efficient algorithms have been put in place and/or cache layers implemented should systems be scaled horizontally – which includes replication, sharding, and distribution. All of these approaches require some level of refactoring, and some are more complex than others. We’ll discuss them in a later post, but for now understand that if you scale your services without optimizing them first, all you’ll be doing is replicating and compounding your performance bottlenecks.