Scalability

Understanding system scalability is crucial for building robust applications. This post explores qualitative and quantitative approaches to system scalability. Before diving into the mathematical aspects of system scaling, let’s understand the two fundamental approaches to scaling: vertical and horizontal.

Understanding Scaling Approaches

Vertical Scaling (Scaling Up)

Vertical scaling involves adding more power to your existing infrastructure. Think of it as upgrading your machine’s capabilities:

Resources Enhanced:
- CPU: Adding more processors
- Memory: Increasing RAM capacity
- Storage: Increasing throughput (MB/s) and I/O operations per second (IOPS)
- Network: Increasing bandwidth (Gb/s)
Advantages:
- Simple to implement using cloud service provider (CSP) APIs.
- No major application architecture changes required, although some applications may need reconfiguration to make optimal use of the added resources.
- Lower complexity in terms of data consistency.
- Reduced latency, as components communicate through local inter-process communication mechanisms rather than over the network.
Limitations:
- Hardware limits. There is an absolute maximum VM instance size, and the performance gained from each size increase is also limited.
- Potential downtime during upgrades. Resizing a VM requires a reboot.
- Single point of failure, although a multi-node high-availability architecture eliminates this risk.

Horizontal Scaling (Scaling Out)

Horizontal scaling involves adding more machines to your resource pool:

Advantages:
- Theoretically unlimited scaling potential
- Better fault tolerance and reliability
- Cost-effective at large scale
- Easier to upgrade without downtime
Challenges:
- More VMs to monitor, upgrade, maintain and keep secure
- More expensive due to per-VM licensing of some components
- Data consistency considerations for stateful applications
- Network overhead

Choosing the Right Approach

The decision between vertical and horizontal scaling depends on several factors:

Application Architecture:
- Stateless applications → Easier to scale horizontally
- Stateful applications → May prefer vertical scaling initially
Cost Considerations:
- Budget constraints
- Operating expenses vs. capital expenses
Performance Requirements:
- Response time requirements
- Throughput needs
- Availability requirements
Growth Patterns:
- Predictable vs. unpredictable growth
- Peak load characteristics

A Quantitative Approach to Performance Engineering

This section explores the mathematical foundations of scalability through queuing theory and provides practical guidelines for system dimensioning.

Queuing theory helps us understand the relationship between concurrency and system dimensioning.

Let $$r$$ be the request rate in queries per second and $$t$$ the processing time per request in seconds.

The system load, i.e. the average number of requests being processed concurrently (Little's law), is defined as

$$ L = r\cdot t $$

To take a numerical example, for a fixed processing time $$ t = 10\,\text{ms} = 10^{-2}\,\text{s} $$, different request rates produce the following loads:

$$ r=10,000 \implies L=10^4 \times 10^{-2} = 100$$

$$ r=1,000 \implies L=10^3 \times 10^{-2} = 10$$

$$ r=100 \implies L=10^2 \times 10^{-2} = 1$$
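The load formula is simple enough to check directly. A minimal Python sketch (the function name `system_load` is hypothetical) that reproduces the three cases above for a processing time of 10 ms:

```python
def system_load(rate_qps: float, processing_time_s: float) -> float:
    """Average number of requests in progress: L = r * t."""
    return rate_qps * processing_time_s


if __name__ == "__main__":
    t = 0.010  # 10 ms per request, as in the example above
    for r in (10_000, 1_000, 100):
        print(f"r={r:>6} q/s -> L={system_load(r, t):g}")
```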

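Queuing theory tells us that, when requests arrive independently (a Poisson process), the number of requests in progress follows a Poisson distribution whose mean and variance are both equal to $$L$$; for large $$L$$ this is approximately a normal distribution. So if each node needs an extra spare capacity of one standard deviation, $$ \sigma = \sqrt L $$, it will require a relative spare capacity of $$ \delta = \frac{\sqrt{L}}{L} = \frac{1}{\sqrt{L}} $$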
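The claim that concurrency fluctuates by about $$ \sqrt L $$ can be checked with a small Monte Carlo simulation. This is a sketch under assumptions that are not stated in the text: arrivals form a Poisson process, every request takes exactly $$t$$ seconds, and requests never wait for a free server (an M/D/∞ model). The function `simulate_concurrency` and its parameters are hypothetical.

```python
import bisect
import random
import statistics


def simulate_concurrency(rate_qps, processing_time_s, horizon_s=200.0,
                         samples=5_000, seed=1):
    """Return (mean, std dev) of the number of requests in progress.

    Assumes Poisson arrivals and a fixed processing time with no queuing
    (an M/D/infinity model).
    """
    rng = random.Random(seed)

    # Poisson arrivals: exponential gaps with mean 1 / rate_qps.
    arrivals = []
    now = rng.expovariate(rate_qps)
    while now <= horizon_s:
        arrivals.append(now)
        now += rng.expovariate(rate_qps)

    counts = []
    for _ in range(samples):
        # Observe at a random instant after the first processing window.
        tau = rng.uniform(processing_time_s, horizon_s)
        # Requests in progress at tau arrived in (tau - t, tau].
        first = bisect.bisect_right(arrivals, tau - processing_time_s)
        last = bisect.bisect_right(arrivals, tau)
        counts.append(last - first)

    return statistics.mean(counts), statistics.pstdev(counts)


if __name__ == "__main__":
    rate, t = 1_000, 0.010  # L = 10
    mean, sd = simulate_concurrency(rate, t)
    print(f"mean={mean:.2f}  sd={sd:.2f}  sqrt(L)={(rate * t) ** 0.5:.2f}")
```

With $$ r = 1000 $$ and $$ t = 10\,\text{ms} $$ ($$ L = 10 $$), the sampled standard deviation should land close to $$ \sqrt{10} \approx 3.16 $$. Real traffic with bursts or correlated arrivals may show more variance than this idealized model.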

$$ L=100 \implies \delta=\frac{\sqrt{100}}{100} = 0.1 = 10\%$$

$$ L=10 \implies \delta=\frac{\sqrt{10}}{10} \approx 0.316 = 31.6\%$$

$$ L=1 \implies \delta=\frac{\sqrt{1}}{1} = 1 = 100\%$$
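The same spare-capacity calculation as a minimal Python sketch. It generalizes the one-standard-deviation headroom to `k` standard deviations; `k` and both function names are illustrative assumptions, and the rule in the text corresponds to `k = 1`. Note that $$ \delta = \sqrt{L}/L $$ simplifies to $$ 1/\sqrt{L} $$.

```python
import math


def relative_spare_capacity(load: float, k: float = 1.0) -> float:
    """Headroom as a fraction of average load: delta = k * sqrt(L) / L.

    k is the number of standard deviations to absorb (k = 1 in the text).
    """
    if load <= 0:
        raise ValueError("load must be positive")
    return k * math.sqrt(load) / load


def required_capacity(load: float, k: float = 1.0) -> float:
    """Concurrent capacity to provision: L + k * sqrt(L)."""
    return load * (1 + relative_spare_capacity(load, k))


if __name__ == "__main__":
    for L in (100, 10, 1):
        d = relative_spare_capacity(L)
        print(f"L={L:>3}: delta={d:.3f} ({d:.1%}), capacity={required_capacity(L):.2f}")
```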

A lightly loaded system ($$ L=1 $$) needs 100% extra capacity purely because of the stochastic nature of traffic. So, the lower the concurrency or intensity on a system, the more capacity needs to be reserved for statistical spikes. In other words, larger systems handling more concurrent load can be run at a higher average utilization.
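The flip side of the same rule is the highest average load a node with concurrent capacity $$C$$ can sustain while keeping one standard deviation of headroom, found by solving $$ L + \sqrt{L} = C $$ for $$L$$. A minimal sketch; the function name, the parameter `k`, and the capacity values are illustrative assumptions:

```python
import math


def max_average_load(capacity: float, k: float = 1.0) -> float:
    """Largest average load L with L + k * sqrt(L) <= capacity.

    Solves the quadratic x**2 + k*x - capacity = 0 for x = sqrt(L).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    x = (-k + math.sqrt(k * k + 4 * capacity)) / 2
    return x * x


if __name__ == "__main__":
    for capacity in (2, 110, 1_000):
        load = max_average_load(capacity)
        print(f"capacity={capacity:>5}: max load={load:.1f}, "
              f"utilization={load / capacity:.0%}")
```

Under this model a node sized for 2 concurrent requests can only average 50% utilization, while one sized for 110 can average about 91%, which is the effect described above.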

Shared Resources and Over-Commit

Virtual machines contend for the resources of the underlying hardware. Some resources are strictly dedicated to a VM, while others are shared among all VMs on the host. The table below lists each resource, whether it is dedicated or shared, and how it affects an individual VM.

| Resource | Sharing | Description |
|---|---|---|
| CPU | Dedicated | CPU cores are generally dedicated to a VM. Shared-core instances should never be used in production unless they are dimensioned for their guaranteed capacity, which is usually hard to do and leads to surprises. |
| CPU L3 cache | Shared | The L3 cache is shared among all VMs running on the same physical CPU. The more VMs there are, the higher the probability of contention. |
| Memory | Dedicated | Memory is usually dedicated in cloud instances. Some on-prem systems over-commit memory, so some VMs may experience an out-of-memory condition. |
| Memory bandwidth | Shared | It is practically impossible to dedicate memory bandwidth per core. |
| Disk IOPS | Dedicated/Shared | Some disk systems can guarantee IOPS, while others provide only an indicative, best-effort value. |
| Disk bandwidth | Dedicated/Shared | Some systems dedicate disk bandwidth per VM, while others share it among VMs. |
| Network bandwidth | Dedicated/Shared | Some systems can guarantee bandwidth for a VM, while others provide fair-share access to bandwidth. |

The conclusion from these observations is that the larger the VM, the lower the probability that another VM contends for the same shared resources. Using dedicated VMs, where possible, guarantees performance and lowers the overall cost.

Another point to consider is that, during a load test, a neighboring VM may be contending for the shared resources. Multiple runs at different times are therefore needed to obtain a reliable reading of the test results.
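One way to act on this is to compare repeated runs against their median and flag runs that deviate strongly, since those are candidates for interference from a neighbor. A minimal sketch: the function `summarize_runs`, the 10% threshold, and the sample throughput figures are hypothetical, not measurements.

```python
import statistics


def summarize_runs(throughputs, max_deviation=0.10):
    """Summarize repeated load-test runs and flag suspicious ones.

    throughputs: measured throughput per run (e.g. requests/s).
    max_deviation: relative distance from the median beyond which a run
    is flagged as possibly affected by a noisy neighbor (assumed 10%).
    """
    if len(throughputs) < 2:
        raise ValueError("need at least two runs")
    median = statistics.median(throughputs)
    # Coefficient of variation: spread of the runs relative to their mean.
    cv = statistics.stdev(throughputs) / statistics.mean(throughputs)
    outliers = [
        i for i, value in enumerate(throughputs)
        if abs(value - median) / median > max_deviation
    ]
    return {"median": median, "cv": cv, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical throughput figures from five runs at different times.
    print(summarize_runs([1000, 980, 1010, 760, 995]))
```

Here the fourth run would be flagged for review rather than averaged in; whether a 10% threshold is appropriate depends on how noisy the system is in isolation.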

Conclusion

By understanding both scaling approaches and their mathematical foundations, organizations can make informed decisions about their scaling strategy and implement it effectively.

The rules to optimize resource utilization while scaling are:

S1. Start with the smallest size system that provides the necessary availability

S2. Ensure proper system observability and monitor load

S3. Scale system layers vertically when load reaches 70%

S4. Scale horizontally when vertical scaling limit is reached

S5. Always perform load tests on fully loaded servers, using a dedicated host at full capacity so that no neighboring VM contends for shared resources.
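Rules S3 and S4 can be expressed as a simple decision function for one system layer. This is a sketch under stated assumptions: the size ladder `SIZES`, the `Layer` type, and adding one node at a time are hypothetical, and a real implementation would read utilization from the monitoring called for in S2.

```python
from dataclasses import dataclass

# Hypothetical ordered list of instance sizes, smallest first (S1).
SIZES = ["small", "medium", "large", "xlarge"]
SCALE_THRESHOLD = 0.70  # S3: act when load reaches 70%


@dataclass
class Layer:
    size: str
    nodes: int
    utilization: float  # observed average load, 0.0 to 1.0 (S2)


def next_action(layer: Layer) -> str:
    """Apply rules S3 and S4 to one system layer."""
    if layer.utilization < SCALE_THRESHOLD:
        return "no change"
    index = SIZES.index(layer.size)
    if index + 1 < len(SIZES):
        # S3: scale vertically while a larger size is available.
        return f"scale up to {SIZES[index + 1]}"
    # S4: vertical limit reached, so add a node instead.
    return f"scale out to {layer.nodes + 1} nodes"


if __name__ == "__main__":
    print(next_action(Layer("medium", 2, 0.75)))
```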