Scalability is a system's ability to handle increasing load (users, requests, data, or traffic) by adding or upgrading resources, without degrading performance.
🔍 Two Key Types of Scalability
| Type | Description | Example |
|---|---|---|
| Vertical Scaling (Scale Up) | Add more power (CPU, RAM, etc.) to a single machine. | Upgrading from a 4-core to a 16-core server. |
| Horizontal Scaling (Scale Out) | Add more machines to handle the load in parallel. | Adding more servers behind a load balancer. |
🧠 Why It Matters
- A non-scalable system fails or slows down as demand grows.
- A scalable system maintains or improves performance with growth.
⚙️ What Needs to Scale?
- Traffic (requests/sec) → Need load balancers, CDN.
- Data size → Need sharding, distributed databases.
- Users → Need session management, stateless services.
- Compute workload → Need worker queues, autoscaling.
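To make the data-size point concrete, here is a minimal hash-based sharding sketch. The shard names and the `user_id` key are illustrative assumptions, not a real deployment; the idea is only that a deterministic hash spreads keys evenly across nodes.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def pick_shard(user_id: str) -> str:
    """Route a key to a shard by hashing it, so no single node holds all data."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

The same key always maps to the same shard, so reads find the data writes placed there. (A production system would usually use consistent hashing so that adding a shard does not remap every key.)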
🚀 How to Design for Scalability
- Stateless services → Easier to scale horizontally.
- Database sharding/partitioning → Avoid single-node bottlenecks.
- Caching → Reduce database load (Redis, Memcached).
- Async processing → Use queues (e.g., RabbitMQ, Kafka) for heavy tasks.
- CDNs → Offload static content from your servers.
- Auto-scaling infrastructure → Dynamically adjust based on load (e.g., AWS EC2 Auto Scaling).
📉 Scalability Trade-offs
- Cost → Scaling can be expensive.
- Complexity → More moving parts, harder debugging.
- Consistency → Scaling out across nodes often means relaxing strong consistency (CAP theorem).
🧪 Example
If your app handles:
- 100 users/day → one server is fine.
- 10,000 users/day → that single server might choke.
A scalable design will let you:
- Add more servers automatically (horizontal scaling).
- Cache responses (Redis).
- Load balance requests across multiple instances.
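The load-balancing step above can be sketched as a toy round-robin balancer. Server names are illustrative; a real deployment would sit behind nginx, HAProxy, or a cloud load balancer rather than this in-process class.

```python
import itertools

class RoundRobinBalancer:
    """Hands out backend servers in rotation, spreading requests evenly."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def next_server(self) -> str:
        return next(self._cycle)

# Illustrative instances behind the balancer.
lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
```

Each call to `next_server()` returns the next instance in the rotation, so adding a server to the list immediately adds capacity (the essence of horizontal scaling).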