Scalability is a system's ability to handle increasing load (more users, requests, data, or traffic) by adding resources or infrastructure without degrading performance.


🔍 Two Key Types of Scalability

| Type | Description | Example |
|------|-------------|---------|
| Vertical Scaling (Scale Up) | Add more power (CPU, RAM, etc.) to a single machine. | Upgrading from a 4-core to a 16-core server. |
| Horizontal Scaling (Scale Out) | Add more machines to handle the load in parallel. | Adding more servers behind a load balancer. |
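
A rough way to reason about scale-out is simple capacity arithmetic. The sketch below uses a made-up per-node throughput figure, and real gains are sub-linear because of coordination overhead, so treat it as back-of-envelope only:

```python
# Back-of-envelope scale-out math. PER_NODE_RPS is a hypothetical
# single-server throughput, not a benchmark result.
PER_NODE_RPS = 1_000

def nodes_needed(target_rps: int) -> int:
    """Minimum servers to cover target_rps, assuming linear scaling."""
    return -(-target_rps // PER_NODE_RPS)  # ceiling division

nodes_needed(4_500)  # -> 5 servers behind the load balancer
```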

🧠 Why It Matters

  • A non-scalable system fails or slows down as demand grows.
  • A scalable system maintains or improves performance with growth.

⚙️ What Needs to Scale?

  • Traffic (requests/sec) → Need load balancers, CDN.
  • Data size → Need sharding, distributed databases.
  • Users → Need session management, stateless services.
  • Compute workload → Need worker queues, autoscaling.
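
The sharding bullet above can be sketched in a few lines: hash the key and use the result to pick a shard. The shard names here are hypothetical, and production systems usually prefer consistent hashing so shards can be added without remapping most keys:

```python
# Minimal hash-based sharding sketch. SHARDS is a made-up list of
# database nodes; a real deployment would use consistent hashing.
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(key: str) -> str:
    """Route a key to a shard deterministically (stable across processes)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Because the hash is deterministic, every service instance routes `user:42` to the same shard without any shared lookup table.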

🚀 How to Design for Scalability

  • Stateless services → Easier to scale horizontally.
  • Database sharding/partitioning → Avoid single-node bottlenecks.
  • Caching → Reduce database load (Redis, Memcached).
  • Async processing → Use queues (e.g., RabbitMQ, Kafka) for heavy tasks.
  • CDNs → Offload static content from your servers.
  • Auto-scaling infrastructure → Dynamically adjust based on load (e.g., AWS EC2 Auto Scaling).
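
The caching bullet above is commonly implemented as the cache-aside pattern: check the cache first, and only hit the database on a miss. A minimal sketch, using a plain dict as a stand-in for Redis/Memcached and a fake slow query:

```python
# Cache-aside sketch. The dict stands in for Redis/Memcached, and
# slow_db_query is a fake placeholder for a real database call.
import time

cache = {}
TTL_SECONDS = 60

def slow_db_query(user_id):
    """Pretend expensive database lookup."""
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    entry = cache.get(user_id)
    if entry and time.time() - entry["at"] < TTL_SECONDS:
        return entry["value"]                 # cache hit: no DB work
    value = slow_db_query(user_id)            # cache miss: query once,
    cache[user_id] = {"value": value, "at": time.time()}  # then remember it
    return value
```

With a TTL, stale entries expire on their own, which trades a little freshness for a large reduction in database load.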

📉 Scalability Trade-offs

  • Cost → Scaling can be expensive.
  • Complexity → More moving parts, harder debugging.
  • Consistency → Distributing data across nodes often means relaxing strong consistency (CAP theorem).

🧪 Example

If your app handles:

  • 100 users/day with 1 server – fine.
  • 10,000 users/day – that one server might choke.

A scalable design will let you:

  • Add more servers automatically (horizontal scaling).
  • Cache responses (Redis).
  • Load balance requests across multiple instances.
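
The load-balancing step from the list above can be as simple as round-robin rotation over the instance pool. A toy sketch (the server names are hypothetical):

```python
# Toy round-robin load balancer; server names are made up.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, servers):
        self._servers = cycle(servers)  # rotate through the pool forever

    def next_server(self):
        return next(self._servers)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
[lb.next_server() for _ in range(4)]  # app-1, app-2, app-3, app-1
```

Real load balancers add health checks and weighting, but the core idea is the same: spread requests evenly so no single instance chokes.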