While scaling stateless web servers horizontally is relatively easy (just spin up another EC2 instance behind a load balancer), scaling a stateful database horizontally is incredibly complex.
When your database becomes the bottleneck, there are two primary techniques to scale it out: Replication and Sharding.
Replication involves copying your data across multiple database servers to improve read throughput and provide fault tolerance.
In a traditional Master-Slave (or Primary-Replica) architecture:
WRITE operations (Insert, Update, Delete).Because 90% of traffic in most web applications is read traffic (users looking at tweets, not writing them), you can point all your SELECT queries to the fleet of Slave databases, massively offloading the CPU usage from your Master database.
Since the Master must copy the data over the network to the Slaves, it takes a few milliseconds. If a user posts a comment (written to Master) and immediately refreshes the page (read from Slave), the Slave might not have received the update yet, and the comment will appear to have vanished.
Replication solves the problem of too many reads. But what if you have too many writes? Or what if your database size exceeds the maximum hard drive size you can physically buy?
Sharding is the process of breaking up a massive database into smaller, distinct chunks (shards) and spreading those chunks across multiple independent database servers.
Imagine you have a Users table with 1 billion rows. You could shard the database based on the User's ID:
While sharding provides infinite write scalability and infinite storage capacity, it introduces nightmarish complexity into your application code:
JOIN queries across tables if the tables reside on completely different physical servers.