codingstuff.io
ExploreTutorialsProblemsCS Subjects
Get Started
ExploreTutorialsProblemsCS Subjects
Get Started
codingstuff.io

Master the art of building software through interactive tutorials, real-world problems, and guided projects.

Pune, Maharashtra, India

codingstuffmail@gmail.com

Product

  • Explore
  • Tutorials
  • Problems
  • CS Subjects

Company

  • About
  • Contact
  • Privacy Policy
  • Terms & Conditions
  • Sitemap

© 2026 codingstuff.io. All rights reserved.

Built with ❤️ for developers everywhere

/
/
All Subjects
🏗️

System Design

24 chapters

1System Design Basics2Vertical vs Horizontal Scaling3CAP Theorem4Load Balancers & Algorithms5Proxy Servers (Forward & Reverse)6Caching Strategies & Eviction7Content Delivery Networks (CDNs)8Database Replication9Database Sharding & Partitioning10Database Scaling & Sharding11Consistent Hashing12Choosing Databases (SQL vs NoSQL)13Message Queues (Kafka, RabbitMQ)14Microservices Architecture15API Gateways16Rate Limiting Algorithms17Long Polling vs WebSockets vs SSE18Heartbeat & Health Checks19Bloom Filters & Probabilistic Data Structures20Leader Election in Distributed Systems21Event-Driven Architecture22Distributed Locking23Circuit Breaker Pattern24Case Study: Design URL Shortener
SubjectsSystem Design

Database Scaling & Sharding

Updated 2026-05-04
3 min read

Database Scaling & Sharding

While scaling stateless web servers horizontally is relatively easy (just spin up another EC2 instance behind a load balancer), scaling a stateful database horizontally is incredibly complex.

When your database becomes the bottleneck, there are two primary techniques to scale it out: Replication and Sharding.

1. Database Replication (Master-Slave)

Replication involves copying your data across multiple database servers to improve read throughput and provide fault tolerance.

In a traditional Master-Slave (or Primary-Replica) architecture:

  • There is exactly one Master database. It is the only database permitted to handle WRITE operations (Insert, Update, Delete).
  • There are multiple Slave databases. They are read-only.
  • Whenever data is written to the Master, it automatically syncs those changes down to all the Slaves.

The Benefits

Because 90% of traffic in most web applications is read traffic (users looking at tweets, not writing them), you can point all your SELECT queries to the fleet of Slave databases, massively offloading the CPU usage from your Master database.

The Drawback: Replication Lag

Since the Master must copy the data over the network to the Slaves, it takes a few milliseconds. If a user posts a comment (written to Master) and immediately refreshes the page (read from Slave), the Slave might not have received the update yet, and the comment will appear to have vanished.

2. Database Sharding (Data Partitioning)

Replication solves the problem of too many reads. But what if you have too many writes? Or what if your database size exceeds the maximum hard drive size you can physically buy?

Sharding is the process of breaking up a massive database into smaller, distinct chunks (shards) and spreading those chunks across multiple independent database servers.

Imagine you have a Users table with 1 billion rows. You could shard the database based on the User's ID:

  • Shard 1 (Server A): Stores User IDs 1 to 250,000,000.
  • Shard 2 (Server B): Stores User IDs 250,000,001 to 500,000,000.
  • Shard 3 (Server C): Stores User IDs 500,000,001 to 750,000,000.

The Challenges of Sharding

While sharding provides infinite write scalability and infinite storage capacity, it introduces nightmarish complexity into your application code:

  1. Join Operations: You can no longer easily execute SQL JOIN queries across tables if the tables reside on completely different physical servers.
  2. Resharding Data: If Shard 1 fills up completely because users 1 to 250M uploaded way more photos than everyone else, you have to rebalance the data across the shards, which requires massive downtime and data migration scripts.
  3. Celebrity Problem: If you shard a social network by User ID, and Justin Bieber (User ID 50) posts a photo, millions of fans will query Shard 1 simultaneously, bringing that single server to its knees while the other shards sit idle. This is known as a "Hotspot".


PreviousDatabase Sharding & PartitioningNextConsistent Hashing

Recommended Gear

Database Sharding & PartitioningConsistent Hashing