codingstuff.io
ExploreTutorialsProblemsCS Subjects
Get Started
ExploreTutorialsProblemsCS Subjects
Get Started
codingstuff.io

Master the art of building software through interactive tutorials, real-world problems, and guided projects.

Pune, Maharashtra, India

codingstuffmail@gmail.com

Product

  • Explore
  • Tutorials
  • Problems
  • CS Subjects

Company

  • About
  • Contact
  • Privacy Policy
  • Terms & Conditions
  • Sitemap

© 2026 codingstuff.io. All rights reserved.

Built with ❤️ for developers everywhere

/
/
All Subjects
🏗️

System Design

24 chapters

1System Design Basics2Vertical vs Horizontal Scaling3CAP Theorem4Load Balancers & Algorithms5Proxy Servers (Forward & Reverse)6Caching Strategies & Eviction7Content Delivery Networks (CDNs)8Database Replication9Database Sharding & Partitioning10Database Scaling & Sharding11Consistent Hashing12Choosing Databases (SQL vs NoSQL)13Message Queues (Kafka, RabbitMQ)14Microservices Architecture15API Gateways16Rate Limiting Algorithms17Long Polling vs WebSockets vs SSE18Heartbeat & Health Checks19Bloom Filters & Probabilistic Data Structures20Leader Election in Distributed Systems21Event-Driven Architecture22Distributed Locking23Circuit Breaker Pattern24Case Study: Design URL Shortener
SubjectsSystem Design

Bloom Filters & Probabilistic Data Structures

Updated 2026-04-26
3 min read

Bloom Filters & Probabilistic Data Structures

Imagine you run a URL shortener like bit.ly with 10 billion shortened URLs. When a user creates a new short URL, you need to check if it already exists. Querying a database of 10 billion records for every single creation request is prohibitively slow. A Bloom Filter can answer "Does this URL exist?" in constant time, using a tiny fraction of the memory a full database index would require.

1. What is a Bloom Filter?

A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It can tell you with certainty:

  • "Definitely NOT in the set" (no false negatives).
  • "Probably in the set" (possible false positives).

It NEVER produces false negatives. If the Bloom Filter says an element is not in the set, you can trust it 100%.

2. How It Works

  1. Initialize: Create a bit array of size $m$, all set to 0. Choose $k$ independent hash functions.
  2. Insert (element $x$): Compute $k$ hash values: $h_1(x), h_2(x), ..., h_k(x)$. Set the bits at positions $h_1(x) \mod m$, $h_2(x) \mod m$, ..., $h_k(x) \mod m$ to 1.
  3. Query (element $y$): Compute $k$ hash values for $y$. Check the bits at all $k$ positions.
    • If ANY bit is 0: The element is definitely NOT in the set.
    • If ALL bits are 1: The element is probably in the set. (Other elements might have set those same bits.)

3. Trade-offs

  • False Positive Rate: As more elements are added, more bits are set to 1, increasing the probability that a query for a non-existent element finds all 1s by coincidence. The false positive rate can be tuned by adjusting $m$ (bit array size) and $k$ (number of hash functions).
  • No Deletion: Standard Bloom Filters do not support deletion (setting a bit to 0 could affect other elements). Counting Bloom Filters replace each bit with a counter to support deletion.

4. Real-World Use Cases

  • Google Chrome: Uses a Bloom Filter to check if a URL is in its list of known malicious websites. If the Bloom Filter says "probably yes", Chrome then queries a remote server for confirmation.
  • Apache Cassandra: Uses Bloom Filters to avoid unnecessary disk reads. Before reading a file (SSTable) from disk, it checks the Bloom Filter. If the key is "definitely not" in that file, the disk read is skipped entirely.
  • Medium: Uses Bloom Filters to avoid recommending articles a user has already read.
  • Akamai CDN: Uses Bloom Filters to avoid caching one-hit-wonder content. An object is only cached after the second request.


PreviousHeartbeat & Health ChecksNextLeader Election in Distributed Systems

Recommended Gear

Heartbeat & Health ChecksLeader Election in Distributed Systems