codingstuff.io
ExploreTutorialsProblemsCS Subjects
Get Started
ExploreTutorialsProblemsCS Subjects
Get Started
codingstuff.io

Master the art of building software through interactive tutorials, real-world problems, and guided projects.

Pune, Maharashtra, India

codingstuffmail@gmail.com

Product

  • Explore
  • Tutorials
  • Problems
  • CS Subjects

Company

  • About
  • Contact
  • Privacy Policy
  • Terms & Conditions
  • Sitemap

© 2026 codingstuff.io. All rights reserved.

Built with ❤️ for developers everywhere

/
/
All Subjects
🏗️

System Design

24 chapters

1System Design Basics2Vertical vs Horizontal Scaling3CAP Theorem4Load Balancers & Algorithms5Proxy Servers (Forward & Reverse)6Caching Strategies & Eviction7Content Delivery Networks (CDNs)8Database Replication9Database Sharding & Partitioning10Database Scaling & Sharding11Consistent Hashing12Choosing Databases (SQL vs NoSQL)13Message Queues (Kafka, RabbitMQ)14Microservices Architecture15API Gateways16Rate Limiting Algorithms17Long Polling vs WebSockets vs SSE18Heartbeat & Health Checks19Bloom Filters & Probabilistic Data Structures20Leader Election in Distributed Systems21Event-Driven Architecture22Distributed Locking23Circuit Breaker Pattern24Case Study: Design URL Shortener
SubjectsSystem Design

Rate Limiting Algorithms

Updated 2026-04-26
3 min read

Rate Limiting Algorithms

Rate Limiting is a technique used to control the rate of requests that clients can make to an API. It protects backend services from being overwhelmed by too many requests (whether from a DDoS attack, a misbehaving script, or a viral event).

1. Why Rate Limiting?

  • Prevent Abuse: A malicious user could send millions of requests per second to crash your servers.
  • Fair Usage: Ensure that one heavy user doesn't consume all resources, starving other users.
  • Cost Control: If your backend calls a paid third-party API (like an AI model), unlimited requests could generate a massive bill.
  • Stability: Protects the system from cascading failures during unexpected traffic spikes.

2. Common Algorithms

Token Bucket

A bucket holds a fixed number of tokens. Each request consumes one token. Tokens are refilled at a constant rate.

  • If the bucket is empty, the request is rejected (or queued).
  • Pros: Allows controlled bursts of traffic. Simple to implement.
  • Cons: Requires tracking the token count and last refill time per user.
  • Used by: AWS API Gateway, Stripe.

Leaky Bucket

Requests enter a queue (the bucket). The queue is processed at a fixed, constant rate, regardless of how fast requests arrive. If the queue is full, new requests are dropped.

  • Pros: Smooths out bursty traffic into a perfectly constant output rate.
  • Cons: Recent requests might be dropped in favor of older queued requests.

Fixed Window Counter

Time is divided into fixed windows (e.g., 1-minute intervals). A counter tracks the number of requests in the current window. If the counter exceeds the limit, requests are rejected until the next window.

  • Pros: Memory efficient (one counter per window).
  • Cons: Suffers from the boundary problem. A user can send 100 requests at 11:59:59 and another 100 at 12:00:01 (200 requests in 2 seconds) because they fall in different windows.

Sliding Window Log

Maintains a sorted set (log) of timestamps for each request. When a new request arrives, remove all timestamps older than the window duration. If the remaining count exceeds the limit, reject the request.

  • Pros: Perfectly accurate. No boundary problem.
  • Cons: Memory intensive (stores every request timestamp).

Sliding Window Counter

A hybrid approach that combines Fixed Window Counter and Sliding Window Log. It estimates the request count in the current sliding window by taking a weighted average of the current and previous fixed window counts.

  • Pros: Memory efficient and avoids the boundary problem.
  • Used by: Cloudflare.

3. Where to Implement Rate Limiting?

  • API Gateway: The most common location. Rate limits are enforced before requests even reach the backend services.
  • Application Level: Using middleware (like express-rate-limit in Node.js).
  • Service Mesh: Using sidecar proxies like Envoy or Istio.

Typically, rate limit data (counters, timestamps) is stored in an in-memory store like Redis for sub-millisecond lookups.



PreviousAPI GatewaysNextLong Polling vs WebSockets vs SSE

Recommended Gear

API GatewaysLong Polling vs WebSockets vs SSE