Rate Limiting Algorithms

Rate Limiting is a technique used to control the rate of requests that clients can make to an API. It protects backend services from being overwhelmed by too many requests (whether from a DDoS attack, a misbehaving script, or a viral event).

1. Why Rate Limiting?

Prevent Abuse: A malicious user could send millions of requests per second to crash your servers.
Fair Usage: Ensure that one heavy user doesn't consume all resources, starving other users.
Cost Control: If your backend calls a paid third-party API (like an AI model), unlimited requests could generate a massive bill.
Stability: Protects the system from cascading failures during unexpected traffic spikes.

2. Common Algorithms

Token Bucket

A bucket holds a fixed number of tokens. Each request consumes one token. Tokens are refilled at a constant rate.

If the bucket is empty, the request is rejected (or queued).
Pros: Allows controlled bursts of traffic. Simple to implement.
Cons: Requires tracking the token count and last refill time per user.
Used by: AWS API Gateway, Stripe.

Leaky Bucket

Requests enter a queue (the bucket). The queue is processed at a fixed, constant rate, regardless of how fast requests arrive. If the queue is full, new requests are dropped.

Pros: Smooths out bursty traffic into a perfectly constant output rate.
Cons: Recent requests might be dropped in favor of older queued requests.

Fixed Window Counter

Time is divided into fixed windows (e.g., 1-minute intervals). A counter tracks the number of requests in the current window. If the counter exceeds the limit, requests are rejected until the next window.

Pros: Memory efficient (one counter per window).
Cons: Suffers from the boundary problem. A user can send 100 requests at 11:59:59 and another 100 at 12:00:01 (200 requests in 2 seconds) because they fall in different windows.

Sliding Window Log

Maintains a sorted set (log) of timestamps for each request. When a new request arrives, remove all timestamps older than the window duration. If the remaining count exceeds the limit, reject the request.

Pros: Perfectly accurate. No boundary problem.
Cons: Memory intensive (stores every request timestamp).

Sliding Window Counter

A hybrid approach that combines Fixed Window Counter and Sliding Window Log. It estimates the request count in the current sliding window by taking a weighted average of the current and previous fixed window counts.

Pros: Memory efficient and avoids the boundary problem.
Used by: Cloudflare.

3. Where to Implement Rate Limiting?

API Gateway: The most common location. Rate limits are enforced before requests even reach the backend services.
Application Level: Using middleware (like express-rate-limit in Node.js).
Service Mesh: Using sidecar proxies like Envoy or Istio.

Typically, rate limit data (counters, timestamps) is stored in an in-memory store like Redis for sub-millisecond lookups.