Rate limiting controls how many requests a client can make to an API in a given period. It protects backend services from being overwhelmed, whether by a DDoS attack, a misbehaving script, or a traffic surge from a viral event.
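Token Bucket: A bucket holds up to a fixed number of tokens. Each request consumes one token, and a request that finds the bucket empty is rejected. Tokens are refilled at a constant rate, up to the bucket's capacity, so short bursts are allowed as long as tokens remain.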
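The token bucket behaviour described above can be sketched as follows. This is a minimal single-process illustration, not a production implementation; the class name `TokenBucket`, its parameters (`capacity`, `refill_rate`), and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative token bucket: up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock              # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.last_refill = clock()

    def allow(self):
        """Return True and consume one token if available, otherwise return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the token count is only brought up to date when a request actually arrives.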
Leaky Bucket: Requests enter a queue (the bucket) of fixed capacity. The queue is processed at a fixed, constant rate, regardless of how fast requests arrive, which smooths bursts into a steady outflow. If the queue is full, new requests are dropped.
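A leaky bucket along these lines might look like the sketch below. It assumes a single process in which some worker calls `drain()` periodically; the names `LeakyBucket`, `offer`, `drain`, `capacity`, and `leak_rate` are hypothetical and chosen only for illustration.

```python
import time
from collections import deque


class LeakyBucket:
    """Illustrative leaky bucket: a bounded queue drained at `leak_rate` requests per second."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity      # maximum number of queued requests
        self.leak_rate = leak_rate    # requests processed per second
        self.clock = clock
        self.queue = deque()
        self.last_leak = clock()

    def offer(self, request):
        """Enqueue a request, or return False (drop it) if the bucket is full."""
        if not self.queue:
            # An empty bucket starts its leak timer when the first request arrives.
            self.last_leak = self.clock()
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self):
        """Return the queued requests that are due for processing at the fixed rate."""
        now = self.clock()
        due = int((now - self.last_leak) * self.leak_rate)
        processed = []
        while due > 0 and self.queue:
            processed.append(self.queue.popleft())
            due -= 1
        if self.queue:
            # Advance the timer only by the time the processed requests used up.
            self.last_leak += len(processed) / self.leak_rate
        else:
            self.last_leak = now
        return processed
```

Unlike a limiter that lets bursts through while it has spare capacity, this design never releases requests faster than `leak_rate`, at the cost of added queueing delay.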
Fixed Window Counter: Time is divided into fixed windows (e.g., 1-minute intervals). A counter tracks the number of requests in the current window. Once the counter reaches the limit, further requests are rejected until the next window begins and the counter resets.
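As a rough sketch of the fixed-window scheme, the class below keeps a single counter keyed by the window index. The name `FixedWindowCounter` and the parameters `limit` and `window_seconds` are assumptions for the example, and it tracks one client only.

```python
import time


class FixedWindowCounter:
    """Illustrative fixed-window limiter: at most `limit` requests per window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_index = None   # index of the window the counter belongs to
        self.count = 0

    def allow(self):
        """Return True if the request fits in the current window, otherwise False."""
        index = int(self.clock() // self.window_seconds)
        if index != self.window_index:
            # A new window has started: reset the counter.
            self.window_index = index
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

A known weakness of this approach is that a client can send `limit` requests at the very end of one window and `limit` more at the start of the next, briefly doubling the effective rate around the boundary.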
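Sliding Window Log: Maintains a sorted set (log) of request timestamps. When a new request arrives, all timestamps older than the window duration are removed. If the remaining count has already reached the limit, the request is rejected; otherwise its timestamp is added to the log. This is precise but stores one entry per request, so memory grows with the limit.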
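The timestamp-log approach can be illustrated with the sketch below. It swaps the sorted set for a `collections.deque`, which is sufficient in a single process where timestamps arrive in increasing order; the class name `SlidingWindowLog` and its parameters are assumptions for the example.

```python
import time
from collections import deque


class SlidingWindowLog:
    """Illustrative sliding-window log: at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self):
        """Return True and record the request if it fits in the window, otherwise False."""
        now = self.clock()
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the sliding window.
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

In this sketch rejected requests are not logged; some implementations record them as well, which penalises clients that keep retrying while limited.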
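Sliding Window Counter: A hybrid approach that combines Fixed Window Counter and Sliding Window Log. It estimates the request count in the current sliding window as a weighted sum of the current and previous fixed window counts: the previous window's count is weighted by the fraction of that window still covered by the sliding window, and the current window's count is added in full. For example, 25% of the way into the current window, the estimate is 75% of the previous count plus the current count.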
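One way to sketch this weighted estimate is shown below. The class name `SlidingWindowCounter`, its parameters, and the choice to reject when the estimate plus the incoming request would exceed the limit are assumptions for illustration; real implementations differ in these details.

```python
import time


class SlidingWindowCounter:
    """Illustrative sliding-window counter using two fixed-window counts."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_index = None
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        """Return True if the estimated count leaves room for this request."""
        now = self.clock()
        index = int(now // self.window_seconds)
        if self.current_index is None:
            self.current_index = index
        if index != self.current_index:
            # Roll over: the old count becomes "previous" only if it is the adjacent window.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        # The previous window contributes only its still-overlapping share.
        estimate = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if estimate + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

The estimate assumes requests in the previous window were spread evenly, so it is an approximation, but it needs only two counters per client instead of one timestamp per request.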
express-rate-limit in Node.js).
Typically, rate limit data (counters, timestamps) is stored in an in-memory store like Redis for sub-millisecond lookups.