Load Balancers & Algorithms

A Load Balancer is a device or software that sits between the client and a group of backend servers. It distributes incoming network traffic across multiple servers to ensure no single server bears too much demand. This improves responsiveness, increases availability, and ensures reliability.

1. Where are Load Balancers Placed?

Load balancers are typically placed at three critical points in a system architecture:

Between the Client and the Web Server (External LB, often operating at Layer 7)
Between the Web Server and the Application/API Server (Internal LB)
Between the Application Server and the Database Server (Database LB)

2. Types of Load Balancers

Hardware Load Balancers

Physical appliances (like F5 BIG-IP or Citrix ADC) that sit in the data center. They are extremely fast and feature-rich, but expensive: high-end units can cost $100,000 or more.

Software Load Balancers

Software solutions like Nginx and HAProxy, or managed cloud services like AWS Elastic Load Balancing (ELB). They are cost-effective and highly configurable, and the self-hosted options run on commodity hardware.

Layer 4 vs Layer 7

Layer 4 (Transport Layer): Routes traffic based on IP address and TCP port. It cannot inspect the content of the request (like the URL path or HTTP headers), which is why it is extremely fast.
Layer 7 (Application Layer): Routes traffic based on the content of the HTTP request. It can make intelligent decisions like routing /api/* requests to the API servers and /images/* requests to servers dedicated to images. Slower than Layer 4, because each request must be parsed, but far more flexible.
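
As a rough illustration of content-based routing, the sketch below matches the request path against a list of prefixes and returns the name of a backend pool. The pool names, the prefix table, and the function choose_pool are hypothetical and exist only for this example; a real Layer 7 load balancer would express the same idea in its own configuration language.

```python
# Hypothetical prefix-to-pool table; the names are made up for illustration.
ROUTES = [
    ("/api/", "api-pool"),
    ("/images/", "image-pool"),
]
DEFAULT_POOL = "web-pool"


def choose_pool(path: str) -> str:
    """Return the pool for the first prefix that matches the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    # No rule matched, so fall back to the general-purpose pool.
    return DEFAULT_POOL
```

A Layer 4 balancer could not make this decision, because the path is only visible after the HTTP request has been parsed.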

3. Load Balancing Algorithms

Round Robin

Distributes requests sequentially. Request 1 goes to Server A, Request 2 goes to Server B, Request 3 goes to Server C, Request 4 goes back to Server A.

Pros: Dead simple.
Cons: Ignores server load. If Server A is already processing a massive request, it still gets the next one.
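
A minimal sketch of the rotation, assuming the pool is a fixed list of server names. The class name RoundRobinBalancer and its method are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed, repeating order."""

    def __init__(self, servers):
        # cycle() repeats the sequence endlessly: A, B, C, A, B, C, ...
        self._servers = cycle(servers)

    def next_server(self):
        return next(self._servers)


lb = RoundRobinBalancer(["A", "B", "C"])
order = [lb.next_server() for _ in range(4)]
```

After these four calls, order holds A, B, C, A, matching the sequence described above. Nothing in the loop looks at how busy a server is, which is exactly the weakness listed under Cons.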

Weighted Round Robin

Same as Round Robin, but each server is assigned a weight based on its capacity. A powerful server with weight 5 receives five times as many requests as a server with weight 1.
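
One simple way to sketch this is to repeat each server in the rotation as many times as its weight. The function weighted_cycle and the server names are hypothetical, and this naive version sends requests to the heavier server in bursts; production balancers typically interleave the picks more evenly.

```python
from itertools import cycle


def weighted_cycle(weights):
    """Build an endless rotation in which each server appears `weight` times."""
    expanded = [
        server
        for server, weight in weights.items()
        for _ in range(weight)
    ]
    return cycle(expanded)


picker = weighted_cycle({"big": 5, "small": 1})
first_twelve = [next(picker) for _ in range(12)]
```

Over those twelve picks, "big" is chosen ten times and "small" twice, which is the 5:1 ratio the weights ask for.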

Least Connections

Routes the new request to the server that currently has the fewest active connections.

Best for: Long-lived connections (WebSocket, database connections), where the number of requests handed out says little about how busy a server currently is.
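
The bookkeeping can be sketched as a counter per server that goes up when a connection opens and down when it closes. The class and method names below are hypothetical.

```python
class LeastConnectionsBalancer:
    """Track active connections and pick the least busy server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count on a tie.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the connection to `server` closes.
        self.active[server] -= 1
```

For example, with servers A and B, two calls to acquire() return A and then B; if the connection to A is then released, the next acquire() returns A again because it has the fewest active connections.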

Least Response Time

Routes the request to the server that has the fewest active connections AND the lowest average response time. A more sophisticated version of Least Connections.
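
How the two signals are combined differs between products. The sketch below assumes one plausible scoring rule, (active connections + 1) multiplied by the average response time, and picks the lowest score. The ServerStats type, the pick_server function, and the scoring formula are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    active_connections: int
    avg_response_ms: float


def pick_server(stats):
    """Pick the server with the lowest combined score (assumed formula)."""
    # The +1 keeps an idle server's latency relevant instead of scoring 0.
    return min(
        stats,
        key=lambda s: (s.active_connections + 1) * s.avg_response_ms,
    )


pool = [
    ServerStats("A", active_connections=2, avg_response_ms=100.0),
    ServerStats("B", active_connections=2, avg_response_ms=40.0),
    ServerStats("C", active_connections=9, avg_response_ms=20.0),
]
chosen = pick_server(pool)
```

Here A and B have the same number of connections, but B answers faster, and C is fast but heavily loaded, so B is chosen.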

IP Hash

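The client's IP address is fed into a hash function. The result determines which server handles the request. This ensures the same client hits the same server for as long as the set of servers stays the same; adding or removing a server can remap clients to different servers.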

Best for: Session-based applications that store user state on the server (session affinity/sticky sessions).
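
A minimal sketch of the mapping, assuming a fixed list of servers and a simple modulo over the hash value. The function name server_for is hypothetical. It uses hashlib rather than Python's built-in hash(), because the built-in hash of a string changes between interpreter runs.

```python
import hashlib


def server_for(client_ip, servers):
    """Map a client IP to a server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then wrap to the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Calling server_for("203.0.113.7", ["A", "B", "C"]) repeatedly returns the same server every time, which is what gives the client its sticky session.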

Random

Routes the request to a randomly chosen server. Surprisingly effective when all servers are identical, because over many requests the load evens out.
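
The selection itself is a single call. The sketch below wraps it in a hypothetical pick_random function that accepts an optional random number generator, so that a seeded generator can be passed in for repeatable experiments.

```python
import random


def pick_random(servers, rng=random):
    """Return one server chosen uniformly at random."""
    return rng.choice(servers)


# A seeded generator makes the experiment repeatable.
rng = random.Random(0)
servers = ["A", "B", "C"]
picks = [pick_random(servers, rng) for _ in range(3000)]
counts = {server: picks.count(server) for server in servers}
```

With 3000 picks across three identical servers, each count lands close to 1000, which illustrates why the approach works well at volume.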

4. Health Checks

A load balancer continuously monitors the health of its backend servers by sending periodic Health Check requests (e.g., an HTTP GET to /health). If a server stops responding, the load balancer automatically removes it from the pool, ensuring that no traffic is sent to a dead server, and returns it to the pool once it passes its checks again.
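
The monitoring loop can be sketched as follows. The names http_check and HealthMonitor are hypothetical, and the fail_threshold parameter (how many consecutive failures are tolerated before a server is dropped) is an assumption added for illustration; it is a common way to avoid removing a server over a single slow reply.

```python
import urllib.request


def http_check(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        url = f"{base_url}/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


class HealthMonitor:
    """Count consecutive failed checks and expose the healthy servers."""

    def __init__(self, servers, check=http_check, fail_threshold=3):
        self.check = check
        self.fail_threshold = fail_threshold  # assumed policy, not universal
        self.failures = {server: 0 for server in servers}

    def run_once(self):
        # One round of checks; a scheduler would call this periodically.
        for server in self.failures:
            if self.check(server):
                self.failures[server] = 0  # a success resets the counter
            else:
                self.failures[server] += 1

    def healthy(self):
        return [
            server
            for server, count in self.failures.items()
            if count < self.fail_threshold
        ]
```

The balancing algorithm would then choose only from monitor.healthy(). Because a passing check resets the counter, a recovered server rejoins the pool on its next successful round.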