A load balancer is a device or piece of software that sits between clients and a group of backend servers. It distributes incoming network traffic across those servers so that no single server is overloaded, which improves responsiveness, availability, and reliability.
Load balancers are typically placed at three critical points in a system architecture:
Hardware load balancers are physical appliances (such as those from F5 Networks, or Citrix ADC) that sit in the data center. They are extremely fast and feature-rich but expensive: a single appliance can cost $100,000 or more.
Software load balancers include Nginx, HAProxy, and AWS Elastic Load Balancer (ELB). They are cost-effective, highly configurable, and run on commodity hardware.
/api/* requests to backend servers and /images/* requests to a CDN. Slower but far more flexible.
Round Robin distributes requests sequentially: request 1 goes to Server A, request 2 to Server B, request 3 to Server C, and request 4 goes back to Server A.
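The sequential rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it; the class name `RoundRobinBalancer` and the method `next_server` are made up for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order (hypothetical helper)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: A, B, C, A, B, C, ...
        self._rotation = cycle(list(servers))

    def next_server(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["Server A", "Server B", "Server C"])
picks = [balancer.next_server() for _ in range(4)]
print(picks)  # ['Server A', 'Server B', 'Server C', 'Server A']
```

The balancer keeps no information about the servers beyond their order, which is why this strategy is cheap but blind to differences in load or capacity.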
Weighted Round Robin works like Round Robin, but each server is assigned a weight based on its capacity. A powerful server with weight 5 receives five times as many requests as a server with weight 1.
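One simple way to honor weights is to repeat each server in the rotation as many times as its weight. The sketch below assumes integer weights; the function name `weighted_rotation` and the server names `big` and `small` are invented for the example.

```python
from collections import Counter
from itertools import cycle


def weighted_rotation(weights):
    """Return an endless iterator in which each server appears
    `weight` times per cycle (naive expansion, integer weights only)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return cycle(expanded)


rotation = weighted_rotation({"big": 5, "small": 1})
counts = Counter(next(rotation) for _ in range(600))
print(counts["big"], counts["small"])  # 500 100
```

This naive expansion sends five requests in a row to `big` before `small` gets one. Real balancers usually interleave the picks so that the heavier server is not hit in bursts, but the long-run ratio is the same.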
Least Connections routes each new request to the server that currently has the fewest active connections.
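Unlike a fixed rotation, this strategy needs the balancer to track how many connections are open on each server. A minimal sketch, assuming the balancer is told when a connection starts and ends (the class and the `acquire`/`release` methods are hypothetical names):

```python
class LeastConnectionsBalancer:
    """Tracks open connections per server and picks the least busy one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server in insertion order on a tie.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a connection to `server` closes.
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["A", "B", "C"])
first = balancer.acquire()    # all idle, so the first server: A
second = balancer.acquire()   # B
third = balancer.acquire()    # C
balancer.release(second)      # B's connection finishes
fourth = balancer.acquire()   # B is now the only idle server
print(first, second, third, fourth)  # A B C B
```

The fourth request goes back to B rather than A because B finished its work first, which is the behavior a fixed rotation cannot provide.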
Least Response Time routes each request to the server with the fewest active connections and the lowest average response time. It is a more sophisticated version of Least Connections.
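The text does not say how the two signals are combined, and different products do it differently. The sketch below assumes one plausible scoring rule, `(active connections + 1) × average response time`, as a rough estimate of how long a new request would wait. The `ServerStats` type, the `pick_server` function, and the sample numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    active_connections: int
    avg_response_ms: float


def pick_server(servers):
    """Pick the server with the lowest estimated wait.

    Assumed score (not a standard formula): the new request queues
    behind the active ones, each taking the average response time.
    """
    return min(servers, key=lambda s: (s.active_connections + 1) * s.avg_response_ms)


pool = [
    ServerStats("A", active_connections=2, avg_response_ms=120.0),  # score 360
    ServerStats("B", active_connections=4, avg_response_ms=40.0),   # score 200
    ServerStats("C", active_connections=2, avg_response_ms=90.0),   # score 270
]
chosen = pick_server(pool)
print(chosen.name)  # B
```

Here B wins despite having the most open connections, because it answers much faster than A or C. A pure fewest-connections rule would have chosen A or C instead.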
IP Hash feeds the client's IP address into a hash function, and the result determines which server handles the request. As long as the set of servers does not change, the same client always reaches the same server.
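A minimal sketch of this mapping, assuming a simple hash-then-modulo scheme. The function name `server_for` and the addresses are invented for the example; SHA-256 is used only because it gives the same result on every run, not because load balancers require a cryptographic hash.

```python
import hashlib


def server_for(client_ip, servers):
    """Map a client IP to a server by hashing the address."""
    # Python's built-in hash() is randomized per process for strings,
    # so a stable digest is used instead.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["Server A", "Server B", "Server C"]
first_visit = server_for("203.0.113.7", servers)
second_visit = server_for("203.0.113.7", servers)
print(first_visit == second_visit)  # True
```

With a plain modulo, adding or removing a server changes `len(servers)` and therefore remaps most clients, which is why the stickiness only holds while the pool stays the same.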
Random selection routes each request to a randomly chosen server. It is surprisingly effective when all servers are identical.
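The following sketch shows why random choice works well for identical servers: over many requests, the load evens out on its own. The function name `pick_random` is hypothetical, and the fixed seed is there only to make the demonstration repeatable.

```python
import random
from collections import Counter


def pick_random(servers, rng=random):
    """Choose a server uniformly at random."""
    return rng.choice(servers)


rng = random.Random(42)  # seeded only so the demo is reproducible
servers = ["Server A", "Server B", "Server C"]
counts = Counter(pick_random(servers, rng) for _ in range(30_000))

# Each server ends up with roughly a third of the 30,000 requests.
for name in servers:
    print(name, counts[name])
```

No state is shared between requests, so several load balancer instances can use this strategy side by side without coordinating.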
A load balancer continuously monitors the health of its backend servers by sending periodic health check requests (e.g., an HTTP GET to /health). If a server stops responding, the load balancer automatically removes it from the pool, so that no traffic is sent to a dead server.
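The health check loop can be sketched as follows. This is an illustration under several assumptions: the names `http_probe` and `HealthCheckedPool` are invented, a server counts as healthy if `GET /health` returns a 2xx status within the timeout, and a recovered server is put back into the pool on the next check (the text above only describes removal). The demo swaps in a fake probe so it runs without a network.

```python
import http.client
import urllib.request


def http_probe(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


class HealthCheckedPool:
    """Keeps track of which servers passed their most recent check."""

    def __init__(self, servers, probe=http_probe):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self.probe = probe

    def run_checks(self):
        # Probe every server once; a real balancer would call this on a timer.
        self.healthy = {s for s in self.servers if self.probe(s)}

    def available(self):
        return [s for s in self.servers if s in self.healthy]


# Demo with a fake probe: the second server is "down".
down = {"http://10.0.0.2:8080"}
pool = HealthCheckedPool(
    ["http://10.0.0.1:8080", "http://10.0.0.2:8080"],
    probe=lambda url: url not in down,
)
pool.run_checks()
after_failure = pool.available()
print(after_failure)  # ['http://10.0.0.1:8080']

down.clear()  # the server comes back
pool.run_checks()
after_recovery = pool.available()
print(after_recovery)  # ['http://10.0.0.1:8080', 'http://10.0.0.2:8080']
```

Any of the selection strategies above would then choose only from `pool.available()`. Production systems commonly require several consecutive failures before removing a server, to avoid reacting to a single slow response.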