Load Balancing: Scaling the Web

A Load Balancer is the traffic cop of the internet. It sits in front of a cluster of servers and ensures that no single machine is overwhelmed by too many requests. This is how sites like Google or Netflix stay online even when millions of people visit at once.

💻 Users → ⚖️ Balancer → 🖥️ Server A / 🖥️ Server B / 🖥️ Server C

The Balancer receives the request and dynamically routes it to an available, healthy server.

Balancing Strategies

Modern load balancers aren't just random; they use specific logic to decide where a user goes:

Round Robin

The simplest method: the balancer walks down the server list (1, 2, 3...) and starts over at the top. It's fair, but it ignores how loaded each server actually is.
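A minimal sketch of the rotation described above, using Python's `itertools.cycle` (the server names are hypothetical):

```python
from itertools import cycle

def round_robin(servers):
    """Yield servers in order, wrapping back to the start of the list."""
    return cycle(servers)

backends = ["server-a", "server-b", "server-c"]  # hypothetical backend pool
rotation = round_robin(backends)

# Five incoming requests: after server-c, the rotation wraps to server-a.
assignments = [next(rotation) for _ in range(5)]
print(assignments)  # ['server-a', 'server-b', 'server-c', 'server-a', 'server-b']
```

Note that nothing here asks the servers how busy they are; request 4 goes to Server A even if it is still struggling with request 1.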

Least Connections

The "Efficiency" mode. The balancer checks which server is currently handling the fewest active connections and sends the new request there.
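The selection step reduces to a minimum over per-server connection counts. A sketch, with made-up counts:

```python
# Hypothetical live state: active connection count per backend.
connections = {"server-a": 12, "server-b": 3, "server-c": 7}

def least_connections(conns):
    """Pick the backend currently handling the fewest active connections."""
    return min(conns, key=conns.get)

target = least_connections(connections)
connections[target] += 1  # the new request now counts against that server
print(target)  # server-b, with only 3 active connections, wins
```

In a real balancer the counts are decremented again when connections close, so the "cheapest" server shifts over time.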

Session Persistence

Often called "Sticky Sessions." It ensures that if you are halfway through a shopping checkout on Server A, you aren't suddenly moved to Server B, which might not know who you are.
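One common way to implement stickiness is to hash a session identifier (a cookie value, say) so the same user deterministically maps to the same server. A sketch under that assumption; the session ID and pool names are illustrative:

```python
import hashlib

backends = ["server-a", "server-b", "server-c"]  # hypothetical pool

def sticky_route(session_id, pool):
    """Hash the session ID so the same user always lands on the same server."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]

# The same checkout session keeps hitting one backend across requests.
first = sticky_route("cart-42", backends)
second = sticky_route("cart-42", backends)
assert first == second
```

The trade-off: if the chosen server dies mid-checkout, the session is lost unless its state is also stored somewhere shared.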

L4 vs. L7: How Deep Do We Look?

| Level   | Protocol   | Intelligence          | Pros                                                   |
| ------- | ---------- | --------------------- | ------------------------------------------------------ |
| Layer 4 | TCP/UDP    | Low (IP & port only)  | Extremely fast, low CPU usage                          |
| Layer 7 | HTTP/HTTPS | High (cookies, URLs)  | Smart routing (e.g., video traffic to video servers)   |

The "Health Check" Mechanism

A Load Balancer is also a guardian. It constantly "pings" its servers. If a server stops responding (crashes), the Balancer automatically removes it from the rotation. The user never knows anything went wrong—they are simply routed to a healthy survivor.
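The pruning step above amounts to filtering the pool by probe results. A sketch with a hypothetical health table standing in for real probes:

```python
def prune_unhealthy(pool, is_healthy):
    """Return only the backends that answer the health probe."""
    return [server for server in pool if is_healthy(server)]

# Hypothetical probe results: server-b has crashed and stopped responding.
status = {"server-a": True, "server-b": False, "server-c": True}
healthy = prune_unhealthy(["server-a", "server-b", "server-c"], status.get)
print(healthy)  # server-b is silently dropped from the rotation
```

Real balancers run this probe on a timer (every few seconds) and re-admit a server once it starts answering again.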

Pro Tip: Most modern "Cloud" setups (like Cloudflare or AWS) combine the Load Balancer with a Web Application Firewall (WAF). This means the same machine that balances your traffic also checks it for hackers and bots.