HAProxy Overview and Load Balancing Algorithms
HAProxy is a high-performance, open-source load balancer and proxy server designed to handle high traffic volumes while keeping services highly available. It operates at both Layer 4 (Transport, TCP) and Layer 7 (Application, HTTP). Because clients connect only to the proxy, backend web servers are not directly exposed to external traffic.
HAProxy supports several load distribution algorithms configured via the balance directive:
- roundrobin: Uses each server in turn, according to its weight. This is the usual default, and weights can be adjusted at runtime.
- static-rr: Like roundrobin, but static: changing a server's weight at runtime has no effect.
- leastconn: Routes traffic to the server with the fewest active connections, ideal for long-lived sessions.
- source: Hashes the client IP address to ensure a client consistently reaches the same backend server.
- uri: Hashes the left part of the URI (the part before the question mark) so that requests for the same URI go to the same server.
- url_param: Hashes the value of a named URL parameter found in the query string.
- hdr(name): Balances based on a specific HTTP header field.
- rdp-cookie(name): Balances based on the RDP cookie value for terminal server traffic.
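To make the difference between these strategies concrete, here is a minimal Python sketch of three of them: roundrobin, leastconn, and source. It is a simplification for illustration only. HAProxy's real implementations also account for server weights and use their own hash functions; the server names and the use of CRC32 here are assumptions.

```python
import itertools
import zlib

# Hypothetical server names, for illustration only.
SERVERS = ["app-node-01", "app-node-02", "app-node-03"]

def make_roundrobin(servers):
    """Return a function that hands out servers in circular order."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)

def pick_leastconn(active_connections):
    """Pick the server with the fewest active connections.

    active_connections maps server name -> current connection count.
    """
    return min(active_connections, key=active_connections.get)

def pick_by_source(client_ip, servers):
    """Hash the client IP so the same client always maps to the same server.

    CRC32 is a stand-in hash, not the function HAProxy uses.
    """
    return servers[zlib.crc32(client_ip.encode("ascii")) % len(servers)]
```

Note that the source mapping in this sketch changes for most clients whenever the server list changes length, which is one reason hash-based balancing is sensitive to servers going up or down.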
Session Persistence Strategies
When requests are distributed across multiple backend servers, maintaining session state can be challenging. HAProxy offers three primary methods to handle session persistence:
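- IP-based Persistence: Uses the client's IP address to calculate a hash, directing the user to a fixed backend server.
    balance source
- Cookie Insertion: HAProxy inserts a backend-specific cookie into the HTTP response and uses it to route the client's subsequent requests.
    cookie SRV_ID insert indirect nocache
- Application Session Tracking: HAProxy tracks session IDs generated by backend servers in an internal table. The appsession directive was removed in HAProxy 1.6; on later versions, a stick table keyed on the session cookie serves the same purpose.
    # HAProxy 1.5 and earlier
    appsession JSESSIONID len 64 timeout 5h request-learn
    # HAProxy 1.6 and later (table size of 1m entries is an example value)
    stick-table type string len 64 size 1m expire 5h
    stick on req.cook(JSESSIONID)
    stick store-response res.cook(JSESSIONID)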
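The cookie-insertion approach can be pictured with a small Python sketch. This is a simplified model, not HAProxy's code: the cookie values s1 and s2 are hypothetical identifiers, and the fallback to round-robin for clients without a valid cookie is an assumption about how the backend is otherwise balanced.

```python
import itertools

# Hypothetical cookie value -> backend address mapping.
SERVERS = {"s1": "10.0.0.10:80", "s2": "10.0.0.11:80"}
_fallback = itertools.cycle(SERVERS)  # iterates over the keys s1, s2, s1, ...

def route(request_cookies):
    """Return (server_key, cookies_to_set) for one incoming request.

    A valid SRV_ID cookie pins the client to that server. Otherwise a
    server is chosen and a SRV_ID cookie is added to the response.
    """
    key = request_cookies.get("SRV_ID")
    if key in SERVERS:
        return key, {}
    key = next(_fallback)
    return key, {"SRV_ID": key}
```

A first request without a cookie receives one in the response, and every later request carrying that cookie goes back to the same server regardless of the client's IP address.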
Automated Deployment with SaltStack
To deploy HAProxy efficiently across infrastructure, we can use SaltStack. The following demonstrates the state files required for installation on a target node (e.g., lb-server-01).
Dependency and Compilation State
First, define the installation dependencies and the build process. Create the directory structure and the SLS file.
# File: /srv/salt/prod/haproxy/init.sls
include:
  - pkg.deps
haproxy-source:
  file.managed:
    - name: /usr/local/src/haproxy-2.4.0.tar.gz
    - source: salt://haproxy/files/haproxy-2.4.0.tar.gz
    - mode: 644
  cmd.run:
    # The linux2628 build target was removed in HAProxy 2.0; linux-glibc replaces it.
    - name: cd /usr/local/src && tar zxf haproxy-2.4.0.tar.gz && cd haproxy-2.4.0 && make TARGET=linux-glibc PREFIX=/usr/local/haproxy && make install PREFIX=/usr/local/haproxy
    - unless: test -d /usr/local/haproxy
    - require:
      - pkg: pkg-deps
      - file: haproxy-source
Service Initialization
Next, manage the service script and system configuration.
haproxy-service-script:
  file.managed:
    - name: /etc/init.d/haproxy
    - source: salt://haproxy/files/haproxy.init
    - mode: 755
  cmd.run:
    - name: chkconfig --add haproxy
    - unless: chkconfig --list | grep haproxy
    - require:
      - cmd: haproxy-source
# Enable binding to non-local IPs
net.ipv4.ip_nonlocal_bind:
  sysctl.present:
    - value: 1
haproxy-config-dir:
  file.directory:
    - name: /etc/haproxy
    - mode: 755
Configuration and Backend Setup
The following configuration sets up a frontend listening on port 80 and balances traffic to two backend nodes. It also configures a statistics page on port 8080.
# File: /srv/salt/prod/cluster/haproxy-web.sls
include:
  - haproxy.init
web-proxy-config:
  file.managed:
    - name: /etc/haproxy/haproxy.cfg
    - source: salt://cluster/files/haproxy-web.cfg
    - mode: 644
  service.running:
    - name: haproxy
    - enable: True
    - reload: True
    - watch:
      - file: web-proxy-config
The corresponding configuration file content:
global
    maxconn 100000
    chroot /usr/local/haproxy
    uid 99
    gid 99
    daemon
    nbproc 1
    pidfile /usr/local/haproxy/logs/haproxy.pid
    log 127.0.0.1 local0 info
defaults
    option http-keep-alive
    maxconn 100000
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
listen stats
    mode http
    bind 0.0.0.0:8080
    stats enable
    stats uri /admin?stats
    stats auth admin:securepassword
frontend main_http
    bind 10.0.0.5:80
    mode http
    option httplog
    log global
    default_backend app_servers
backend app_servers
    option forwardfor header X-Forwarded-For
    option httpchk GET /healthcheck HTTP/1.0
    balance source
    server app-node-01 10.0.0.10:80 check inter 2000 rise 2 fall 3
    server app-node-02 10.0.0.11:80 check inter 2000 rise 2 fall 3
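The check parameters on the server lines mean that a health check runs every 2000 ms, a server is marked down after 3 consecutive failed checks (fall 3), and it is marked up again after 2 consecutive successful checks (rise 2). The following Python sketch models that state machine. The class and method names are hypothetical, and it ignores details such as check timeouts and intermediate states.

```python
class HealthTracker:
    """Simplified model of HAProxy's rise/fall health-check counters."""

    def __init__(self, rise=2, fall=3):
        self.rise = rise
        self.fall = fall
        self.up = True    # servers are assumed to start in the UP state
        self.streak = 0   # consecutive results that contradict the current state

    def record(self, check_ok):
        """Record one check result and return whether the server is UP."""
        if check_ok == self.up:
            # The result agrees with the current state, so reset the streak.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = check_ok
                self.streak = 0
        return self.up
```

With the values from the configuration above, a server stops receiving traffic roughly 6 seconds after it starts failing (3 checks at 2-second intervals) and returns to the pool about 4 seconds after it recovers.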
Logging Integration
To capture HAProxy logs, configure Rsyslog to receive UDP logs.
Edit /etc/rsyslog.conf to enable the UDP module and define the log file:
$ModLoad imudp
$UDPServerRun 514
# Save HAProxy logs to a specific file
local0.* /var/log/haproxy.log
Restart the logging service and HAProxy to apply changes:
systemctl restart rsyslog
/etc/init.d/haproxy restart
Access Control Lists (ACLs) and Content Routing
ACLs allow traffic routing based on specific criteria, such as the requested domain name or file type.
Virtual Host Routing
frontend multi_site_frontend
    bind *:80
    mode http
    # Define ACL for specific domain
    acl is_api_site hdr(host) -i api.example.com
    # Route to specific backend if ACL matches
    use_backend api_pool if is_api_site
    default_backend main_website_pool
backend main_website_pool
    balance roundrobin
    server s1 10.0.0.10:80 check
backend api_pool
    balance roundrobin
    server s2 10.0.0.11:80 check
Static File Routing
You can route requests for static assets (images, CSS, JS) to a dedicated backend to optimize performance.
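frontend static_content
    bind *:80
    mode http
    # ACL for static file extensions; path_end ignores the query string,
    # whereas url_end would miss requests such as /app.js?v=2
    acl static_files path_end -i .jpg .png .css .js .gif
    use_backend static_storage if static_files
    default_backend app_servers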
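The two routing rules in this section boil down to a case-insensitive comparison of the Host header and a case-insensitive suffix test on the request path. The Python sketch below mirrors that decision logic for illustration. The function names are hypothetical, and the suffix test deliberately ignores the query string, which is an assumption about the intended behavior.

```python
STATIC_SUFFIXES = (".jpg", ".png", ".css", ".js", ".gif")

def choose_site_backend(host_header):
    """Mimic: acl is_api_site hdr(host) -i api.example.com"""
    if host_header.lower() == "api.example.com":
        return "api_pool"
    return "main_website_pool"

def choose_content_backend(request_uri):
    """Route static assets by file extension, ignoring any query string."""
    path = request_uri.split("?", 1)[0]
    if path.lower().endswith(STATIC_SUFFIXES):
        return "static_storage"
    return "app_servers"
```

As in HAProxy, the first matching rule wins and the default backend handles everything else.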
Runtime Management via Unix Socket
HAProxy provides a powerful Unix socket interface for dynamic administration without restarting the service.
Enable the socket in the global section of the configuration:
global
    stats socket /var/run/haproxy/admin.sock mode 600 level admin
    stats timeout 2m
Use the socat utility to interact with this socket.
# Install socat
yum install -y socat
# Show process information (use "show stat" for per-server status)
echo "show info" | socat stdio /var/run/haproxy/admin.sock
# Disable a backend server for maintenance (format: backend/server)
echo "disable server app_servers/app-node-02" | socat stdio /var/run/haproxy/admin.sock
# Re-enable the server
echo "enable server app_servers/app-node-02" | socat stdio /var/run/haproxy/admin.sock
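For scripted administration, the same socket can be driven without socat. The Python sketch below sends a single command and reads the reply until HAProxy closes the connection, which is its default behavior for non-interactive sessions. The function name is hypothetical, and the socket path is the one assumed in the configuration above.

```python
import socket

def haproxy_command(sock_path, command, timeout=5.0):
    """Send one command to the HAProxy admin socket and return the reply text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        # Commands are newline-terminated, just like the echo | socat form.
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example usage (requires a running HAProxy and read/write access to the socket):
#   print(haproxy_command("/var/run/haproxy/admin.sock", "show info"))
```

Because the socket is created with mode 600, the script has to run as the user that owns it, typically root.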
Performance Tuning: Handling TCP Port Exhaustion
In high-concurrency environments, HAProxy acting as a proxy may exhaust available ephemeral TCP ports. Here are strategies to mitigate this:
- Expand Port Range: Increase the kernel's range of ephemeral ports.
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"
- Enable Port Reuse: Allow sockets in TIME_WAIT state to be reused for new outgoing connections. This relies on TCP timestamps, which are enabled by default.
    sysctl -w net.ipv4.tcp_tw_reuse=1
- Shorten FIN_WAIT_2: Lower the time orphaned connections spend in the FIN_WAIT_2 state. On Linux this setting does not change the TIME_WAIT duration, which is fixed at 60 seconds.
    sysctl -w net.ipv4.tcp_fin_timeout=15
- Multi-IP Strategy: Have HAProxy connect from multiple source IPs to multiply the available port pool (roughly 64k ports per source IP).
    server db-master 10.0.0.50:3306 check source 10.0.0.100:1024-65535
    server db-slave 10.0.0.51:3306 check source 10.0.0.101:1024-65535
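The arithmetic behind these strategies is easy to check. The sketch below counts the ports in a range and shows how the pool grows with a wider range and with additional source IPs. The starting range of 32768 to 60999 is the default on many modern Linux kernels and is an assumption here; check your own system with sysctl net.ipv4.ip_local_port_range.

```python
def ports_in_range(low, high):
    """Number of ports in an inclusive range."""
    return high - low + 1

default_pool = ports_in_range(32768, 60999)   # common Linux default
expanded_pool = ports_in_range(1024, 65535)   # after widening the range
two_ip_pool = 2 * expanded_pool               # two source IPs, as in the server lines above

print(default_pool)   # 28232
print(expanded_pool)  # 64512
print(two_ip_pool)    # 129024
```

These figures are per destination address and port, since a connection is identified by the full source and destination tuple. The limit therefore matters most when HAProxy sends a large number of connections to a single backend.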
HAProxy vs. Nginx
While both are powerful, they have different strengths:
Nginx: Primarily a web server that also functions as a reverse proxy. It excels at serving static content and handling complex HTTP routing with location blocks. However, its open-source edition offers fewer load-balancing algorithms and no comparable runtime management interface out of the box.
HAProxy: Dedicated to load balancing and proxying rather than serving content. It offers a wider variety of load-balancing algorithms, active health checks, a comprehensive statistics dashboard, and runtime control via sockets. It is generally preferred for pure proxy duties in high-traffic architectures.