Table of Contents
- Introduction to Replication
- Replication Configuration
- Establishing Replication
- Terminating Replication
- Topology Structures
- Replication Process
- Data Synchronization Internals
- Components Required for PSYNC
- PSYNC Command
- Full Synchronization
- Partial Synchronization
- Master-Replica Heartbeat
- Full Sync Triggers
- Common Configurations and Commands
Introduction to Replication
Redis replication is a mechanism that copies data from one Redis server to other servers. The source server is called the master node, while the receiving servers are called replica nodes (historically referred to as slave nodes).
Primary Functions of Replication
- Data Redundancy: Hot backup of data with multi-machine redundancy
- Fault Recovery: When the master fails, replicas can serve requests, providing a form of service redundancy
- Load Balancing: Master handles writes while replicas handle reads, distributing load across multiple instances
- High Availability Foundation: Replication serves as the foundation for Sentinel and Cluster implementations
By default, every Redis server starts as a master node. Each master can have multiple replicas, but each replica can only have one master.
Replication Configuration
Establishing Replication
Configuration Methods
There are three ways to configure replication:
- Add the replicaof directive to the configuration file
- Use the --replicaof flag when starting redis-server
- Execute the REPLICAOF command directly in the Redis client
Practical Demonstration
Preparing the Nodes
Configure port 6380 as the master node and port 6381 as the replica node. Create a new configuration file named redis-6381.conf and modify the port setting accordingly.
Start both Redis instances using the configuration files.
Executing the Replication Command
Connect to the replica instance (port 6381) and execute:
redis-cli -p 6381
REPLICAOF 127.0.0.1 6380
Verification
Write data to the master node:
redis-cli -p 6380 SET message "Hello from master"
Read from the replica:
redis-cli -p 6381 GET message
The replica successfully retrieves the data, confirming replication is active.
Terminating Replication
Direct Termination
Use the REPLICAOF NO ONE command to sever the replication relationship:
redis-cli -p 6381 REPLICAOF NO ONE
Post-termination behavior:
- Previously replicated data remains on the replica
- Subsequent writes to the master are not synchronized
Switching to Another Master
To switch to a different master, use the REPLICAOF command with the new master address. Unlike REPLICAOF NO ONE, this clears all data previously replicated from the original master.
Topology Structures
Redis supports three replication topologies:
- One-to-One: One master with one replica
- One-to-Many: One master with multiple replicas
- Tree Structure: Intermediate replicas can serve as masters for other replicas
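As a rough illustration of the three shapes listed above, the following Python sketch classifies a topology described as a mapping from each replica to its single master. The classify_topology helper and the port-style node names are hypothetical, and the sketch assumes a single root master with no cycles.

```python
def classify_topology(replica_of: dict) -> str:
    """Classify a replication topology given as {replica: master}.

    Using a dict mirrors the rule that each replica has exactly one master.
    Assumes a single root master and no cycles (neither is validated here).
    """
    if not replica_of:
        return "standalone"
    # It is a tree when some master is itself a replica of another node.
    if any(master in replica_of for master in replica_of.values()):
        return "tree"
    return "one-to-one" if len(replica_of) == 1 else "one-to-many"


print(classify_topology({"6381": "6380"}))                  # one-to-one
print(classify_topology({"6381": "6380", "6382": "6380"}))  # one-to-many
print(classify_topology({"6381": "6380", "6382": "6381"}))  # tree
```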
Replication Process
Phase 1: Storing Master Information
When the REPLICAOF command is executed, the replica only saves the master's address information and returns immediately. The actual replication has not started yet.
Phase 2: Establishing Socket Connection
The replica runs a periodic task (every second) that checks for new master configurations. When a new master is detected, it attempts to establish a network connection by creating a socket. All subsequent data synchronization occurs through this socket.
If the connection fails, the periodic task continues retrying until successful or until replication is cancelled.
Phase 3: Sending PING
After the socket connection is established, the replica sends a PING request to the master. The purposes include:
- Verifying the socket is functional
- Checking if the master can process commands
If the replica doesn't receive a PONG response or the request times out (due to network issues or master being blocked), the replica disconnects and the periodic task attempts reconnection.
Phase 4: Authentication
If the master has requirepass configured, the replica must have masterauth set to the same password. Authentication failure causes the replica to disconnect and retry on the next periodic task execution.
Phase 5: Data Synchronization
After the replication connection is established, data synchronization begins. This is the initialization phase in which the master brings the replica up to date with its data. Since Redis 2.8 the mechanism uses the PSYNC command, which replaced the SYNC command used in earlier versions. This is the most time-consuming phase and involves either full or partial synchronization.
Phase 6: Continuous Command Propagation
Once the initial data transfer completes, the master continuously sends write commands to the replica, maintaining data consistency.
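The phases above can be pictured as a small state machine. The sketch below is a simplified model, not Redis's actual implementation: FakeMaster, Phase, next_phase, and run are hypothetical names, and each call to next_phase stands in for one step of the replica's periodic task. Any failure sends the replica back to the connect step, matching the retry behavior described above.

```python
from enum import Enum, auto
from typing import List, Optional


class Phase(Enum):
    CONNECT = auto()    # Phase 2: open the socket to the stored master address
    PING = auto()       # Phase 3: send PING and expect PONG
    AUTH = auto()       # Phase 4: authenticate if the master requires it
    SYNC = auto()       # Phase 5: PSYNC (full or partial)
    PROPAGATE = auto()  # Phase 6: receive the live write-command stream


class FakeMaster:
    """Hypothetical in-memory stand-in for a master node."""

    def __init__(self, reachable: bool = True, requirepass: Optional[str] = None):
        self.reachable = reachable
        self.requirepass = requirepass


def next_phase(phase: Phase, master: FakeMaster, masterauth: Optional[str] = None) -> Phase:
    """Advance the replica by one step; any failure falls back to CONNECT."""
    if not master.reachable:
        return Phase.CONNECT
    if phase is Phase.CONNECT:
        return Phase.PING
    if phase is Phase.PING:
        return Phase.AUTH
    if phase is Phase.AUTH:
        ok = master.requirepass is None or master.requirepass == masterauth
        return Phase.SYNC if ok else Phase.CONNECT
    if phase is Phase.SYNC:
        return Phase.PROPAGATE
    return phase  # PROPAGATE is the steady state


def run(master: FakeMaster, masterauth: Optional[str] = None, ticks: int = 4) -> List[Phase]:
    """Return the sequence of phases visited, starting from CONNECT."""
    phase = Phase.CONNECT
    seen = [phase]
    for _ in range(ticks):
        phase = next_phase(phase, master, masterauth)
        seen.append(phase)
    return seen


print([p.name for p in run(FakeMaster(requirepass="s3cret"), masterauth="s3cret")])
```

With a wrong masterauth value the replica in this model cycles through CONNECT, PING, and AUTH indefinitely and never reaches SYNC.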
Data Synchronization Internals
When the replica establishes a connection with the master, it sends a PSYNC command to synchronize data. There are two synchronization modes:
- Full Synchronization: Used for initial replication—the master sends all data at once
- Partial Synchronization: Handles cases where network interruptions cause data loss. When reconnecting, if the master still has the missing data, it sends only the lost portion instead of resending everything
Components Required for PSYNC
The PSYNC command requires three components:
Master-Replica Replication Offset
The master tracks the cumulative byte length of processed write commands. The replica also accumulates offsets when receiving commands from the master. By comparing these offsets, you can determine the data difference between master and replica.
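One way to see the offset comparison in practice is to parse the text returned by INFO replication on the master and subtract each replica's offset from master_repl_offset. The sketch below works on a made-up sample string rather than a live server; the helper names and the sample numbers are assumptions for illustration.

```python
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=9924,lag=0
master_repl_offset:9964
"""


def parse_info(text: str) -> dict:
    """Turn 'key:value' lines into a dict, skipping blanks and '#' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_lag_bytes(info_text: str) -> dict:
    """Return {"ip:port": bytes the replica is behind the master}."""
    fields = parse_info(info_text)
    master_offset = int(fields["master_repl_offset"])
    lags = {}
    for key, value in fields.items():
        # Per-replica lines look like slave0:ip=...,port=...,offset=...
        if key.startswith("slave") and "offset=" in value:
            attrs = dict(item.split("=", 1) for item in value.split(","))
            name = f'{attrs["ip"]}:{attrs["port"]}'
            lags[name] = master_offset - int(attrs["offset"])
    return lags


print(replica_lag_bytes(SAMPLE_INFO))  # {'127.0.0.1:6381': 40}
```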
Master Replication Backlog Buffer
The replication backlog is a fixed-length queue stored on the master node with a default size of 1MB. When a replica connects, the master writes commands to this buffer in addition to sending them to replicas.
This is a FIFO queue—older data gets overwritten when capacity is exceeded. The size is configurable and critical for partial replication. View details using INFO replication:
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4505
repl_backlog_histlen:5460
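To make the fixed-size FIFO behavior concrete, here is a minimal Python model of a backlog. It is a sketch under simplifying assumptions: Redis uses a preallocated circular buffer, while this version trims a bytearray, and the class and method names are hypothetical. The first_byte_offset and histlen properties mirror the INFO fields shown above.

```python
from typing import Optional


class ReplicationBacklog:
    """Simplified fixed-size backlog: keeps only the newest `size` bytes."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total bytes ever written by the master

    def feed(self, data: bytes) -> None:
        """Append propagated command bytes, discarding the oldest on overflow."""
        self.buf.extend(data)
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]

    @property
    def histlen(self) -> int:
        return len(self.buf)

    @property
    def first_byte_offset(self) -> int:
        # Offset of the oldest byte still held in the buffer.
        return self.master_offset - len(self.buf) + 1

    def read_after(self, replica_offset: int) -> Optional[bytes]:
        """Bytes the replica is missing, or None if they were overwritten."""
        oldest = self.master_offset - len(self.buf)
        if replica_offset < oldest or replica_offset > self.master_offset:
            return None
        return bytes(self.buf[replica_offset - oldest:])


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdefgh")
backlog.feed(b"ijkl")         # 12 bytes written; only the last 10 are kept
print(backlog.read_after(8))  # b'ijkl'
print(backlog.read_after(1))  # None (byte 2 has already been overwritten)
```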
Master Run ID
Every Redis node generates a unique 40-character hexadecimal string as its run ID upon startup. Replicas store the master's run ID to identify which master they are replicating from. The run ID changes after a restart.
During initial replication, the replica saves the master's run ID.
PSYNC Command
The replica sends PSYNC to the master for partial or full synchronization:
PSYNC {runid} {offset}
- runid: The master's run ID
- offset: The replica's current data offset
For initial replication (no runid or offset available), the command PSYNC ? -1 is sent.
Full Synchronization Flow
- Initial replication sends PSYNC ? -1
- Master recognizes this as full sync and responds with FULLRESYNC
- Replica saves the master's run ID and offset from the response
- Master executes BGSAVE and simultaneously writes commands to the replication buffer
- After BGSAVE completes, master sends the generated RDB file to the replica
- Replica clears its existing data upon receiving the RDB
- Replica loads the RDB file, updating its state to match the master's state at the time of BGSAVE
- Master sends commands from the replication buffer to the replica
- Replica executes these commands, reaching the master's current state
- If AOF is enabled on the replica, BGREWRITEAOF is triggered immediately to ensure the AOF file is ready
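The data movement in the flow above can be modeled with plain dictionaries. This is only an illustrative sketch: the snapshot dict stands in for the RDB file, a list stands in for the replication buffer, and the full_sync function is a hypothetical name, not part of Redis.

```python
from typing import Dict, List, Tuple


def full_sync(master: Dict[str, str], replica: Dict[str, str],
              writes_during_bgsave: List[Tuple[str, str]]) -> None:
    """Bring `replica` to the master's current state via snapshot plus buffer."""
    snapshot = dict(master)                  # BGSAVE: point-in-time copy (the "RDB")
    replication_buffer = []
    for key, value in writes_during_bgsave:  # master keeps serving writes meanwhile
        master[key] = value
        replication_buffer.append((key, value))
    replica.clear()                          # replica discards its existing data
    replica.update(snapshot)                 # replica loads the RDB
    for key, value in replication_buffer:    # master streams the buffered commands
        replica[key] = value


master = {"message": "Hello from master"}
replica = {"stale": "old value"}
full_sync(master, replica, [("counter", "1"), ("message", "updated")])
print(replica == master)  # True
```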
Partial Synchronization Flow
Partial replication is an optimization that avoids the high cost of full synchronization. The replica requests it with PSYNC {runid} {offset}.
When network interruption or command loss occurs during replication, the replica requests the master to resend missing data. If the master's replication backlog contains this data, it sends only the missing portion, avoiding a full resynchronization.
Flow:
- Network interruption occurs; if it exceeds repl-timeout, master marks the replica as failed and terminates the connection
- Master continues accepting writes; new commands don't reach the replica (inconsistency occurs). The master writes to the replication backlog buffer
- Network recovers; replica reconnects to master
- Replica has saved the master's run ID and its replication offset, sending PSYNC {runid} {offset}
- Master checks if partial replication conditions are met
- If conditions are satisfied, master responds with CONTINUE
- Master sends the missing data from the backlog buffer to the replica
Conditions for Partial Replication:
- The runid must match the master's current runid
- The offset requested by the replica must fall within the backlog buffer's range
If conditions aren't met, master returns FULLRESYNC, triggering full synchronization.
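The two conditions can be expressed as a short decision function. The sketch below is a simplified model of the master's side: MasterState and handle_psync are hypothetical names, the offset is treated as the last byte the replica has processed, and Redis's real offset bookkeeping differs in detail.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    run_id: str
    master_offset: int    # total bytes of write commands propagated so far
    backlog_histlen: int  # bytes currently held in the replication backlog


def handle_psync(master: MasterState, run_id: str, offset: int) -> str:
    """Return the master's reply to PSYNC {run_id} {offset} in this model."""
    full = f"FULLRESYNC {master.run_id} {master.master_offset}"
    # Condition 1: the replica must be following this exact master instance.
    if run_id != master.run_id:
        return full
    # Condition 2: everything after `offset` must still be in the backlog.
    oldest_servable = master.master_offset - master.backlog_histlen
    if not (oldest_servable <= offset <= master.master_offset):
        return full
    return "CONTINUE"


state = MasterState(run_id="a" * 40, master_offset=9964, backlog_histlen=5460)
print(handle_psync(state, "?", -1))         # first sync: full resync
print(handle_psync(state, "a" * 40, 9000))  # within the backlog: CONTINUE
print(handle_psync(state, "a" * 40, 100))   # overwritten: full resync
```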
Backlog Buffer Size Consideration:
If the buffer is too small, it gets overwritten, preventing partial replication after network recovery. Size should be calculated based on network interruption duration, command size, and master QPS.
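A back-of-the-envelope estimate multiplies the three factors named above. The function below is a rough sketch; the safety_factor parameter and the example numbers (a 60-second outage, 1000 write commands per second, 100 bytes per command) are assumptions for illustration, not measured values.

```python
import math


def backlog_size_bytes(outage_seconds: float, write_qps: float,
                       avg_command_bytes: float, safety_factor: float = 1.0) -> int:
    """Estimate the backlog needed to survive an outage without a full resync."""
    return math.ceil(outage_seconds * write_qps * avg_command_bytes * safety_factor)


needed = backlog_size_bytes(outage_seconds=60, write_qps=1000, avg_command_bytes=100)
print(needed)            # 6000000
print(needed > 1048576)  # True: the 1MB default would be too small here
```

The result is the value one would consider for the backlog size setting, rounded up to leave headroom.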
Master-Replica Heartbeat
Heartbeat Mechanism
- Master periodically pings replicas, controlled by repl-ping-replica-period (default 10 seconds)
- Replica sends REPLCONF ACK {offset} every second:
  - Monitors connection health in real-time
  - Reports its replication offset; if data is missing, master pulls from backlog buffer
  - Enables replica count and latency tracking via min-replicas-to-write and min-replicas-max-lag:
    - If enabled, master rejects writes if available replicas are fewer than min-replicas-to-write or latency exceeds min-replicas-max-lag
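The write-rejection rule driven by REPLCONF ACK can be sketched as follows. This is a simplified model: writes_allowed is a hypothetical helper, each lag value is the number of seconds since a replica's last ACK, and the check is treated as disabled when either setting is 0.

```python
from typing import Iterable


def writes_allowed(replica_lags_seconds: Iterable[float],
                   min_replicas_to_write: int,
                   min_replicas_max_lag: int) -> bool:
    """Return True if the master should accept writes in this model."""
    # Treated as disabled when either setting is 0.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    # A replica counts as available if its last ACK is recent enough.
    good = sum(1 for lag in replica_lags_seconds if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


print(writes_allowed([1, 2], min_replicas_to_write=2, min_replicas_max_lag=10))   # True
print(writes_allowed([1, 30], min_replicas_to_write=2, min_replicas_max_lag=10))  # False
```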
repl-timeout Parameter
The repl-timeout parameter (default 60 seconds) handles:
- Replica doesn't receive RDB snapshot within timeout
- Replica doesn't receive data packets or PING from master within timeout
- Master doesn't receive REPLCONF ACK within timeout
When timeout occurs, the connection closes and the replica attempts reconnection. This value should exceed repl-ping-replica-period.
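The timeout rule and the relationship between the two settings can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names; timestamps are plain seconds and the defaults are the ones quoted above.

```python
def timeouts_consistent(repl_timeout: int = 60, ping_period: int = 10) -> bool:
    """repl-timeout should exceed repl-ping-replica-period, or healthy links drop."""
    return repl_timeout > ping_period


def link_timed_out(now: float, last_io: float, repl_timeout: int = 60) -> bool:
    """True if nothing (data, PING, or REPLCONF ACK) arrived within the timeout."""
    return now - last_io > repl_timeout


print(timeouts_consistent())                     # True with the defaults
print(link_timed_out(now=100.0, last_io=30.0))   # True: 70 seconds of silence
print(link_timed_out(now=100.0, last_io=95.0))   # False
```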
Recommendation: Deploy master and replicas in the same datacenter to minimize latency.
Full Synchronization Triggers
Full synchronization is resource-intensive and should be avoided. Common triggers include:
- Initial replication: Unavoidable—schedule during low-traffic periods
- Run ID mismatch: Replica stores the master's run ID; if master restarts, the run ID changes, triggering full sync. Where possible, reload with DEBUG RELOAD (which keeps the run ID) instead of restarting, or implement failover (promote replica to master, use Sentinel or Cluster)
- Insufficient replication backlog: Default 1MB; overflow causes full sync when reconnecting. Calculate size based on network conditions, command size, and QPS
Common Configurations and Commands
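- replica-read-only yes: Replicas are read-only by default; allowing writes on a replica causes data inconsistency
- repl-disable-tcp-nodelay: Default is no. This controls the TCP Nagle algorithm: setting it to yes sends fewer, larger packets and saves bandwidth at the cost of added replication delay, so it is worth considering mainly for high-traffic or long-distance links
- DEBUG RELOAD: Reloads data from RDB without changing run ID but clears memory first
- INFO replication: Displays replication status and metrics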