Redis Replication: Configuration, Topology, and Synchronization Mechanisms

Table of Contents

  • Introduction to Replication
  • Replication Configuration
    • Establishing Replication
    • Terminating Replication
  • Topology Structures
  • Replication Process
  • Data Synchronization Internals
    • Components Required for PSYNC
    • PSYNC Command
    • Full Synchronization
    • Partial Synchronization
  • Master-Replica Heartbeat
  • Full Sync Triggers
  • Common Configurations and Commands

Introduction to Replication

Redis replication is a mechanism that copies data from one Redis server to other servers. The source server is called the master node, while the receiving servers are called replica nodes (historically referred to as slave nodes).

Primary Functions of Replication

  1. Data Redundancy: Hot backup of data with multi-machine redundancy
  2. Failure Recovery: When the master fails, replicas can serve requests, providing a form of service redundancy
  3. Load Balancing: The master handles writes while replicas handle reads, distributing load across multiple instances
  4. High Availability Foundation: Replication serves as the foundation for Sentinel and Cluster implementations

By default, every Redis server starts as a master node. Each master can have multiple replicas, but each replica can only have one master.


Replication Configuration

Establishing Replication

Configuration Methods

There are three ways to configure replication:

  1. Add the replicaof directive to the configuration file
  2. Use the --replicaof flag when starting redis-server
  3. Execute the REPLICAOF command directly in the Redis client
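
As a rough illustration, the three methods differ only in where the same master host and port are written. The helper below is hypothetical (its name and return shape are not part of Redis); it simply renders each form for a given master address.

```python
def replicaof_forms(master_host, master_port):
    """Return the three equivalent ways to point a node at a master.

    Hypothetical helper for illustration only; the rendered strings
    follow the replicaof syntax described above.
    """
    return {
        # 1. Line to place in the replica's configuration file
        "config_file": f"replicaof {master_host} {master_port}",
        # 2. Arguments for starting redis-server as a replica
        "startup_args": ["redis-server", "--replicaof", master_host, str(master_port)],
        # 3. Command to run from a client connected to the replica
        "client_command": f"REPLICAOF {master_host} {master_port}",
    }


forms = replicaof_forms("127.0.0.1", 6380)
print(forms["config_file"])
print(" ".join(forms["startup_args"]))
print(forms["client_command"])
```

The first two forms take effect at startup, while the third changes the role of a node that is already running.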

Practical Demonstration

Preparing the Nodes

Configure port 6380 as the master node and port 6381 as the replica node. Create a new configuration file named redis-6381.conf and modify the port setting accordingly.

Start both Redis instances using the configuration files.

Executing the Replication Command

Connect to the replica instance (port 6381) and execute:

redis-cli -p 6381
REPLICAOF 127.0.0.1 6380

Verification

Write data to the master node:

redis-cli -p 6380 SET message "Hello from master"

Read from the replica:

redis-cli -p 6381 GET message

The replica successfully retrieves the data, confirming replication is active.

Terminating Replication

Direct Termination

Use the REPLICAOF NO ONE command to sever the replication relationship:

redis-cli -p 6381 REPLICAOF NO ONE

Post-termination behavior:

  • Previously replicated data remains on the replica
  • Subsequent writes to the master are not synchronized

Switching to Another Master

To switch to a different master, use the REPLICAOF command with the new master address. Unlike REPLICAOF NO ONE, this clears all data previously replicated from the original master.


Topology Structures

Redis supports three replication topologies:

  1. One-to-One: One master with one replica
  2. One-to-Many: One master with multiple replicas
  3. Tree Structure: Intermediate replicas can serve as masters for other replicas
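
One way to picture these topologies is as a mapping from each replica to its single master. The sketch below is a toy model, not anything Redis exposes; the node names and the hops_to_root helper are made up for illustration. Because a dictionary key can hold only one value, the rule that each replica has exactly one master is built into the data structure.

```python
def hops_to_root(master_of, node):
    """Count how many replication hops separate `node` from the top-level master."""
    hops = 0
    visited = {node}
    while node in master_of:
        node = master_of[node]
        if node in visited:
            # A node cannot (directly or indirectly) replicate from itself.
            raise ValueError("replication loop detected")
        visited.add(node)
        hops += 1
    return hops


# Tree structure: replica-c replicates from replica-a, which replicates from master.
tree = {
    "replica-a": "master",
    "replica-b": "master",
    "replica-c": "replica-a",
}

for name in ["master", "replica-b", "replica-c"]:
    print(name, hops_to_root(tree, name))
```

In a tree, nodes further from the root receive data only after the intermediate replica has received it, which is the trade-off for taking load off the master.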

Replication Process

Phase 1: Storing Master Information

When the REPLICAOF command is executed, the replica only saves the master's address information and returns immediately. The actual replication has not started yet.

Phase 2: Establishing Socket Connection

The replica runs a periodic task (every second) that checks for new master configurations. When a new master is detected, it attempts to establish a network connection by creating a socket. All subsequent data synchronization occurs through this socket.

If the connection fails, the periodic task keeps retrying until it succeeds or until replication is cancelled.

Phase 3: Sending PING

After the socket connection is established, the replica sends a PING request to the master. The purposes include:

  1. Verifying the socket is functional
  2. Checking if the master can process commands

If the replica doesn't receive a PONG response or the request times out (due to network issues or master being blocked), the replica disconnects and the periodic task attempts reconnection.

Phase 4: Authentication

If the master has requirepass configured, the replica must have masterauth set to the same password. Authentication failure causes the replica to disconnect and retry on the next periodic task execution.

Phase 5: Data Synchronization

After the replication connection is established, data synchronization begins. This is the initialization phase, in which the master sends all its data to the replica. The mechanism uses the PSYNC command, which replaced SYNC starting with Redis 2.8 (earlier versions used SYNC). This is the most time-consuming phase and involves either full or partial synchronization.

Phase 6: Continuous Command Propagation

Once the initial data transfer completes, the master continuously sends write commands to the replica, maintaining data consistency.
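
The six phases can be read as a small state machine on the replica side. The sketch below is a simplified model under the assumption that any failure sends the replica back to the connection phase, as the retry behavior above suggests; the phase names and the next_phase function are invented for illustration and do not mirror Redis internals.

```python
PHASES = ["store_master", "connect", "ping", "auth", "sync", "propagate"]


def next_phase(current, succeeded):
    """Return the phase the replica moves to after `current` finishes."""
    if not succeeded:
        # Any failure drops the link; the periodic task then reconnects.
        return "connect"
    if current == "propagate":
        # Command propagation continues for as long as the link stays healthy.
        return "propagate"
    return PHASES[PHASES.index(current) + 1]


# Walk through the handshake with one failed PING along the way.
phase = "store_master"
for ok in [True, True, False, True, True, True, True]:
    phase = next_phase(phase, ok)
print(phase)
```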


Data Synchronization Internals

When the replica establishes a connection with the master, it sends a PSYNC command to synchronize data. There are two synchronization modes:

  1. Full Synchronization: Used for initial replication—the master sends all data at once
  2. Partial Synchronization: Handles cases where network interruptions cause data loss. When reconnecting, if the master still has the missing data, it sends only the lost portion instead of resending everything

Components Required for PSYNC

The PSYNC command requires three components:

Master-Replica Replication Offset

The master tracks the cumulative byte length of processed write commands. The replica also accumulates offsets when receiving commands from the master. By comparing these offsets, you can determine the data difference between master and replica.
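
As a minimal sketch of that comparison, the difference between the two offsets is the number of bytes of the write stream the replica has not yet processed. The function name and the sample offsets below are hypothetical.

```python
def replication_lag_bytes(master_offset, replica_offset):
    """Bytes of the master's write stream the replica has not yet processed."""
    if replica_offset > master_offset:
        raise ValueError("replica offset is ahead of the master offset")
    return master_offset - replica_offset


print(replication_lag_bytes(9964, 9800))
```

A result of 0 means the replica has processed everything the master has written so far.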

Master Replication Backlog Buffer

The replication backlog is a fixed-length queue stored on the master node, with a default size of 1MB. Once a replica is attached, the master writes each propagated command to this buffer in addition to sending it to the replicas.

This is a FIFO queue—older data gets overwritten when capacity is exceeded. The size is configurable and critical for partial replication. View details using INFO replication:

repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4505
repl_backlog_histlen:5460
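
The fields above can be turned into the offset range currently held in the backlog. The sketch below parses the sample output shown here and assumes that repl_backlog_histlen counts the bytes stored starting at repl_backlog_first_byte_offset; the parse_info and backlog_range helpers are hypothetical names, not part of any Redis client.

```python
def parse_info(text):
    """Parse 'key:value' lines from INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def backlog_range(fields):
    """Return (first, last) offsets covered by the backlog, inclusive."""
    first = int(fields["repl_backlog_first_byte_offset"])
    histlen = int(fields["repl_backlog_histlen"])
    return first, first + histlen - 1


sample = """\
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4505
repl_backlog_histlen:5460
"""
print(backlog_range(parse_info(sample)))
```

Under that assumption, the sample backlog holds offsets 4505 through 9964, using 5460 of its 1048576 bytes.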

Master Run ID

Every Redis node generates a unique 40-character hexadecimal string as its run ID upon startup. Replicas store the master's run ID to identify which master they are replicating from. The run ID changes after a restart.

During initial replication, the replica saves the master's run ID.

PSYNC Command

The replica sends PSYNC to the master for partial or full synchronization:

PSYNC {runid} {offset}
  • runid: The master's run ID
  • offset: The replica's current data offset

For initial replication (no runid or offset available), the command PSYNC ? -1 is sent.
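
A small sketch of how the two forms of the command could be built on the replica side. The psync_command function and the sample run ID are made up for illustration; only the command text follows the format described above.

```python
def psync_command(master_runid=None, offset=None):
    """Build the PSYNC line a replica would send.

    With no saved run ID or offset (initial replication), fall back to
    the 'PSYNC ? -1' form.
    """
    if master_runid is None or offset is None:
        return "PSYNC ? -1"
    return f"PSYNC {master_runid} {offset}"


# Hypothetical 40-character run ID, used only as sample data.
saved_runid = "3f2a" * 10

print(psync_command())
print(psync_command(saved_runid, 9965))
```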

Full Synchronization Flow

  1. Initial replication sends PSYNC ? -1
  2. Master recognizes this as a full sync request and responds with FULLRESYNC {runid} {offset}
  3. Replica saves the master's run ID and offset from the response
  4. Master executes BGSAVE and simultaneously writes commands to the replication buffer
  5. After BGSAVE completes, master sends the generated RDB file to the replica
  6. Replica clears its existing data upon receiving the RDB
  7. Replica loads the RDB file, updating its state to match the master's state at the time of BGSAVE
  8. Master sends commands from the replication buffer to the replica
  9. Replica executes these commands, reaching the master's current state
  10. If AOF is enabled on the replica, BGREWRITEAOF is triggered immediately to ensure the AOF file is ready
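
Steps 4 through 9 can be imitated with plain dictionaries to show why the replica ends up matching the master even though writes keep arriving during BGSAVE. This is a toy model under strong assumptions: datasets are Python dicts, every write is a simple key/value set, and no network or RDB file is involved. The function name is hypothetical.

```python
def simulate_full_sync(master_data, writes_during_bgsave, replica_data):
    """Toy model of a full sync; mutates master_data and replica_data."""
    # Step 4: BGSAVE captures a point-in-time snapshot...
    snapshot = dict(master_data)
    # ...while new writes are applied on the master and also buffered.
    replication_buffer = []
    for key, value in writes_during_bgsave:
        master_data[key] = value
        replication_buffer.append((key, value))

    # Steps 6-7: the replica drops its old data and loads the snapshot.
    replica_data.clear()
    replica_data.update(snapshot)

    # Steps 8-9: buffered commands bring the replica up to the current state.
    for key, value in replication_buffer:
        replica_data[key] = value
    return replica_data


master = {"message": "Hello from master"}
replica = {"stale": "old value"}
simulate_full_sync(master, [("counter", "1")], replica)
print(replica == master)
```

Without the buffered commands, the replica would stop at the snapshot state and miss every write made while the RDB file was being produced.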

Partial Synchronization Flow

Partial synchronization avoids the high cost of a full synchronization by using PSYNC {runid} {offset}.

When network interruption or command loss occurs during replication, the replica requests the master to resend missing data. If the master's replication backlog contains this data, it sends only the missing portion, avoiding a full resynchronization.

Flow:

  1. Network interruption occurs—if it exceeds repl-timeout, master marks the replica as failed and terminates the connection
  2. Master continues accepting writes; new commands don't reach the replica (inconsistency occurs). The master writes to the replication backlog buffer
  3. Network recovers; replica reconnects to master
  4. Replica has saved the master's run ID and its replication offset, sending PSYNC {runid} {offset}
  5. Master checks if partial replication conditions are met
  6. If conditions are satisfied, master responds with CONTINUE
  7. Master sends the missing data from the backlog buffer to the replica

Conditions for Partial Replication:

  1. The runid must match the master's current runid
  2. The offset requested by the replica must fall within the backlog buffer's range

If conditions aren't met, master returns FULLRESYNC, triggering full synchronization.
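
The master's decision can be sketched as a single check over the two conditions. This is a simplified model rather than the actual Redis implementation: the function name is hypothetical, and it assumes the requested offset is the next byte the replica needs, so an offset one past the end of the backlog (a fully caught-up replica) still qualifies.

```python
def psync_reply(master_runid, backlog_first_offset, backlog_histlen,
                requested_runid, requested_offset):
    """Return 'CONTINUE' for a partial sync or 'FULLRESYNC' otherwise."""
    # Condition 1: the replica must be asking about this master's run ID.
    if requested_runid != master_runid:
        return "FULLRESYNC"
    # Condition 2: the requested offset must still be available in the backlog.
    # Assumption: the upper bound is inclusive of the byte right after the backlog.
    backlog_end = backlog_first_offset + backlog_histlen
    if backlog_first_offset <= requested_offset <= backlog_end:
        return "CONTINUE"
    return "FULLRESYNC"


runid = "3f2a" * 10  # hypothetical run ID used as sample data

print(psync_reply(runid, 4505, 5460, runid, 9000))   # offset inside the backlog
print(psync_reply(runid, 4505, 5460, runid, 100))    # data already overwritten
print(psync_reply(runid, 4505, 5460, "?", -1))       # initial replication
```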

Backlog Buffer Size Consideration:

If the buffer is too small, it gets overwritten, preventing partial replication after network recovery. Size should be calculated based on network interruption duration, command size, and master QPS.
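
A back-of-the-envelope version of that calculation multiplies the expected outage duration by the write rate and the average command size. The function below is a sketch; the safety factor of 2 and the sample workload figures are assumptions, not recommendations from Redis.

```python
def recommended_backlog_bytes(outage_seconds, writes_per_second,
                              avg_command_bytes, safety_factor=2.0):
    """Estimate a backlog size that can absorb an outage of the given length."""
    if min(outage_seconds, writes_per_second, avg_command_bytes, safety_factor) < 0:
        raise ValueError("all inputs must be non-negative")
    # Bytes written during the outage, padded by a safety margin (assumed).
    return int(outage_seconds * writes_per_second * avg_command_bytes * safety_factor)


# Hypothetical workload: 60 s outage, 1000 writes/s, 100 bytes per command.
size = recommended_backlog_bytes(60, 1000, 100)
print(size, size > 1048576)
```

With these sample figures the estimate is about 12 MB, well above the 1MB default, so the backlog would need to be enlarged for partial synchronization to succeed after such an outage.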


Master-Replica Heartbeat

Heartbeat Mechanism

  1. Master periodically pings replicas—controlled by repl-ping-replica-period (default 10 seconds)
  2. Replica sends REPLCONF ACK {offset} every second:
    • Monitors connection health in real-time
    • Reports its replication offset; if the replica is missing data, the master resends it from the backlog buffer
    • Lets the master track how many replicas are connected and how far behind they are, which min-replicas-to-write and min-replicas-max-lag rely on:
      • If enabled, the master rejects writes when fewer than min-replicas-to-write replicas have a lag of at most min-replicas-max-lag seconds
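
The write-acceptance rule can be sketched as a count of sufficiently fresh replicas. This is an illustrative model, not Redis source code: the function name is made up, lag is taken to be the number of seconds since each replica's last REPLCONF ACK, and the check is assumed to be disabled when either setting is 0.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """Decide whether the master should accept a write.

    replica_lags: seconds since the last ACK from each connected replica.
    """
    # Assumption: a value of 0 for either setting turns the check off.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


print(master_accepts_writes([1, 2, 30], 2, 10))   # two replicas are fresh enough
print(master_accepts_writes([1, 15, 30], 2, 10))  # only one replica is fresh enough
```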

repl-timeout Parameter

The repl-timeout parameter (default 60 seconds) handles:

  • Replica doesn't receive RDB snapshot within timeout
  • Replica doesn't receive data packets or PING from master within timeout
  • Master doesn't receive REPLCONF ACK within timeout

When timeout occurs, the connection closes and the replica attempts reconnection. This value should exceed repl-ping-replica-period.
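
The timeout rule and the sizing advice can be expressed as two small checks. The function names are hypothetical, and the defaults simply reuse the values quoted above (60 seconds for repl-timeout, 10 seconds for repl-ping-replica-period).

```python
def link_timed_out(seconds_since_last_io, repl_timeout=60):
    """True when nothing has been received from the peer within repl_timeout."""
    return seconds_since_last_io > repl_timeout


def timeout_config_ok(repl_timeout=60, ping_period=10):
    """repl-timeout should exceed the ping period, or healthy links would time out."""
    return repl_timeout > ping_period


print(link_timed_out(12))         # a couple of missed pings is still within the limit
print(link_timed_out(75))         # silence longer than repl-timeout
print(timeout_config_ok(5, 10))   # timeout shorter than the ping period
```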

Recommendation: Deploy master and replicas in the same datacenter to minimize latency.


Full Synchronization Triggers

Full synchronization is resource-intensive and should be avoided. Common triggers include:

  1. Initial replication: Unavoidable—schedule during low-traffic periods
  2. Run ID mismatch: Replica stores the master's run ID; if the master restarts, the run ID changes, triggering full sync. When only a data reload is needed, use DEBUG RELOAD, which keeps the run ID; otherwise rely on failover (promote a replica to master, or use Sentinel or Cluster)
  3. Insufficient replication backlog: Default 1MB; overflow causes full sync when reconnecting. Calculate size based on network conditions, command size, and QPS

Common Configurations and Commands

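  1. replica-read-only yes: Replicas are read-only by default; allowing writes on a replica would make its data diverge from the master
  2. repl-disable-tcp-nodelay: Controls the TCP Nagle algorithm on the replication link. The default is no, which sends data immediately for lower replication latency. Setting it to yes merges small packets to save bandwidth at the cost of added delay, so it is mainly worth considering under very high traffic or when master and replicas are many network hops apart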
  3. DEBUG RELOAD: Reloads data from RDB without changing run ID but clears memory first
  4. INFO replication: Displays replication status and metrics

Tags: Redis Replication database High Availability Distributed Systems

Posted on Sat, 09 May 2026 03:10:03 +0000 by robin339