Redis Distributed Lock Failures During Master-Slave Failover and Mitigation Strategies

Redis Lock Vulnerability Analysis

Redis replicates writes asynchronously, so a master can acknowledge a lock and then fail before any replica has received it. A failover inside that window silently drops the lock:

Timeline:
1. Client A acquires lock on master (SET resource_id unique_val NX EX 30)
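2. Replication of the lock key to the replica is delayed (milliseconds to hundreds of milliseconds)
3. Master fails before the key is replicated; Sentinel triggers failover (3-10 seconds)
4. Replica is promoted to master without Client A's lock
5. Client B acquires the same lock while Client A still believes it holds it → Lock violation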
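
The timeline above can be reproduced without a real Redis deployment. The following is a minimal sketch, assuming a hypothetical in-memory `FakeRedis` class that only mimics `SET ... NX`; it is not a model of Sentinel, just an illustration of why an unreplicated write leads to two lock holders.

```python
class FakeRedis:
    """In-memory stand-in for one Redis node (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        # Mimics SET key value NX: succeed only if the key is absent.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def simulate_failover():
    master, replica = FakeRedis(), FakeRedis()
    pending = []  # writes acknowledged by the master but not yet replicated

    # Step 1: Client A acquires the lock on the master.
    a_ok = master.set_nx("resource_id", "client-a")
    pending.append(("resource_id", "client-a"))

    # Steps 2-4: the master dies before the pending write is shipped,
    # so the replica is promoted with an empty keyspace.
    pending.clear()
    new_master = replica

    # Step 5: Client B asks the new master for the same lock and gets it.
    b_ok = new_master.set_nx("resource_id", "client-b")
    return a_ok, b_ok


a_ok, b_ok = simulate_failover()
print(f"Client A holds lock: {a_ok}, Client B holds lock: {b_ok}")
```

Both flags come back `True`: each client was told it owns the lock, which is exactly the violation in step 5.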

System Design Tradeoffs

  • Redis prioritizes AP (Availability + Partition Tolerance) over strong consistency
  • Asynchronous replication enables high performance but creates data loss windows
  • Failover duration = Failure detection + Election + Switchover; replication delay determines which acknowledged writes are missing on the promoted replica

Redis Native Solutions

Multi-Instance Locking (RedLock Pattern)

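import time
import uuid
import redis

# Delete the key only if it still holds our token (never release another client's lock).
RELEASE_SCRIPT = 'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end'

class MultiNodeLock:
    def __init__(self, nodes):
        self.nodes = nodes  # independent Redis masters, not replicas of each other
        self.min_approvals = len(nodes) // 2 + 1

    def acquire_lock(self, resource, ttl):
        token = uuid.uuid4().hex  # one token, written to every node
        start_time = time.monotonic()
        successes = 0
        for node in self.nodes:
            try:
                if node.set(resource, token, nx=True, ex=ttl):
                    successes += 1
            except redis.RedisError:
                pass  # an unreachable node counts as a failed vote

        # Valid only if a majority agreed and time is left on the TTL.
        elapsed = time.monotonic() - start_time
        if successes >= self.min_approvals and elapsed < ttl:
            return token  # keep the token: it is required to release the lock
        self.release_lock(resource, token)
        return None

    def release_lock(self, resource, token):
        for node in self.nodes:
            try:
                node.eval(RELEASE_SCRIPT, 1, resource, token)
            except redis.RedisError:
                pass  # the key expires on its own after ttl seconds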
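
The acquire step above compares elapsed time against the TTL. The Redlock description goes one step further and also subtracts an allowance for clock drift between nodes, so the client knows how long the lock can actually be trusted. The following is a rough sketch of that arithmetic; the function name `remaining_validity` is hypothetical, and the defaults (a 1% drift factor plus a 2 ms pad) are assumptions modeled on values commonly seen in Redlock client libraries, not requirements.

```python
def remaining_validity(ttl_ms, elapsed_ms, drift_factor=0.01, drift_pad_ms=2):
    """Milliseconds the lock can still be trusted; 0 means treat it as not acquired."""
    # Allowance for clocks on different nodes ticking at slightly different rates.
    drift_ms = ttl_ms * drift_factor + drift_pad_ms
    return max(0, ttl_ms - elapsed_ms - drift_ms)


# A 30-second lock that took 150 ms to collect from the nodes.
validity = remaining_validity(ttl_ms=30_000, elapsed_ms=150)
print(f"Lock usable for roughly {validity / 1000:.1f} more seconds")
```

Work protected by the lock should finish within this remaining time. If the value is 0, the lock should be released on every node rather than used.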

Infrastructure Tuning

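# redis.conf: shrink the window in which an acknowledged write can be lost
# Refuse writes unless at least 2 replicas are connected...
min-replicas-to-write 2
# ...and each of them was heard from within the last 10 seconds
min-replicas-max-lag 10
# Redis Cluster only: milliseconds before an unreachable node is considered failing
cluster-node-timeout 5000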
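
The settings above act on the server side. A client can add its own check by issuing the Redis `WAIT` command right after taking the lock and giving the lock back if too few replicas acknowledge in time. The following is a minimal sketch, assuming `client` is a redis-py `redis.Redis` instance (or anything with compatible `set`, `wait` and `eval` methods); the function name `acquire_replicated` and its default values are hypothetical. `WAIT` narrows the loss window but does not turn Redis into a strongly consistent store.

```python
import uuid

# Delete the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def acquire_replicated(client, resource, ttl, min_replicas=1, wait_ms=100):
    """Return a lock token only if the write reached min_replicas replicas in time."""
    token = uuid.uuid4().hex
    if not client.set(resource, token, nx=True, ex=ttl):
        return None  # somebody else holds the lock

    # WAIT blocks until enough replicas acknowledge or the timeout expires,
    # and returns how many replicas acknowledged.
    acked = client.wait(min_replicas, wait_ms)
    if acked >= min_replicas:
        return token

    # Not replicated in time: give the lock back rather than hold an unsafe one.
    client.eval(RELEASE_SCRIPT, 1, resource, token)
    return None
```

A caller would treat `None` as "lock not acquired" and retry later, exactly as it would for ordinary contention.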

Alternative Coordination Systems

etcd Lock Implementation

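import (
    "context"
    "errors"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// acquireLock creates key only if it does not exist yet and binds it to a lease,
// so etcd deletes the lock automatically once the lease expires.
// The caller creates client once and reuses it, e.g.
// clientv3.New(clientv3.Config{Endpoints: []string{"etcd-host:2379"}}).
func acquireLock(ctx context.Context, client *clientv3.Client, key string, ttl int64) (clientv3.LeaseID, error) {
    grant, err := client.Grant(ctx, ttl)
    if err != nil {
        return 0, err
    }

    // CreateRevision == 0 means the key does not exist.
    resp, err := client.Txn(ctx).
        If(clientv3.Compare(clientv3.CreateRevision(key), "=", 0)).
        Then(clientv3.OpPut(key, "locked", clientv3.WithLease(grant.ID))).
        Commit()
    if err == nil && resp.Succeeded {
        return grant.ID, nil // revoke this lease to release the lock
    }

    // Do not leave an unused lease behind when the lock was not obtained.
    client.Revoke(context.Background(), grant.ID)
    if err != nil {
        return 0, err
    }
    return 0, errors.New("lock acquisition failed")
}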

PostgreSQL Advisory Locks

BEGIN;
SELECT pg_advisory_xact_lock(123456);
-- Critical section
COMMIT;
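
The example above uses the literal key 123456, whereas applications usually lock on named resources. One common way to bridge the two is to hash the name into the signed 64-bit range that `pg_advisory_xact_lock` accepts. The following is a small sketch under that assumption; the helper name `advisory_key` and the resource name `orders:42` are hypothetical. Two different names can in principle hash to the same key, which would make them share a lock: still safe, but with less concurrency.

```python
import hashlib


def advisory_key(resource: str) -> int:
    """Map a resource name to a signed 64-bit integer usable as an advisory lock key."""
    digest = hashlib.sha256(resource.encode("utf-8")).digest()
    # Take the first 8 bytes and interpret them as a signed bigint.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


key = advisory_key("orders:42")
print(f"SELECT pg_advisory_xact_lock({key});")
```

The mapping is deterministic, so every process that derives the key from the same name contends for the same lock.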

Solution Comparison

Solution                Consistency  Performance  Complexity
Redis Single Instance   Weak         High         Low
Redis Multi-Instance    Medium       Medium       High
etcd                    Strong       Medium       Medium
PostgreSQL              Strong       Medium       Medium

Implementation Strategy

Evaluation Criteria

  • Lock failure tolerance thresholds
  • Existing infrastructure constraints
  • Performance baseline requirements

Hybrid Validation Approach

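import java.util.logging.Logger;

// Shadow-validates one lock service against another; the primary's answer is authoritative.
// LockService is assumed to expose boolean acquire(String) and void release(String).
public class LockValidator {
    private static final Logger LOG = Logger.getLogger(LockValidator.class.getName());
    private final LockService primary;
    private final LockService secondary;

    public LockValidator(LockService primary, LockService secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    public boolean secureLock(String resource) {
        boolean mainLock = primary.acquire(resource);
        boolean checkLock = secondary.acquire(resource);

        if (mainLock != checkLock) {
            // The two systems disagree about who holds the lock: record it for investigation.
            LOG.warning("Lock discrepancy for resource " + resource);
        }
        if (checkLock && !mainLock) {
            secondary.release(resource); // do not strand a shadow lock the caller will never free
        }
        return mainLock;
    }
}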

Monitoring Essentials

  • Lock acquisition duration: lock_obtain_duration_seconds
  • Lock contention frequency: lock_conflicts_per_minute
  • Replication latency: redis_replication_lag_ms
  • Failover events: redis_leader_changes_total
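
The first two metrics above can be collected at the call site by wrapping whatever function attempts the lock. The following is a minimal in-process sketch; the names `observed_acquire`, `durations` and `counters` are hypothetical, and a real deployment would hand these values to its metrics library instead of module-level variables. The conflict counter is cumulative, so a per-minute rate such as `lock_conflicts_per_minute` would be derived from it by the monitoring system.

```python
import time
from collections import Counter

durations = []        # samples for lock_obtain_duration_seconds
counters = Counter()  # cumulative counts, e.g. conflicts


def observed_acquire(try_acquire, resource):
    """Call try_acquire(resource), recording how long it took and whether it conflicted."""
    start = time.monotonic()
    acquired = bool(try_acquire(resource))
    durations.append(time.monotonic() - start)
    if not acquired:
        counters["lock_conflicts"] += 1
    return acquired


# Usage with a toy lock table standing in for a real lock service.
held = set()

def toy_acquire(resource):
    if resource in held:
        return False
    held.add(resource)
    return True

observed_acquire(toy_acquire, "report-job")  # first caller wins
observed_acquire(toy_acquire, "report-job")  # second caller conflicts
print(f"attempts={len(durations)}, conflicts={counters['lock_conflicts']}")
```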

Tags: Redis distributed-locks etcd PostgreSQL Redisson

Posted on Mon, 25 May 2026 19:45:58 +0000 by 182x