ZooKeeper: Distributed Coordination Service for the ELK Stack

ZooKeeper is an open-source distributed coordination service designed for high availability, performance, and consistency in distributed applications. It provides a fundamental service: distributed locking. Distributed applications can build more advanced services on top of it, such as synchronization services, configuration maintenance, cluster management, and naming services.

The ZooKeeper service itself runs as a cluster. A cluster of 2n+1 servers (an odd number) tolerates n failures: as long as more than half of the machines are available, ZooKeeper remains operational. For example, a 3-node cluster tolerates one failure; if two nodes fail, only one server remains, which is less than half of three (1 < 1.5), so the cluster becomes unavailable. ZooKeeper clusters are therefore typically deployed with an odd number of machines to improve fault tolerance.

Working Mechanism

From a design pattern perspective, ZooKeeper is a distributed service management framework based on the observer pattern. It stores and manages data that all interested parties care about, then accepts observer registrations. When the state of this data changes, ZooKeeper is responsible for notifying the registered observers to react accordingly.

In essence, ZooKeeper = file system + notification mechanism.
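
As a minimal illustration of the "file system + notification" idea, the zkCli.sh client bundled in ZooKeeper's bin directory can register a watch on a ZNode and receive a notification when its data changes. The /demo path, the values, and the server address below are made-up examples.

// Hypothetical example: one client watches a ZNode, another client updates it
zkCli.sh -server 127.0.0.1:2181
create /demo "v1"      # store some data in a ZNode
get -w /demo           # read the data and register a one-time data watch
# from a second session: set /demo "v2"
# the watching client then receives a NodeDataChanged notification for /demo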

Key Characteristics

  1. A ZooKeeper cluster consists of one Leader and multiple Followers.
  2. As long as more than half of the nodes in the ZooKeeper cluster remain alive, the cluster can provide normal services. Therefore, ZooKeeper is suitable for deployment on an odd number of servers.
  3. Global data consistency: Each server maintains an identical data copy. Regardless of which server a client connects to, the data is consistent.
  4. Update requests are executed sequentially. Updates from the same client are processed in the order they were sent (FIFO).
  5. Data updates are atomic: An update either succeeds completely or fails entirely.
  6. Timeliness: Within a certain time frame, clients can read the latest data.

Data Structure

ZooKeeper's data model resembles the Linux file system structure. It can be viewed as a tree where each node is called a ZNode. Each ZNode can store up to 1MB of data by default and is uniquely identified by its path.
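
To make the tree model concrete, here is a small, made-up example using the zkCli.sh client from ZooKeeper's bin directory; the /app1 path and the data values are illustrative only.

// Hypothetical example: build and inspect a small ZNode tree
zkCli.sh -server 127.0.0.1:2181
create /app1 "app1-config"              # parent ZNode holding some data
create /app1/server1 "192.168.10.100"   # child ZNode, uniquely identified by its full path
ls /app1                                # list children: [server1]
stat /app1                              # metadata such as cZxid, mZxid, dataLength, numChildren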

Application Scenarios

ZooKeeper provides services including: unified naming service, unified configuration management, unified cluster management, dynamic server node online/offline, and soft load balancing.

  • Unified Naming Service: In distributed environments, applications and services often need unified names so they can be identified easily. For example, IP addresses are hard to remember, while domain names are not.

  • Unified Configuration Management: In distributed environments, configuration files must frequently be kept in sync. Typically, every node in a cluster needs the same configuration information, as in a Kafka cluster, and after a configuration file is modified it should be propagated to all nodes quickly.

    Configuration management can be implemented with ZooKeeper: the configuration is written to a ZNode, and each client server places a watch on that ZNode. When the ZNode's data is modified, ZooKeeper notifies all watching servers.

  • Unified Cluster Management: In distributed environments, the real-time status of every node must be known so that adjustments can be made when a node's status changes.

    ZooKeeper can monitor node status changes in real time: node information is written to a ZNode, and watching that ZNode yields status changes as they occur.

  • Dynamic Server Online/Offline: Clients can learn in real time when servers come online or go offline (see the sketch after this list).

  • Soft Load Balancing: ZooKeeper records the access count of each server and directs new client requests to the server with the fewest accesses.
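
As a rough sketch of how dynamic server online/offline detection can be built, a server can register itself as an ephemeral ZNode, which ZooKeeper removes automatically when the server's session ends; clients watching the parent node are notified of both events. The /servers path and the address below are illustrative, not a standard layout.

// Hypothetical sketch: servers register ephemeral ZNodes; clients watch the parent for changes
zkCli.sh -server 127.0.0.1:2181
create /servers ""                            # parent node for server registrations
create -e /servers/server1 "192.168.10.100"   # ephemeral: deleted automatically when the session ends
ls -w /servers                                # a client lists the children and sets a child watch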

Leader Election Mechanism

Initial Startup Election

  1. In a five-server cluster, Server 1 starts and initiates an election. Server 1 votes for itself. With only one vote, which is less than the required majority of three, the election cannot complete, and Server 1 remains in the LOOKING state.
  2. Server 2 starts and initiates another election. Servers 1 and 2 each vote for themselves and exchange vote information. Server 1 finds that Server 2's myid is larger than that of its current vote (Server 1) and changes its vote to Server 2. The result is 0 votes for Server 1 and 2 votes for Server 2, still short of a majority, so the election cannot complete and both servers remain in the LOOKING state.
  3. Server 3 starts and initiates an election. Servers 1 and 2 change their votes to Server 3. The result: Server 1 has 0 votes, Server 2 has 0 votes, and Server 3 has 3 votes. Server 3's votes exceed the majority, so it becomes the Leader. Servers 1 and 2 change to FOLLOWING state, while Server 3 changes to LEADING.
  4. Server 4 starts and initiates an election. Servers 1, 2, and 3 are no longer in LOOKING state and don't change their votes. The exchange results in Server 3 with 3 votes and Server 4 with 1 vote. Server 4 follows the majority, changes its vote to Server 3, and switches to FOLLOWING state.
  5. Server 5 starts similarly and becomes a follower.

Non-Initial Startup Election

When a ZooKeeper server encounters one of these situations, it enters the Leader election process:

  1. Server initialization startup.
  2. During operation, the server loses connection with the Leader.

When a machine enters the Leader election process, the cluster may be in one of these states:

  1. The cluster already has a Leader. The machine attempting to elect a Leader is informed of the current Leader's information. It only needs to connect to the Leader and synchronize its state.
  2. The cluster indeed has no Leader. Suppose a 5-node ZooKeeper cluster with SIDs 1, 2, 3, 4, 5 and ZXIDs 8, 8, 8, 7, 7, with SID 3 as the Leader. If servers 3 and 5 fail, Leader election begins.

Leader Election Rules:

  1. The server with a larger EPOCH wins.
  2. If EPOCH is the same, the server with a larger transaction ID (ZXID) wins.
  3. If ZXID is the same, the server with a larger server ID (SID) wins.

Applying these rules to the example above: the surviving servers (SIDs 1, 2, and 4) all have the same EPOCH; Servers 1 and 2 both have ZXID 8, which is larger than Server 4's ZXID of 7, so Server 4 is eliminated; between Servers 1 and 2 the tie is broken by SID, so Server 2 becomes the new Leader.

// Server identifiers
SID: Server ID. Uniquely identifies a machine in a ZooKeeper cluster; duplicates are not allowed, and the value is identical to myid.
ZXID: Transaction ID. Identifies a server state change (transaction). At any given moment, ZXID values may differ across nodes depending on how far each has processed the transaction stream.
Epoch: The Leader's term counter. While there is no Leader, every vote cast in the same round carries the same logical clock value; the value is incremented after each round of voting.
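
To relate these identifiers to a running cluster, the srvr four-letter command reports a node's current Zxid and Mode (srvr is typically in the default four-letter-word whitelist in 3.5.x; add it to 4lw.commands.whitelist if the command is refused), and the myid file created during deployment is that node's SID. The address and paths below match the deployment section that follows; adjust them to your environment.

// Query a node's role and latest Zxid over the client port (requires nc)
echo srvr | nc 192.168.10.100 2181
// The myid file holds the node's SID
cat /usr/local/zookeeper-3.5.7/data/myid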

Cluster Deployment

// Official download: https://archive.apache.org/dist/zookeeper/

Environment Preparation

// Disable the firewall and SELinux on all three nodes
systemctl stop firewalld
systemctl disable firewalld
setenforce 0    # temporary; edit /etc/selinux/config to persist across reboots
// Set the hostname on each node (run the matching command on that node)
hostnamectl set-hostname zk-node01
bash

hostnamectl set-hostname zk-node02
bash

hostnamectl set-hostname zk-node03
bash
// Add all three nodes to /etc/hosts on every node
echo "192.168.10.100 zk-node01" >> /etc/hosts
echo "192.168.10.101 zk-node02" >> /etc/hosts
echo "192.168.10.102 zk-node03" >> /etc/hosts
cat /etc/hosts

Install JDK

// Install JDK on all three nodes
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version

Install Zookeeper

// On zk-node01, copy the ZooKeeper package (already downloaded to /opt) to the other nodes
cd /opt/
scp apache-zookeeper-3.5.7-bin.tar.gz zk-node02:/opt/
scp apache-zookeeper-3.5.7-bin.tar.gz zk-node03:/opt/
// Extract and move to installation directory (all nodes)
tar xf apache-zookeeper-3.5.7-bin.tar.gz
mv apache-zookeeper-3.5.7-bin /usr/local/zookeeper-3.5.7

Configure Zookeeper

// Copy sample configuration
cd /usr/local/zookeeper-3.5.7/conf
cp zoo_sample.cfg zoo.cfg
// Edit the configuration (same content on all three nodes)
vim zoo.cfg

# Heartbeat interval in milliseconds; the basic time unit for the limits below
tickTime=2000
# Followers have initLimit*tickTime (20 s) to connect to and sync with the Leader at startup
initLimit=10
# Followers may fall behind the Leader by at most syncLimit*tickTime (10 s)
syncLimit=5
# Snapshot (data) directory and transaction-log directory
dataDir=/usr/local/zookeeper-3.5.7/data
dataLogDir=/usr/local/zookeeper-3.5.7/logs
# Port that clients connect to
clientPort=2181
# server.<myid>=<address>:<leader-follower communication port>:<leader election port>
server.1=192.168.10.100:3188:3288
server.2=192.168.10.101:3188:3288
server.3=192.168.10.102:3188:3288
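
If zoo.cfg is edited on zk-node01 only, the finished file can simply be copied to the other nodes; this assumes the installation path is identical everywhere.

// Optional: distribute the configuration from zk-node01 to the other nodes
scp zoo.cfg zk-node02:/usr/local/zookeeper-3.5.7/conf/
scp zoo.cfg zk-node03:/usr/local/zookeeper-3.5.7/conf/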

Create Data and Log Directories

// Create directories on all nodes
mkdir /usr/local/zookeeper-3.5.7/data
mkdir /usr/local/zookeeper-3.5.7/logs

Create myid Files

// Create the myid file on each node; the value must match that node's server.X entry in zoo.cfg
echo 1 > /usr/local/zookeeper-3.5.7/data/myid    # on zk-node01 only
echo 2 > /usr/local/zookeeper-3.5.7/data/myid    # on zk-node02 only
echo 3 > /usr/local/zookeeper-3.5.7/data/myid    # on zk-node03 only
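
A quick sanity check on each node, to confirm the identifier matches its server.X line:

// Verify the myid value against the server.X entries in zoo.cfg
cat /usr/local/zookeeper-3.5.7/data/myid
grep "^server\." /usr/local/zookeeper-3.5.7/conf/zoo.cfg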

Configure Startup Script

// Create service script
vim /etc/init.d/zookeeper

#!/bin/bash
# chkconfig: 2345 20 90
# description: ZooKeeper Service Control Script
ZK_HOME='/usr/local/zookeeper-3.5.7'
case $1 in
start)
    echo "---------- zookeeper startup ------------"
    $ZK_HOME/bin/zkServer.sh start
;;
stop)
    echo "---------- zookeeper shutdown ------------"
    $ZK_HOME/bin/zkServer.sh stop
;;
restart)
    echo "---------- zookeeper restart ------------"
    $ZK_HOME/bin/zkServer.sh restart
;;
status)
    echo "---------- zookeeper status ------------"
    $ZK_HOME/bin/zkServer.sh status
;;
*)
    echo "Usage: $0 {start|stop|restart|status}"
;;
esac
// Set permissions, register the service with chkconfig, and start it (on all three nodes)
chmod +x /etc/init.d/zookeeper
chkconfig --add zookeeper
service zookeeper start
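
Once the service is running, a quick way to confirm that a node answers on the client port is the bundled CLI; 127.0.0.1 and port 2181 follow the configuration above.

// Optional: connect to the local node and list the root ZNode
/usr/local/zookeeper-3.5.7/bin/zkCli.sh -server 127.0.0.1:2181
ls /
quit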

Verify Cluster Status

// Check status on each node
service zookeeper status

The status output should report exactly one node in leader mode and the remaining nodes in follower mode.
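
For convenience, the same check can be driven from a single node; the loop below is a sketch that assumes password-less SSH from zk-node01 to the other nodes.

// Optional sketch: query every node's role from one machine (assumes password-less SSH)
for node in zk-node01 zk-node02 zk-node03; do
    echo "--- $node ---"
    ssh $node "/usr/local/zookeeper-3.5.7/bin/zkServer.sh status 2>&1 | grep Mode"
done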
