Elasticsearch Distributed Architecture: Sharding, Routing, and Split-Brain Mitigation

Distributed Systems vs. Clustering

Architecture Type	Analogy	Primary Benefit
Cluster	Multiple workers performing identical tasks	System high availability and load distribution
Distributed	Multiple workers performing distinct specialized tasks	Compute/storage scaling, decoupling, performance acceleration

Elasticsearch inherently abstracts the complexities of distributed systems, offering native clustering capabilities.

Core Cluster Terminology

Cluster: A collection of nodes united under a shared cluster name.
Node: A single running instance of Elasticsearch within a cluster.
Index: The logical namespace for data storage, analogous to a database in relational systems.
Shard: A subdivided segment of an index. Shards can be distributed across various nodes in a cluster.
Primary Shard: The original data partition from which replicas are derived.
Replica Shard: A redundant copy of a primary shard, ensuring fault tolerance and read throughput.

Kibana Cluster Configuration

When monitoring a multi-node cluster, Kibana must be configured to point to all available Elasticsearch instances.

# Configuration: kibana-cluster.yml
server.port: 6601
server.host: "0.0.0.0"
server.name: "production-cluster-dashboard"
i18n.locale: "en"
elasticsearch.hosts: ["http://host1:9200", "http://host2:9200", "http://host3:9200"]
elasticsearch.requestTimeout: 120000

Shard Allocation

By default, an index is created with one primary shard and one replica. Custom shard configurations are defined within the settings block during index creation.

PUT app_data_store
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "user_alias": {
        "type": "text"
      }
    }
  }
}

The above configuration creates 5 primary shards and 10 replica shards, totaling 15 shards.

Auto-Rebalancing and Immutability

If a node fails, its assigned shards are automatically rebalanced to the remaining active nodes. It is critical to note that the number of primary shards is immutable once an index is created.

Shard Sizing Best Practices

Keep individual shard sizes between 10GB and 30GB.
Target shard count formula: Node Count * 1 to 3.

Example Calculation: For 2000GB of data, aiming for 20GB per shard requires 100 shards. Using the target formula (dividing by 2), the recommended cluster size would be 50 nodes.

Document Routing Mechanism

The process of determining which shard stores a specific document is known as routing. Elasticsearch uses a deterministic hashing algorithm to ensure the same document ID always maps to the same primary shard.

Formula: shard_num = hash(document_id) % number_of_primary_shards

Split-Brain Phenomenon

In a healthy Elasticsearch cluster, a single Master node orchestrates cluster-wide operations—such as index creation, node membership tracking, and shard allocation. All nodes unanimously elect and acknowledge this single master.

Split-brain occurs when network partitions or node failures cause a divergence in master election, resulting in multiple independent master nodes governing disjointed cluster fragments.

Common Causes of Split-Brain

Network Latency: Delayed heartbeat responses between nodes, particularly in cross-region or cloud deployments.
Node Overload: When a node serves dual roles (Master and Data), heavy data operations can exhaust resources, causing the master process to become unresponsive (false failure).
JVM Garbage Collection Pauses: Insufficient heap allocation triggers prolonged stop-the-world GC events, leading to temporary node unresponsiveness.

Mitigation Strategies

Timeout Adjustment: Increase the ping timeout threshold (e.g., discovery.zen.ping.timeout in older ES versions) beyond the default 3 seconds to accommodate network fluctuations.
Role Segregation: Decouple master responsibilities from data operations.
Master-eligible nodes: node.master: true, node.data: false
Data nodes: node.master: false, node.data: true
Heap Optimization: Configure JVM options (-Xms and -Xmx) in jvm.options to allocate 50% of the available physical memory to the Elasticsearch process.

Tags: elasticsearch Distributed Systems Cluster Architecture Sharding Split-Brain

Posted on Tue, 02 Jun 2026 17:07:12 +0000 by newburcj

Freaks City