Distributed Systems vs. Clustering
| Architecture Type | Analogy | Primary Benefit |
|---|---|---|
| Cluster | Multiple workers performing identical tasks | System high availability and load distribution |
| Distributed | Multiple workers performing distinct specialized tasks | Compute/storage scaling, decoupling, performance acceleration |
Elasticsearch inherently abstracts the complexities of distributed systems, offering native clustering capabilities.
Core Cluster Terminology
- Cluster: A collection of nodes united under a shared cluster name.
- Node: A single running instance of Elasticsearch within a cluster.
- Index: The logical namespace for data storage, analogous to a database in relational systems.
- Shard: A subdivided segment of an index. Shards can be distributed across various nodes in a cluster.
- Primary Shard: The original data partition from which replicas are derived.
- Replica Shard: A redundant copy of a primary shard, ensuring fault tolerance and read throughput.
Kibana Cluster Configuration
When monitoring a multi-node cluster, Kibana must be configured to point to all available Elasticsearch instances.
# Configuration: kibana-cluster.yml
server.port: 6601
server.host: "0.0.0.0"
server.name: "production-cluster-dashboard"
i18n.locale: "en"
elasticsearch.hosts: ["http://host1:9200", "http://host2:9200", "http://host3:9200"]
elasticsearch.requestTimeout: 120000
Shard Allocation
By default, an index is created with one primary shard and one replica. Custom shard configurations are defined within the settings block during index creation.
PUT app_data_store
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
},
"mappings": {
"properties": {
"user_alias": {
"type": "text"
}
}
}
}
The above configuration creates 5 primary shards and 10 replica shards, totaling 15 shards.
Auto-Rebalancing and Immutability
If a node fails, its assigned shards are automatically rebalanced to the remaining active nodes. It is critical to note that the number of primary shards is immutable once an index is created.
Shard Sizing Best Practices
- Keep individual shard sizes between 10GB and 30GB.
- Target shard count formula:
Node Count * 1 to 3.
Example Calculation: For 2000GB of data, aiming for 20GB per shard requires 100 shards. Using the target formula (dividing by 2), the recommended cluster size would be 50 nodes.
Document Routing Mechanism
The process of determining which shard stores a specific document is known as routing. Elasticsearch uses a deterministic hashing algorithm to ensure the same document ID always maps to the same primary shard.
Formula: shard_num = hash(document_id) % number_of_primary_shards
Split-Brain Phenomenon
In a healthy Elasticsearch cluster, a single Master node orchestrates cluster-wide operations—such as index creation, node membership tracking, and shard allocation. All nodes unanimously elect and acknowledge this single master.
Split-brain occurs when network partitions or node failures cause a divergence in master election, resulting in multiple independent master nodes governing disjointed cluster fragments.
Common Causes of Split-Brain
- Network Latency: Delayed heartbeat responses between nodes, particularly in cross-region or cloud deployments.
- Node Overload: When a node serves dual roles (Master and Data), heavy data operations can exhaust resources, causing the master process to become unresponsive (false failure).
- JVM Garbage Collection Pauses: Insufficient heap allocation triggers prolonged stop-the-world GC events, leading to temporary node unresponsiveness.
Mitigation Strategies
- Timeout Adjustment: Increase the ping timeout threshold (e.g.,
discovery.zen.ping.timeoutin older ES versions) beyond the default 3 seconds to accommodate network fluctuations. - Role Segregation: Decouple master responsibilities from data operations.
Master-eligible nodes:node.master: true,node.data: false
Data nodes:node.master: false,node.data: true - Heap Optimization: Configure JVM options (
-Xmsand-Xmx) injvm.optionsto allocate 50% of the available physical memory to the Elasticsearch process.