After migrating data between clusters, matching document counts alone do not guarantee data integrity. For a robust, content-level comparison between two Elasticsearch, Easysearch, or OpenSearch clusters, INFINI Gateway can verify that every document is identical across the source and destination environments.
Environment Setup
| Software | Version |
|---|---|
| Easysearch | 1.12.0 |
| Elasticsearch | 7.17.29 |
| INFINI Gateway | 1.29.2 |
Implementation Steps
The verification process involves two phases: configuring the gateway environment and executing the comparison pipeline.
1. Configuring the Gateway Pipeline
Begin by downloading the standard template from the official repository and modifying the environment variables to target your specific clusters.
# Configuration variables for cluster endpoints
env:
  GATEWAY_ADDR: 127.0.0.1:8001
  API_ADDR: 127.0.0.1:9000
  CLUSTER_A: http://127.0.0.1:9200
  CLUSTER_B: http://127.0.0.1:9201
  TASK_NAME: verify_index_content
Next, define the cluster authentication parameters within the configuration file to allow the gateway to access both the source and target nodes.
elasticsearch:
  - name: primary_cluster
    enabled: true
    endpoints: ["$[[env.CLUSTER_A]]"]
    basic_auth:
      username: elastic
      password: your_password
  - name: replica_cluster
    enabled: true
    endpoints: ["$[[env.CLUSTER_B]]"]
    basic_auth:
      username: admin
      password: your_password
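The endpoints above are not hard-coded; they refer back to the env section through `$[[env.NAME]]` placeholders. As a rough illustration of that substitution (not the gateway's own code; `resolve_placeholders` is a hypothetical helper written for this example), the lookup can be sketched in Python:

```python
import re

# Hypothetical illustration only: mimics how a "$[[env.NAME]]" placeholder
# could be expanded from the env section. This is not INFINI Gateway code.
PLACEHOLDER = re.compile(r"\$\[\[env\.([A-Za-z_][A-Za-z0-9_]*)\]\]")


def resolve_placeholders(text: str, env: dict[str, str]) -> str:
    """Replace every $[[env.NAME]] in text with env[NAME]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Fail loudly instead of leaving a dangling placeholder.
            raise KeyError(f"undefined env variable: {name}")
        return env[name]

    return PLACEHOLDER.sub(substitute, text)


env = {
    "CLUSTER_A": "http://127.0.0.1:9200",
    "CLUSTER_B": "http://127.0.0.1:9201",
}
resolved = resolve_placeholders('endpoints: ["$[[env.CLUSTER_A]]"]', env)
print(resolved)  # endpoints: ["http://127.0.0.1:9200"]
```

Keeping the addresses in one place means that retargeting the check at another pair of clusters only requires editing the env section.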
Define the processing pipeline to extract and compare document hashes. Each dump_hash processor reads the documents of one cluster and writes their hashes to an output queue, so that the two sides can be checked for consistency.
pipeline:
  - name: integrity_check
    auto_start: true
    keep_running: false
    processor:
      - dag:
          mode: wait_all
          parallel:
            # Hash every document on the source cluster
            - dump_hash:
                sort_document_fields: true
                indices: "logs_index,user_data"
                batch_size: 500
                elasticsearch: "primary_cluster"
                output_queue: "queue_a"
            # Hash every document on the target cluster
            - dump_hash:
                # Must match the source side so the hashes are comparable
                sort_document_fields: true
                indices: "logs_index,user_data"
                batch_size: 500
                elasticsearch: "replica_cluster"
                output_queue: "queue_b"
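To see why sort_document_fields matters, consider two documents that hold the same fields in a different order. The sketch below is a conceptual illustration rather than the gateway's implementation: the hash function the gateway uses is not stated here, so SHA-256 over compact JSON is an assumption, and `document_hash` is a hypothetical helper.

```python
import hashlib
import json


def document_hash(doc: dict, sort_fields: bool = True) -> str:
    """Hash a document; with sort_fields=True the field order is canonicalized.

    Assumption: SHA-256 over compact JSON stands in for whatever
    signature the gateway actually computes.
    """
    payload = json.dumps(
        doc, sort_keys=sort_fields, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content, different field order (as can happen after a migration).
source_doc = {"user": "alice", "age": 30}
target_doc = {"age": 30, "user": "alice"}

same_sorted = document_hash(source_doc) == document_hash(target_doc)
same_unsorted = document_hash(source_doc, sort_fields=False) == document_hash(
    target_doc, sort_fields=False
)
print(same_sorted, same_unsorted)  # True False
```

Without field sorting, logically identical documents can produce different hashes, so both sides of the comparison should use the same setting.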
2. Running the Verification Task
Run the comparison with the gateway binary. The gateway processes both streams, calculates document signatures, and identifies any discrepancies between the two clusters for the listed indices.
./gateway-bin -config verify-config.yml
Upon completion, the gateway will report whether the document content matches across the specified indices. If discrepancies exist, the tool can be further configured to log specific IDs for targeted debugging.
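Conceptually, the final comparison reduces to diffing two maps of document ID to hash. The following sketch shows one way to classify discrepancies; it is an illustration under assumed data, not the gateway's report format, and `diff_hashes` and the sample values are hypothetical.

```python
def diff_hashes(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Classify document IDs by how their hashes differ between two clusters."""
    shared_ids = source.keys() & target.keys()
    return {
        # Present on the source side but missing from the target.
        "only_in_source": sorted(source.keys() - target.keys()),
        # Present on the target side but missing from the source.
        "only_in_target": sorted(target.keys() - source.keys()),
        # Present on both sides, but the content hash differs.
        "content_differs": sorted(
            doc_id for doc_id in shared_ids if source[doc_id] != target[doc_id]
        ),
    }


# Hypothetical sample: both sides hold three documents, so counts match.
source = {"1": "aa", "2": "bb", "3": "cc"}
target = {"1": "aa", "2": "xx", "4": "dd"}

report = diff_hashes(source, target)
print(report)
```

Both sample maps contain three entries, yet one document is missing on each side and one differs in content, which is exactly the kind of mismatch a count-only check would not reveal.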