Verifying Elasticsearch Data Integrity with INFINI Gateway

After migrating data between clusters, matching document counts are not enough to guarantee data integrity: two indices can hold the same number of documents and still differ in content. INFINI Gateway can run a content-level comparison between two Elasticsearch, Easysearch, or OpenSearch clusters by hashing the documents on each side, so you can verify that documents are identical across the source and destination environments.

Environment Setup

Software          Version
Easysearch        1.12.0
Elasticsearch     7.17.29
INFINI Gateway    1.29.2

Implementation Steps

The verification process involves two phases: configuring the gateway environment and executing the comparison pipeline.

1. Configuring the Gateway Pipeline

Begin by downloading the standard template from the official repository and modifying the environment variables to target your specific clusters.

# Configuration variables for cluster endpoints
env:
 GATEWAY_ADDR: 127.0.0.1:8001
 API_ADDR: 127.0.0.1:9000
 CLUSTER_A: http://127.0.0.1:9200
 CLUSTER_B: http://127.0.0.1:9201
 TASK_NAME: verify_index_content

Next, define the cluster authentication parameters within the configuration file to allow the gateway to access both the source and target nodes.

elasticsearch:
 - name: primary_cluster
   enabled: true
   endpoints: ["$[[env.CLUSTER_A]]"]
   basic_auth:
     username: elastic
     password: your_password

 - name: replica_cluster
   enabled: true
   endpoints: ["$[[env.CLUSTER_B]]"]
   basic_auth:
     username: admin
     password: your_password
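
The endpoints above are written as $[[env.NAME]] placeholders that refer back to the env block. The following Python sketch only illustrates that substitution idea; it is not the gateway's implementation, and the function name resolve_placeholders and the error behaviour for undefined variables are assumptions made for the example.

```python
import re

# Matches placeholders of the form $[[env.NAME]]
PLACEHOLDER = re.compile(r"\$\[\[env\.([A-Za-z_][A-Za-z0-9_]*)\]\]")


def resolve_placeholders(text: str, env: dict) -> str:
    """Replace every $[[env.NAME]] in text with the value of env[NAME]."""

    def lookup(match):
        name = match.group(1)
        if name not in env:
            # Assumption for this sketch: fail loudly on undefined variables
            raise KeyError(f"undefined env variable: {name}")
        return str(env[name])

    return PLACEHOLDER.sub(lookup, text)


env = {
    "CLUSTER_A": "http://127.0.0.1:9200",
    "CLUSTER_B": "http://127.0.0.1:9201",
}
resolved = resolve_placeholders('endpoints: ["$[[env.CLUSTER_A]]"]', env)
print(resolved)
```

Keeping the addresses in one env block means the cluster definitions never need editing when the endpoints change.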

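Define the processing pipeline that extracts document hashes for comparison. Each dump_hash processor reads the listed indices from one cluster and writes the resulting document hashes to its output queue. The two dump_hash tasks run in parallel, and with mode set to wait_all the dag waits for both to finish.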

pipeline:
 - name: integrity_check
   auto_start: true
   keep_running: false
   processor:
   - dag:
       mode: wait_all
       parallel:
         - dump_hash:
             sort_document_fields: true
             indices: "logs_index,user_data"
             batch_size: 500
             elasticsearch: "primary_cluster"
             output_queue: "queue_a"
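         - dump_hash:
             # Use the same hashing options on both sides so the hashes are comparable
             sort_document_fields: true
             indices: "logs_index,user_data"
             batch_size: 500
             elasticsearch: "replica_cluster"
             output_queue: "queue_b"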
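
To see why field ordering matters when hashing documents, here is a minimal Python sketch. It is not the gateway's hashing code: the source does not state which hash function or serialization the gateway uses, so SHA-256 over compact JSON and the function name document_hash are assumptions chosen for illustration.

```python
import hashlib
import json


def document_hash(source: dict, sort_fields: bool = True) -> str:
    """Return a hex digest of a document body.

    With sort_fields=True the keys (including nested ones) are serialized
    in a stable order, so field order does not affect the digest.
    """
    payload = json.dumps(
        source,
        sort_keys=sort_fields,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


doc_on_a = {"user": "alice", "age": 30}
doc_on_b = {"age": 30, "user": "alice"}  # same content, different field order

same_when_sorted = document_hash(doc_on_a) == document_hash(doc_on_b)
same_when_unsorted = (
    document_hash(doc_on_a, sort_fields=False)
    == document_hash(doc_on_b, sort_fields=False)
)
print(same_when_sorted, same_when_unsorted)
```

In this sketch the two documents hash identically only when fields are sorted, which is why the same setting should be applied to both clusters.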

2. Running the Verification Task

Execute the comparison tool using the gateway binary. The gateway will process the streams, calculate document signatures, and identify any discrepancies between the two indices.

./gateway-bin -config verify-config.yml

Upon completion, the gateway will report whether the document content matches across the specified indices. If discrepancies exist, the tool can be further configured to log specific IDs for targeted debugging.
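
Conceptually, comparing the two hash streams comes down to classifying document IDs into three groups. The Python sketch below shows that idea with hypothetical IDs and shortened hash values; the function diff_hashes and the report keys are illustrative names, not part of the gateway.

```python
from typing import Dict, List


def diff_hashes(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify document IDs by comparing two {doc_id: hash} mappings."""
    shared_ids = source.keys() & target.keys()
    return {
        "only_in_source": sorted(source.keys() - target.keys()),
        "only_in_target": sorted(target.keys() - source.keys()),
        "content_differs": sorted(i for i in shared_ids if source[i] != target[i]),
    }


# Hypothetical hashes, as if collected from queue_a and queue_b
hashes_a = {"doc-1": "9f2c", "doc-2": "41aa", "doc-3": "77b0"}
hashes_b = {"doc-1": "9f2c", "doc-2": "e310", "doc-4": "c5d8"}

report = diff_hashes(hashes_a, hashes_b)
for category, ids in report.items():
    print(f"{category}: {ids}")
```

A report of this shape separates missing documents from documents whose content changed, which usually point to different causes during a migration.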

Tags: elasticsearch INFINI Gateway Easysearch OpenSearch Data Migration

Posted on Sat, 30 May 2026 19:27:32 +0000 by smeee_b