Seamless Elasticsearch Cluster Migration to Cloud Infrastructure

Overview

A client needed to migrate an existing Elasticsearch cluster to a mobile cloud environment with minimal service interruption during the transition.

Migration Strategy

The approach uses INFINI Gateway, a proven gateway solution, to implement dual-write. While the clusters are being switched, the gateway records every data modification locally. Once the full data migration completes, the recorded incremental changes are synchronized to the new cluster, and comprehensive validation confirms data consistency before traffic is cut over.

Migration Workflow

The complete migration procedure consists of the following steps:

  1. Application traffic is directed through the gateway with dual-write enabled. All new modifications are recorded locally but are not yet forwarded to the mobile cloud environment.
  2. Disable incremental data consumption on the mobile cloud side of the gateway.
  3. Migrate November data by creating a snapshot and uploading it to S3.
  4. Download the S3 files to the mobile cloud environment.
  5. Restore the snapshot to the November indices on the mobile cloud cluster.
  6. Enable incremental data consumption on the mobile cloud side of the gateway.
  7. Wait for incremental catch-up to complete.
  8. Validate the data up to a time cutoff (for example, timestamp A, set to 30 minutes before the current time), including document count verification and hash checks.
  9. Suspend business writes, gateway writes, and Tencent Cloud writes (10 minutes).
  10. Wait for remaining incremental data to finish syncing.
  11. Validate incremental data after timestamp A.
  12. Switch all traffic to the mobile cloud; applications will directly access the mobile cloud ES cluster.
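
The two validation passes in steps 8 and 11 (data up to timestamp A, then data after it) can be sketched as follows. This is a minimal illustration, not the gateway's own comparison logic: it assumes each document is available as a Python dict with a numeric `ts` field, and the function names are hypothetical.

```python
import hashlib
import json


def window_summary(docs, start=None, end=None):
    """Return (count, digest) for docs whose 'ts' lies in the window (start, end].

    The digest XORs the SHA-256 of each document, so it does not depend on the
    order in which documents are read. XOR cancels out exact duplicates, which
    is why the count is compared as well.
    """
    count = 0
    combined = 0
    for doc in docs:
        ts = doc["ts"]
        if start is not None and ts <= start:
            continue
        if end is not None and ts > end:
            continue
        # Canonical JSON so that field order does not change the hash.
        payload = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()
        combined ^= int.from_bytes(hashlib.sha256(payload).digest(), "big")
        count += 1
    return count, f"{combined:064x}"


def validate(source_docs, target_docs, timestamp_a):
    """Compare both clusters up to timestamp A and after timestamp A."""
    return {
        "up_to_a": window_summary(source_docs, end=timestamp_a)
        == window_summary(target_docs, end=timestamp_a),
        "after_a": window_summary(source_docs, start=timestamp_a)
        == window_summary(target_docs, start=timestamp_a),
    }
```

Splitting at timestamp A lets the bulk of the data be checked while writes are still flowing, so only the small window after A has to be verified during the 10-minute write suspension.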

Overall migration timeline:

  1. November backup time (30 minutes) - starting on the 19th
  2. Backup download to mobile cloud (2-3 days)
  3. Backup restoration to mobile cloud cluster (30 minutes)
  4. November incremental backup (20 minutes) (dual-write initiated) - 21st
  5. November incremental download to mobile cloud (6 hours)
  6. November incremental restoration time (20 minutes)
  7. Incremental data catch-up (8 hours of data generated, requiring 1 hour to sync)
  8. Validation and comparison (1 hour for existing data)
  9. Traffic suspension, incremental validation (10 minutes)
  10. Switchover (1 minute)
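
The catch-up estimate in item 7 can be checked with a little arithmetic. The sketch below assumes a constant write rate and a constant replay rate, which real traffic will only approximate; the function name and parameters are illustrative.

```python
def catch_up_hours(backlog_hours, replay_speedup, writes_continue=True):
    """Hours needed to drain a backlog that took `backlog_hours` to accumulate.

    replay_speedup is how many hours of recorded writes the target cluster
    absorbs per hour of replay (8 hours replayed in 1 hour gives 8.0).
    """
    if replay_speedup <= 0:
        raise ValueError("replay_speedup must be positive")
    if not writes_continue:
        return backlog_hours / replay_speedup
    if replay_speedup <= 1:
        raise ValueError("backlog never drains if replay is no faster than new writes")
    # New writes keep arriving at 1 hour of data per hour,
    # so the backlog shrinks at (replay_speedup - 1) hours per hour.
    return backlog_hours / (replay_speedup - 1)


print(catch_up_hours(8, 8, writes_continue=False))  # writes paused during replay
print(catch_up_hours(8, 8))                         # writes continue during replay
```

With the figures from the timeline (8 hours of backlog, replayed at 8 times the write rate), the estimate is 1 hour if writes are paused and about 1.14 hours (roughly 69 minutes) if the application keeps writing during catch-up.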

The workflow is illustrated below:

Cluster Specifications

  1. ES version 7.10.1
  2. 2 hot nodes, 3 warm nodes, total storage 1.9 TB
  3. 1041 indices, 2085 shards
  4. No custom plugins installed
  5. Uses update_by_query functionality
  6. Uses delete_by_query functionality
  7. Throughput untested; current daily document growth of 10+ million, target growth exceeding 100 million per day
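
To put the growth figures in perspective, the daily document counts can be converted into average write rates. This is simple arithmetic on the numbers listed above and says nothing about peak load, which remains untested.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_second(docs_per_day):
    """Average indexing rate implied by a daily document count."""
    return docs_per_day / SECONDS_PER_DAY


current = docs_per_second(10_000_000)    # current growth: 10+ million docs/day
target = docs_per_second(100_000_000)    # target growth: 100+ million docs/day
shards_per_index = 2085 / 1041           # from the cluster specifications

print(f"current: {current:.0f} docs/s, target: {target:.0f} docs/s")
print(f"average shards per index: {shards_per_index:.2f}")
```

This prints an average of about 116 docs/s today and about 1157 docs/s at the target growth rate, with roughly 2 shards per index.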

Operational Guide (Reference)

Environment Setup

  • Self-hosted ES 5.4.2
  • Self-hosted ES 5.6.8
  • Self-hosted ES 7.5.0
  • INFINI Gateway server 1
  • INFINI Gateway server 2
  • Cloud load balancer 1 (listening on port 9200, forwarding to gateway servers 1/2 port 8000)
  • Cloud load balancer 2 (listening on port 9200, forwarding to gateway servers 1/2 port 8001)

Scenario Description

Multiple self-hosted Elasticsearch clusters need to be migrated smoothly to the mobile cloud, without service interruption and without code modifications.

Data Architecture

By routing application traffic through the gateway, requests are synchronously forwarded to the self-hosted ES while the gateway records all write operations and ensures they are replayed in the same order on the cloud ES. The architecture handles various failure scenarios on both clusters, enabling transparent dual-write for secure and seamless data migration.
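
The dual-write behavior described above can be modeled in a few lines. The sketch below is a toy in-memory model, not INFINI Gateway code: the two clusters are plain dicts, the class and attribute names are hypothetical, and failure handling is left out.

```python
from collections import deque


class DualWriteSketch:
    """Toy model: synchronous write to primary, ordered replay to backup."""

    def __init__(self):
        self.primary = {}       # stands in for the self-hosted cluster
        self.backup = {}        # stands in for the cloud cluster
        self.queue = deque()    # recorded writes, in arrival order
        self.consuming = False  # replay stays paused until enabled

    def write(self, doc_id, doc):
        """Apply a write to primary, then record it. doc=None means delete."""
        self._apply(self.primary, doc_id, doc)
        self.queue.append((doc_id, doc))
        if self.consuming:
            self.drain()

    def drain(self):
        """Replay recorded writes to backup in the order they arrived."""
        while self.queue:
            doc_id, doc = self.queue.popleft()
            self._apply(self.backup, doc_id, doc)

    @staticmethod
    def _apply(store, doc_id, doc):
        if doc is None:
            store.pop(doc_id, None)
        else:
            store[doc_id] = doc


gw = DualWriteSketch()
gw.write("1", {"v": 1})
gw.write("1", {"v": 2})   # a later update to the same document
gw.write("2", {"v": 1})
gw.write("2", None)       # a delete recorded after the create
gw.consuming = True       # enable consumption once the full migration is done
gw.drain()
print(gw.backup == gw.primary)
```

Replaying in arrival order is what makes the final state match: if the update and delete above were applied out of order, the backup would end up with stale or resurrected documents.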

If the application tier is already deployed on the cloud, cloud SLB services can be used to access the gateway, ensuring high availability of the backend gateway. If the application tier and INFINI Gateway are still in the corporate network, the gateway's built-in Layer 4 floating IP capability can be used to ensure gateway high availability.

Data Specifications

This section demonstrates the migration from self-hosted cluster 5.4.2 to cloud-based 5.6.16. Each execution step is described sequentially.

Execution Steps

Deploy INFINI Gateway

To ensure seamless and transparent data migration, INFINI Gateway is deployed to handle dual-write operations.

  1. System Optimization

    Refer to the relevant documentation for optimization settings.

  2. Download the Package

[root@test-server ~]# mkdir /opt/gateway
[root@test-server ~]# cd /opt/gateway/
[root@test-server gateway]# wget http://release.infinilabs.com/gateway/snapshot/gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz
--2022-05-19 10:16:25--  http://release.infinilabs.com/gateway/snapshot/gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz
Resolving host release.infinilabs.com (release.infinilabs.com)... 120.79.205.193
Connecting to release.infinilabs.com (release.infinilabs.com)|120.79.205.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7430568 (7.1M) [application/octet-stream]
Saving to: "gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz"

100%[==============================================================================================================================================>] 7,430,568   22.8MB/s   0.3s

2022-05-19 10:16:25 (22.8 MB/s) - saved "gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz" [7430568/7430568]

[root@test-server gateway]# tar vxzf gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz
gateway-linux-amd64
gateway.yml
sample-configs/
sample-configs/elasticsearch-with-ldap.yml
sample-configs/indices-replace.yml
sample-configs/record_and_play.yml
sample-configs/cross-cluster-search.yml
sample-configs/kibana-proxy.yml
sample-configs/elasticsearch-proxy.yml
sample-configs/v8-bulk-indexing-compatibility.yml
sample-configs/use_old_style_search_response.yml
sample-configs/context-update.yml
sample-configs/elasticsearch-route-by-index.yml
sample-configs/hello_world.yml
sample-configs/entry-with-tls.yml
sample-configs/javascript.yml
sample-configs/log4j-request-filter.yml
sample-configs/request-filter.yml
sample-configs/condition.yml
sample-configs/cross-cluster-replication.yml
sample-configs/fast-bulk-indexing.yml
sample-configs/es_migration.yml
sample-configs/index-docs-diff.yml
sample-configs/rate-limiter.yml
sample-configs/async-bulk-indexing.yml
sample-configs/elasticssearch-request-logging.yml
sample-configs/router_rules.yml
sample-configs/auth.yml
sample-configs/index-backup.yml


Copy the sample configuration shipped with the gateway and adapt it to your actual cluster information:

[root@test-server gateway]# cp sample-configs/cross-cluster-replication.yml migration-config.yml


First, update the cluster identity information. Then modify the cluster registration details as needed. Adjust the gateway listening port and decide whether to enable TLS based on your requirements (if application clients access ES over plain HTTP, set entry.tls.enabled to false).

Different clusters can use separate configurations on different ports for segregated business access.

  3. Start the Gateway

Launch the gateway with the newly created configuration:

[root@test-server gateway]# ./gateway-linux-amd64 -config migration-config.yml

   ___   _   _____  __  __    __  _
  / _ \ /_\ /__   \/__\/ / /\ \ \/_\ /\_/\
 / /_\///_\\  / /\/_\  \ \/  \/ //_\\\_ _/
/ /_\\/  _  \/ / //__   \  /\  /  _  \/ \
\____/\_/ \_/\/  \__/    \/  \/\_/ \_/\_/

[GATEWAY] A light-weight, powerful and high-performance elasticsearch gateway.
[GATEWAY] 1.6.0_SNAPSHOT, 2022-05-18 11:09:54, 2023-12-31 10:10:10, 73408e82a0f96352075f4c7d2974fd274eeafe11
[05-19 13:35:43] [INF] [app.go:174] initializing gateway.
[05-19 13:35:43] [INF] [app.go:175] using config: /opt/gateway/migration-config.yml.
[05-19 13:35:43] [INF] [instance.go:72] workspace: /opt/gateway/data1/gateway/nodes/ca2tc22j7ad0gneois80
[05-19 13:35:43] [INF] [app.go:283] gateway is up and running now.
[05-19 13:35:50] [INF] [actions.go:358] elasticsearch [primary] is available
[05-19 13:35:50] [INF] [api.go:262] api listen at: http://0.0.0.0:2900
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [es-cn-tl32p9fkk0006m56k.elasticsearch.aliyuncs.com:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [es-cn-tl32p9fkk0006m56k.elasticsearch.aliyuncs.com:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [entry.go:322] entry [my_es_entry/] listen at: https://0.0.0.0:8000
[05-19 13:35:50] [INF] [module.go:116] all modules are started


  4. Run in Background
[root@test-server gateway]# nohup ./gateway-linux-amd64 -config migration-config.yml &


  5. Apply License
curl -XPOST http://localhost:2900/_license/apply -d'
{
"license": "XXXXXXXXXXXXXXXXXXXXXXXXX"
}'


Deploy INFINI Console

For convenient management and quick switching between multiple clusters, INFINI Console is deployed.

  1. Download and Install
[root@test-server console]# wget http://release.infinilabs.com/console/snapshot/console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz
--2022-05-19 10:57:24--  http://release.infinilabs.com/console/snapshot/console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz
Resolving host release.infinilabs.com (release.infinilabs.com)... 120.79.205.193
Connecting to release.infinilabs.com (release.infinilabs.com)|120.79.205.193|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13576234 (13M) [application/octet-stream]
Saving to: "console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz"

100%[==============================================================================================================================================>] 13,576,234  33.2MB/s   0.4s

2022-05-19 10:57:25 (33.2 MB/s) - saved "console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz" [13576234/13576234]

[root@test-server console]# tar vxzf console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz
console-linux-amd64
console.yml


  2. Modify Configuration
[root@test-server console]# cat console.yml

# for the system cluster, please use Elasticsearch v7.3+
elasticsearch:
  - name: default
    enabled: true
    monitored: false
    endpoint: http://es-cn-xxxxxxxxxxxxxx.com:9200
    basic_auth:
      username: elastic
      password: XXXXXX
    discovery:
      enabled: false
 ...


  3. Start the Service
[root@test-server console]# ./console-linux-amd64 -service install
Success
[root@test-server console]# ./console-linux-amd64 -service start
Success


  4. Access the Management Interface

Open port 9000 on the host to reach the Console interface: http://x.x.x.x:9000/
Navigate to [System] > [Cluster] and register the Elasticsearch clusters and gateway addresses you need to manage.

Test INFINI Gateway

To verify the gateway is functioning properly, we can quickly test it through INFINI Console.
First, create an index and write a document through the gateway interface.
Then check the data on the 5.4.2 cluster.
Finally, check the data on the 5.6.16 cluster.
If all checks show the expected results, the gateway configuration is verified and working correctly.

Adjust Gateway Consumption Policy

Incremental data consumption must stay paused until the full data migration completes. To do this, set the auto_start parameter to false in the Pipeline configurations consume-queue_backup-to-backup and consume-queue_primary-failure-to-backup, so that these tasks do not start automatically.

After modifying the configuration, restart the gateway.
For easier management, use INFINI Console to register and manage gateways.

Once the full migration completes, use the backend Task management to start or stop subsequent tasks.

Switch Traffic

Next, switch the business write traffic to the gateway, which means changing the address previously pointing to ES 5.4.2 to point to the gateway address. If the 5.4.2 cluster has authentication enabled, the application code should also pass the authentication credentials, maintaining the same usage pattern as before with 5.4.2.

After traffic is switched to the gateway, user requests still reach the self-hosted cluster synchronously, and the gateway records each request in order in its message queue (MQ), while consumption remains paused. If the application's ES SDK supports Sniff and the application has it enabled, disable it; otherwise the application may discover the backend ES nodes and connect to them directly, bypassing the gateway. All traffic should now flow exclusively through the gateway.

Full Data Migration

After traffic migrates to the gateway, we begin migrating the self-hosted Elasticsearch cluster data to the cloud Elasticsearch cluster.

There are multiple approaches for full migration of existing data:

  • Restoration via snapshots
  • Export/import using tools such as ESM

If there are many indices, migrate them sequentially, and make sure the mappings and settings are imported beforehand.
In this example, the 5.4 cluster has one index to migrate, demo_5_4_2, which contains only 4 documents.
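
When mappings and settings are copied by hand, the settings returned by the source cluster include values that Elasticsearch assigns itself (creation date, UUID, version, provided name), and these should not be carried into the create-index request. The sketch below shows one way to build the request body from the responses of GET /<index>/_settings and GET /<index>/_mapping; the function name is hypothetical, and it only prepares the body without sending any request.

```python
import copy

# Settings assigned by Elasticsearch itself; they are not valid in a create-index body.
READ_ONLY_INDEX_SETTINGS = ("creation_date", "provided_name", "uuid", "version")


def build_create_body(settings_response, mapping_response, index_name):
    """Build a create-index body from _settings and _mapping responses."""
    index_settings = copy.deepcopy(settings_response[index_name]["settings"]["index"])
    for key in READ_ONLY_INDEX_SETTINGS:
        index_settings.pop(key, None)
    return {
        "settings": {"index": index_settings},
        "mappings": copy.deepcopy(mapping_response[index_name]["mappings"]),
    }
```

The resulting dict can be sent as the body of a PUT /<index> request on the target cluster before any documents are copied.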

We use the gateway's built-in migration feature. Copy the sample file:

[root@test-server gateway]# cp sample-configs/es_migration.yml migration-job.yml


Modify the cluster and index configuration. You can also choose whether to rename indices and whether to unify the document Type, which helps when the source and target versions handle Types differently.
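
Conceptually, renaming an index and unifying the Type both amount to rewriting the metadata line of each bulk action before it is sent to the target. The sketch below illustrates the idea on a single action; it is not the gateway's implementation, and the function name, the target index name, and the unified type value are all hypothetical.

```python
def rewrite_action(action, index_map, unified_type=None):
    """Return a copy of one bulk action line with the index renamed and the type unified."""
    (op, meta), = action.items()  # a bulk action line has exactly one operation key
    new_meta = dict(meta)
    new_meta["_index"] = index_map.get(meta["_index"], meta["_index"])
    if unified_type is not None and "_type" in new_meta:
        new_meta["_type"] = unified_type
    return {op: new_meta}


original = {"index": {"_index": "demo_5_4_2", "_type": "logs", "_id": "1"}}
print(rewrite_action(original, {"demo_5_4_2": "demo_5_6"}, unified_type="doc"))
```

Indices that are missing from the mapping keep their original name, so the same function can be applied to every action in a mixed bulk request.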

Create the templates and indices. If the target cluster does not allow indices to be created automatically on write, they must be created in advance.

Now initiate the data migration. Execute the gateway program with the defined configuration.

After completion, verify the data.

The full data import is now complete.

Incremental Data Migration

While the full import runs, the source data may continue to change. The gateway has recorded every one of these requests, so if you check the 5.6 cluster at this point, the modifications have not yet arrived there.

To apply the accumulated requests to the cloud Elasticsearch cluster, enable the gateway's incremental consumption tasks.

Monitor whether the queue consumption is complete to determine if incremental data sync is finished.
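
A monitoring loop for this step might look like the sketch below. How the queue depth is obtained is deliberately left to a caller-supplied function, since this article does not show the gateway's queue statistics interface; `get_depth` and the other parameter names are assumptions for illustration.

```python
import time


def wait_until_drained(get_depth, stable_checks=3, interval_seconds=5.0,
                       timeout_seconds=3600.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll until the queue depth reads 0 on `stable_checks` consecutive polls.

    Returns True once the queue is considered drained, False on timeout.
    """
    deadline = clock() + timeout_seconds
    consecutive_empty = 0
    while clock() < deadline:
        if get_depth() == 0:
            consecutive_empty += 1
            if consecutive_empty >= stable_checks:
                return True
        else:
            consecutive_empty = 0  # new writes arrived; start counting again
        sleep(interval_seconds)
    return False
```

Requiring several consecutive empty readings avoids declaring the sync finished during a brief gap between writes while the application is still sending traffic.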

Now let's check the 5.6 cluster data.

The incremental data updates have now arrived.

Perform Data Comparison

Because the clusters may hold a large amount of data, run a complete comparison to confirm data integrity. Use the gateway's built-in data comparison tool. Copy the sample file:

[root@test-server gateway]# cp sample-configs/index-docs-diff.yml data-validation.yml


Set the clusters and indices to compare. You can add filter conditions, such as a time range, to run an incremental diff.
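
The idea behind a document-level diff can be illustrated with a merge over two sorted streams. The sketch below assumes each cluster yields (document ID, content hash) pairs sorted by ID; it illustrates the general technique and is not a description of how the gateway's tool is implemented.

```python
def diff_sorted(source, target):
    """Compare two iterables of (doc_id, digest) pairs, each sorted by doc_id.

    Returns three lists of IDs: only in source, only in target, and present
    in both but with different digests.
    """
    only_in_source, only_in_target, different = [], [], []
    src_iter, tgt_iter = iter(source), iter(target)
    s, t = next(src_iter, None), next(tgt_iter, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            only_in_source.append(s[0])
            s = next(src_iter, None)
        elif s is None or t[0] < s[0]:
            only_in_target.append(t[0])
            t = next(tgt_iter, None)
        else:
            if s[1] != t[1]:
                different.append(s[0])
            s, t = next(src_iter, None), next(tgt_iter, None)
    return only_in_source, only_in_target, different
```

Because both inputs are consumed in a single pass, memory use stays constant regardless of index size, and restricting both streams to the same time range gives the incremental diff.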

Execute the gateway program with this configuration.

As shown in the diagram, both clusters are completely consistent.

Switch Cluster

If verification confirms both clusters' data is fully consistent, the application can switch to the new cluster. Alternatively, swap the primary and standby in the gateway configuration to write to the 5.6 cluster synchronously.

Keep both clusters online for a period. Once the business has fully validated the new cluster, the old cluster can be safely decommissioned. If issues arise in the meantime, you can switch back to the old cluster immediately.

Summary

Using INFINI Gateway, self-hosted ES clusters can be securely and seamlessly migrated to mobile cloud ES. During migration, the gateway decouples the two clusters, so they can run different versions, and the same process can also serve as a seamless version upgrade.

For more information, visit the official documentation: https://infinilabs.com/docs/latest/gateway

Tags: elasticsearch Data Migration INFINI Gateway dual-write cloud migration

Posted on Tue, 16 Jun 2026 17:58:52 +0000 by brian79