Test Environment
- Primary cluster: http://10.0.1.2:9200, username: elastic, password: ***, 9 nodes, hardware specs: 12C 64GB (31GB JVM heap)
- Secondary cluster: http://10.0.1.15:9200, username: elastic, password: ***, 9 nodes, hardware specs: 12C 64GB (31GB JVM heap)
- Gateway server 1 (Public IP: 120.92.43.31, Internal IP: 192.168.0.24), hardware specs: 40C 256GB 3.7T NVMe SSD
- Load testing server 1 (Internal IP: 10.0.0.117) hardware specs: 24C 48GB
- Load testing server 2 (Internal IP: 10.0.0.69) hardware specs: 24C 48GB
Test Overview
This test primarily evaluates the practical implementation of gateway indexing acceleration and assesses the hardware specifications required to achieve different performance levels, serving as a reference for production deployment configuration.
Scenario Description
The gateway improves the cluster's overall write throughput by regrouping requests according to their target nodes and separating fast requests from slow ones.
Data Description
Using Nginx data auto-generated by Loadgen as an example, we compare the speed difference between direct Elasticsearch writes and gateway-accelerated Elasticsearch writes. The data sample format is as follows:
{
"_index": "test-10",
"_type": "_doc",
"_id": "cak5emoke01flcq9q760",
"_source": {
"batch_number": "2328917",
"id": "cak5emoke01flcq9r19g",
"ip": "192.168.0.1",
"message": "175.10.75.216 - webmaster [29/Jul/2020:17:01:26 +0800] \"GET /rest/system/status HTTP/1.1\" 200 1838 \"http://dl-console.elasticsearch.cn/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36\"",
"now_local": "2022-06-14 17:39:39.420724895 +0800 CST",
"now_unix": "1655199579",
"random_no": "13",
"routing_no": "cak5emoke01flcq9pvu0"
}
}
Data Architecture
The gateway can locally compute, for each indexed document, the target storage location in the backend Elasticsearch cluster, enabling precise request routing. A single bulk request may contain data destined for multiple backend nodes. The bulk_reshuffle filter breaks a normal bulk request apart and reassembles it by target node or shard, so that Elasticsearch nodes no longer need to redistribute requests among themselves after receiving them. This reduces traffic and load between the Elasticsearch nodes, avoids single-node bottlenecks, keeps processing balanced across data nodes, and raises the cluster's overall indexing throughput.
We test scenarios with both 3 shards and 30 shards.
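As an illustrative sketch of how the bulk_reshuffle filter described above can be wired into a gateway flow (the cluster name and `level` value here are assumptions for illustration; consult the gateway documentation and the shipped sample configs for the authoritative schema):

```yaml
flow:
  - name: async_bulk_flow
    filter:
      - bulk_reshuffle:
          # Cluster name as registered in gateway.yml (placeholder)
          elasticsearch: prod
          # Regroup requests by target node; shard-level regrouping
          # is another option
          level: node
```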
Test Preparation
Deploying the Gateway Program
- System Tuning
Refer to the documentation at: https://gateway.infinilabs.com/zh/docs/getting-started/optimization/
- Download the Program
[root@iZbp1gxkifg8uetb33pvcoZ ~]# mkdir /opt/gateway
[root@iZbp1gxkifg8uetb33pvcoZ ~]# cd /opt/gateway/
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# tar vxzf gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz
gateway-linux-amd64
gateway.yml
sample-configs/
sample-configs/elasticsearch-with-ldap.yml
sample-configs/indices-replace.yml
sample-configs/record_and_play.yml
sample-configs/cross-cluster-search.yml
sample-configs/kibana-proxy.yml
sample-configs/elasticsearch-proxy.yml
sample-configs/v8-bulk-indexing-compatibility.yml
sample-configs/use_old_style_search_response.yml
sample-configs/context-update.yml
sample-configs/elasticsearch-route-by-index.yml
sample-configs/hello_world.yml
sample-configs/entry-with-tls.yml
sample-configs/javascript.yml
sample-configs/log4j-request-filter.yml
sample-configs/request-filter.yml
sample-configs/condition.yml
sample-configs/cross-cluster-replication.yml
sample-configs/secured-elasticsearch-proxy.yml
sample-configs/fast-bulk-indexing.yml
sample-configs/es_migration.yml
sample-configs/index-docs-diff.yml
sample-configs/rate-limiter.yml
sample-configs/async-bulk-indexing.yml
sample-configs/elasticssearch-request-logging.yml
sample-configs/router_rules.yml
sample-configs/auth.yml
sample-configs/index-backup.yml
- Modify Configuration
Copy the sample configuration provided by the gateway and modify it according to actual cluster information:
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# cp sample-configs/async-bulk-indexing.yml gateway.yml
Modify the cluster registration information as needed. Also adjust the gateway listening port and TLS settings based on your requirements (if clients access ES via http:// protocol, set entry.tls.enabled to false). Different clusters can use different configurations, listening on different ports for separate business access.
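The relevant parts of gateway.yml look roughly like the fragment below (an illustrative sketch with placeholder names and hosts; align the exact keys with the shipped sample config):

```yaml
# Illustrative gateway.yml fragment -- names and hosts are placeholders.
elasticsearch:
  - name: prod
    enabled: true
    endpoint: http://10.0.1.2:9200
    basic_auth:
      username: elastic
      password: xxxx
entry:
  - name: my_es_entry
    enabled: true
    network:
      binding: 0.0.0.0:8000
    tls:
      enabled: false   # set to false when clients access ES via http://
```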
- Start the Gateway
Start the gateway with the configuration you just created:
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# ./gateway-linux-amd64 -config gateway.yml
___ _ _____ __ __ __ _
/ _ \ /_\ /__ \/__\/ / /\ \ \ \/\_\ /\_/
/ /_\///_\\ / /\/\_ \ \/ \/ /_\\\_ _/
/ /_\/\ _ \/ / //__ \ /\ / _ \/ \\
\____/\_/ \_/\/__ \__/ \/ \/\_/ \_/\_/
[GATEWAY] A light-weight, powerful and high-performance elasticsearch gateway.
[GATEWAY] 1.6.0_SNAPSHOT, 2022-05-18 11:09:54, 2023-12-31 10:10:10, 73408e82a0f96352075f4c7d2974fd274eeafe11
[05-19 13:35:43] [INF] [app.go:174] initializing gateway.
[05-19 13:35:43] [INF] [app.go:175] using config: /opt/gateway/gateway.yml.
[05-19 13:35:43] [INF] [instance.go:72] workspace: /opt/gateway/data1/gateway/nodes/ca2tc22j7ad0gneois80
[05-19 13:35:43] [INF] [app.go:283] gateway is up and running now.
[05-19 13:35:50] [INF] [actions.go:358] elasticsearch [primary] is available
[05-19 13:35:50] [INF] [api.go:262] api listen at: http://0.0.0.0:2900
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [xxxxxxxx-backup:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [xxxxxxxx-primary:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [entry.go:322] entry [my_es_entry/] listen at: https://0.0.0.0:8000
[05-19 13:35:50] [INF] [module.go:116] all modules are started
- Install as Service
Quickly install the gateway as a system service:
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./gateway-linux-amd64 -service install
Success
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./gateway-linux-amd64 -service start
Success
Deploying the Management Console
To facilitate quick switching between multiple clusters, we use Console for management.
- Download and Install
Simply extract the provided installation package to complete installation:
[root@iZbp1gxkifg8uetb33pvcpZ console]# tar vxzf console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz
console-linux-amd64
console.yml
- Modify Configuration
Use http://10.0.1.2:9200 as the Console system cluster to retain monitoring metrics and metadata information. Modify the configuration as follows:
[root@iZbp1gxkifg8uetb33pvcpZ console]# cat console.yml
elasticsearch:
  - name: default
    enabled: true
    monitored: false
    endpoint: http://10.0.1.2:9200
    basic_auth:
      username: elastic
      password: xxxxx
    discovery:
      enabled: false
...
- Start Service
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./console-linux-amd64 -service install
Success
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./console-linux-amd64 -service start
Success
- Access Console
Access port 9000 on this host to open the Console backend, http://10.0.128.58:9000/#/cluster/overview. Open the [System][Cluster] menu to register the Elasticsearch clusters and gateway addresses to be managed.
- Register Gateway
Open the GATEWAY registration function and set it to the gateway's API address for management.
Testing the Gateway
To confirm that the gateway is working properly, we run a quick check through Console. First, create an index through the gateway interface and write a document into it.
Then check the data status of the primary cluster, followed by the secondary cluster. Both clusters return the same data, which indicates the gateway configuration is working correctly; verification is complete.
Installing Loadgen
The test machine also needs tuning. Refer to the gateway optimization instructions.
- On the test machine, download and install Loadgen:
[root@vm10-0-0-69 opt]# tar vxzf loadgen-1.4.0_SNAPSHOT-50-linux-amd64.tar.gz
- Download an Nginx log sample and save it as `nginx.log`:
[root@vm10-0-0-69 opt]# head nginx.log
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET / HTTP/1.1" 200 8676 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/bootstrap/css/bootstrap.css HTTP/1.1" 200 17235 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/daterangepicker/daterangepicker.css HTTP/1.1" 200 1700 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fork-awesome/css/v5-compat.css HTTP/1.1" 200 2091 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/font/raleway.css HTTP/1.1" 200 145 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fork-awesome/css/fork-awesome.css HTTP/1.1" 200 8401 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/css/overrides.css HTTP/1.1" 200 2524 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/css/theme.css HTTP/1.1" 200 306 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fancytree/css/ui.fancytree.css HTTP/1.1" 200 3456 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /syncthing/development/logbar.js HTTP/1.1" 200 486 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
- Modify the Loadgen configuration file
Modify the variables so that `message` points to the nginx.log prepared above, and update the ES address and authentication information. Loadgen randomizes the variable values when generating write requests. The full configuration is as follows:
[root@vm10-0-0-117 opt]# cat loadgen.yml
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: xxxx
      url: http://10.0.128.58:8000/_bulk
      body_repeat_times: 5000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"batch_number\" : \"$[[batch_no]]\", \"message\" : \"$[[message]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
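The `body` template above expands into the standard NDJSON bulk format: one action line plus one source line per document, each newline-terminated. A minimal stand-in illustrating the shape (the values here are placeholders, not Loadgen output):

```shell
# Print a one-document bulk payload in the same shape Loadgen generates;
# all values are stand-ins.
printf '{ "create" : { "_index" : "test-10", "_id" : "doc-1" } }\n{ "id" : "doc-1", "message" : "hello" }\n'
```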
- Start Loadgen for testing
Specify the run duration with `-d` and the number of concurrent connections with `-c`, and enable request compression:
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -d 60000 -c 200 --compress
__ ___ _ ___ ___ __ __
/ / /___\/\_\ / \/ _ \/__\/\ \ \
/ / // ///_\\ / /\ / /_\/_\ / \/ /
/ /__/ \_// _ \/ /_// /_\\//__/ /\ /
\____|___/\_/ \_/___,'\____/\__/\_\ \/
[LOADGEN] A http load generator and testing suit.
[LOADGEN] 1.4.0_SNAPSHOT, 2022-06-01 09:58:17, 2023-12-31 10:10:10, b6a73e2434ac931d1d43bce78c0f7622a1d08b2e
[06-14 18:47:29] [INF] [app.go:174] initializing loadgen.
[06-14 18:47:29] [INF] [app.go:175] using config: /opt/loadgen.yml.
[06-14 18:47:29] [INF] [module.go:116] all modules are started
[06-14 18:47:30] [INF] [instance.go:72] workspace: /opt/data/loadgen/nodes/cajfdg0ke012ka748j30
[06-14 18:47:30] [INF] [app.go:283] loadgen is up and running now.
[06-14 18:47:30] [INF] [loader.go:320] warmup started
[06-14 18:47:30] [INF] [loader.go:329] [POST] http://10.0.128.58:8000/_bulk -{"took":115,"errors":false,"items":[{"create":{"_index":"test-11","_type":"_doc","_id":"cak6eggke0184a2dcc70","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":39707421,"_primary_term":1,"status":201}},{"create":{"_i
[06-14 18:47:30] [INF] [loader.go:330] status: 200,{"took":115,"errors":false,"items":[{"create":{"_index":"test-11","_type":"_doc","_id":"cak6eggke0184a2dcc70","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":39707421,"_primary_term":1,"status":201}},{"create":{"_i
[06-14 18:47:30] [INF] [loader.go:338] warmup finished
Perform the same installation operations on another load testing machine, which won't be repeated here.
Testing Methodology
Preparing Template
Create a default index template to optimize write performance:
PUT _template/test
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "mapping": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ]
  }
}
Starting Load Test
Execute the load testing tool on the load testing machines respectively:
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -d 60000 -c 200 --compress
Observing Throughput
Open the Console tool to observe the cluster's throughput. Open the monitoring menu and click the dropdown at the top to quickly switch between different clusters and view the primary cluster's throughput.
Limiting CPU
To test gateway performance under different CPU resources, we use taskset to bind the gateway process to a fixed set of CPU cores.
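For example, a 4-core scenario could launch the gateway under taskset as shown in the comment below; since the gateway binary is not available here, the runnable line pins a trivial command instead, purely to demonstrate the syntax:

```shell
# taskset (util-linux) restricts a command to the given CPU cores.
# For the gateway tests this would be, for example:
#   taskset -c 0-3 ./gateway-linux-amd64 -config gateway.yml
# Demonstrate the syntax by pinning a trivial command to core 0:
taskset -c 0 echo "pinned to core 0"
```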
Testing Process
Gateway Configuration
Direct ES Writing
Loadgen Configuration
[root@vm10-0-0-69 opt]# cat loadgen2.yml
statsd:
  enabled: false
  host: 192.168.3.98
  port: 8125
  namespace: loadgen.
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: ####
      #url: http://localhost:8000/_search?q=$[[id]]
      url: http://10.0.1.2:9200/_bulk
      body_repeat_times: 10000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"message\" : \"$[[message]]\",\"batch_number\" : \"$[[batch_no]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
Second Loadgen Configuration:
[root@vm10-0-0-117 opt]# cat loadgen2.yml
statsd:
  enabled: false
  host: 192.168.3.98
  port: 8125
  namespace: loadgen.
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: ####
      url: http://10.0.1.2:9200/_bulk
      body_repeat_times: 5000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"batch_number\" : \"$[[batch_no]]\", \"message\" : \"$[[message]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
Start load testing respectively:
[root@vm10-0-0-69 opt]# ./loadgen-linux-amd64 -c 100 -d 66000 -config loadgen2.yml
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -c 100 -d 66000 -config loadgen2.yml
Direct ES writing throughput stabilizes at ~600k eps, with 3 shards per index.
Gateway 1C
Testing with gateway mode, first with default 3 shards:
Gateway 2C
Gateway 4C
Gateway 6C
Gateway 8C
Set Loadgen concurrency to 200:
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -c 200 -d 66000 -config loadgen1.yml
No performance improvement was observed, and the gateway CPU was still not fully utilized.
Direct ES Writing - 30 Shards
Delete all test indices and change the template default to 30 shards:
DELETE test-10
DELETE test-11
DELETE test-12
DELETE test-13
DELETE test-14
DELETE test-15
PUT _template/test
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 30,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "mapping": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ]
  }
}
Continue load testing:
30 shards, direct ES stabilizes at ~750k eps.
Gateway 1C - 30 Shards
Gateway 2C - 30 Shards
Gateway 4C - 30 Shards
Gateway 6C - 30 Shards
Gateway 8C - 30 Shards
Since traffic and write volume are fairly large at this point, enable request compression:
Then enable on-disk message compression:
Note: enabling traffic or disk compression incurs additional overhead, and throughput decreases to some extent.
Gateway 12C - 30 Shards
With compression removed and the gateway expanded to 12 cores, throughput was unchanged; the limit had been reached.
Shard Level
Test Results
3 shards * 4 indices, direct ES writing 600k eps.
| Gateway CPU Cores | Throughput Capacity (events per second) | Notes |
|---|---|---|
| Gateway 1C | ~180k | |
| Gateway 2C | ~350k | |
| Gateway 4C | ~650k | |
| Gateway 6C | ~770k | |
| Gateway 8C | ~930k | Backend ES processing capacity nearly saturated |
30 shards * 4 indices, direct ES writing 750k eps.
| Gateway CPU Cores | Throughput Capacity (events per second) | Notes |
|---|---|---|
| Gateway 1C | ~200k | |
| Gateway 2C | ~400k | |
| Gateway 4C | ~760k | |
| Gateway 6C | ~1000k | Backend ES processing capacity nearly saturated |
| Gateway 8C | ~930k | Backend ES processing capacity nearly saturated |