Elasticsearch Practical Techniques: Indexing, Querying, and Operations

A compilation of Elasticsearch usage tips distilled from a knowledge base, covering index management, mapping, query operations, filtering, aggregation, and search templates.

Index Management

Elasticsearch accepts requests as JSON bodies sent to REST endpoints; keywords such as query, sort, and aggs select the operation to run.

Creating an Index

This example creates an index with 5 primary shards and 1 replica of each. The mapping declares the type of every field, including a vector field that uses the proxima_vector type for approximate nearest neighbor search.

PUT sample_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.codec": "proxima",
    "index.vector.algorithm": "hnsw"
  },
  "mappings": {
    "properties": {
      "record_id": { "type": "text" },
      "created_at": { "type": "text" },
      "updated_at": { "type": "text" },
      "label": { "type": "text" },
      "query_id": { "type": "text" },
      "category": { "type": "text" },
      "unit_id": { "type": "text" },
      "deployment": { "type": "text" },
      "service": { "type": "text" },
      "env": { "type": "text" },
      "state": { "type": "text" },
      "author": { "type": "text" },
      "editor": { "type": "text" },
      "doc_id": { "type": "text" },
      "folder_id": { "type": "text" },
      "metadata": { "type": "text" },
      "language": { "type": "text" },
      "base_language": { "type": "text" },
      "mime_type": { "type": "text" },
      "body": { "type": "text" },
      "section_kind": { "type": "text" },
      "channel": { "type": "text" },
      "faq_list": { "type": "text" },
      "suggestion": { "type": "text" },
      "embedding": {
        "type": "proxima_vector",
        "dim": 128,
        "vector_type": "float",
        "distance_method": "SquaredEuclidean"
      }
    }
  }
}

Removing an Index

DELETE sample_index

Basic Search

GET sample_index/_search
{
  "query": {
    "match": {
      "service": "messaging"
    }
  }
}

Understanding Mappings

Text Fields

Text-type fields undergo analysis (tokenization) to build inverted indexes, enabling fast full-text search. By default they cannot be used for sorting or aggregations, because analyzed text has no doc values and fielddata is disabled unless explicitly enabled.

Keyword Fields

A keyword field is not analyzed but indexed as a whole. It's ideal for exact matching, sorting, and aggregations.
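
To make the difference concrete, here is a minimal Python sketch of what ends up in the inverted index for each field type. The analyze_text function is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and both function names are hypothetical; real analyzers do more.

```python
import re
from typing import List


def analyze_text(value: str) -> List[str]:
    """Rough stand-in for a text field: lowercase, then split into tokens."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def index_keyword(value: str) -> List[str]:
    """A keyword field indexes the whole value as one unchanged term."""
    return [value]


service = "Messaging Gateway"
print(analyze_text(service))   # ['messaging', 'gateway']
print(index_keyword(service))  # ['Messaging Gateway']
```

A match query for "messaging" finds the text version, while sorting or an exact term lookup needs the single term held by the keyword version.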

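For fields that require both full-text search and sorting, a multi-field mapping can be used. Queries search the text field, while sorts and aggregations target the keyword sub-field (here created_at.sortable and updated_at.sortable):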

PUT sample_index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.codec": "proxima",
    "index.vector.algorithm": "hnsw"
  },
  "mappings": {
    "properties": {
      "record_id": { "type": "text" },
      "created_at": {
        "type": "text",
        "fields": {
          "sortable": { "type": "keyword" }
        },
        "fielddata": true
      },
      "updated_at": {
        "type": "text",
        "fields": {
          "sortable": { "type": "keyword" }
        },
        "fielddata": true
      },
      "label": { "type": "text" },
      "query_id": { "type": "text" },
      "category": { "type": "text" },
      "unit_id": { "type": "text" },
      "deployment": { "type": "text" },
      "service": { "type": "text" },
      "env": { "type": "text" },
      "state": { "type": "text" },
      "author": { "type": "text" },
      "editor": { "type": "text" },
      "doc_id": { "type": "text" },
      "folder_id": { "type": "text" },
      "metadata": { "type": "text" },
      "language": { "type": "text" },
      "base_language": { "type": "text" },
      "mime_type": { "type": "text" },
      "body": { "type": "text" },
      "section_kind": { "type": "text" },
      "channel": { "type": "text" },
      "faq_list": { "type": "text" },
      "suggestion": { "type": "text" },
      "embedding": {
        "type": "proxima_vector",
        "dim": 128,
        "vector_type": "float",
        "distance_method": "SquaredEuclidean"
      }
    }
  }
}

Operational Tips

Data Synchronization

  • Use DataX for batch or streaming synchronization when the source schema maps directly to the index mapping.
  • For transformation logic: Source → DataHub → Flink → Elasticsearch.

Changing Mappings

After an index is created, field types are immutable.

Option 1: Add a New Field

Introduce a new field with the corrected type and deprecate the old one. Migrating data between the two requires application-side coordination.

Option 2: Alias Swap (Reindex)

An alias provides a logical name that applications use instead of a concrete index name. The steps are:

  1. Create a new index index_v2 with the correct mapping.
  2. Reindex data from the original index index_v1.
  3. Atomically switch the alias to point to the new index.

POST _reindex
{
  "source": {
    "index": "index_v1"
  },
  "dest": {
    "index": "index_v2"
  }
}

POST _aliases
{
  "actions": [
    { "remove": { "index": "index_v1", "alias": "active_index" } },
    { "add": { "index": "index_v2", "alias": "active_index" } }
  ]
}

Important: During the switch with live ingestion, there's a risk of duplicated or missing documents. Perform the operation during low traffic and have mechanisms to replay offsets.
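
When the swap is scripted, the request body can be generated rather than hand-edited. The following is a minimal sketch; alias_swap_actions is a hypothetical helper that only builds the JSON body for POST _aliases and sends nothing to the cluster.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build a POST _aliases body that moves `alias` from old_index to new_index.

    Both actions travel in one request, which is what makes the switch atomic.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_actions("active_index", "index_v1", "index_v2")
print(json.dumps(body, indent=2))
```

Keeping the remove and add actions in a single body avoids a window in which the alias points at no index.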

Conditional Deletion

POST sample_index/_delete_by_query
{
  "query": {
    "match": {
      "service": "messaging"
    }
  }
}

Core Query Operations

All examples assume an index named users.

Match All Documents

GET users/_search
{
  "query": { "match_all": {} }
}

Limit the Result Count

GET users/_search
{
  "query": { "match_all": {} },
  "size": 1
}

Pagination with From/Size

GET users/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "years_of_experience": { "order": "desc" } }
  ],
  "size": 2,
  "from": 1
}

By default, from + size cannot exceed 10,000 (the index.max_result_window setting); use scroll or search_after for deep pagination.
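
The arithmetic behind from/size paging can be sketched in Python. The page_params helper and its 1-based page numbering are assumptions made for illustration; the 10,000 limit is the default mentioned above.

```python
MAX_RESULT_WINDOW = 10_000  # default limit on from + size


def page_params(page, page_size, max_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > max_window:
        # Beyond this point the request would be rejected by default.
        raise ValueError("from + size exceeds the result window; use search_after")
    return {"from": start, "size": page_size}


print(page_params(3, 20))  # {'from': 40, 'size': 20}
```

With a page size of 20, page 500 is the last one this check allows, since it ends exactly at offset 10,000.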

Select Specific Fields

GET users/_search
{
  "query": { "match_all": {} },
  "_source": ["full_name", "years_of_experience"]
}

Term-Level Query

GET users/_search
{
  "query": {
    "term": {
      "city.keyword": "Shanghai"
    }
  }
}

Range Query

GET users/_search
{
  "query": {
    "range": {
      "years_of_experience": {
        "gte": 5,
        "lte": 20
      }
    }
  }
}

Prefix Search

GET users/_search
{
  "query": {
    "prefix": {
      "biography": "data"
    }
  }
}

Wildcard Patterns

GET users/_search
{
  "query": {
    "wildcard": {
      "biography": "data*"
    }
  }
}

Avoid leading wildcards for performance; consider using an n-gram tokenizer approach.
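
The n-gram idea can be illustrated with a short Python sketch: indexing every substring of a bounded length turns a substring search into an ordinary term lookup. The ngrams function is hypothetical, and its output order is not meant to match what an Elasticsearch tokenizer emits; min_gram and max_gram simply mirror the names of the tokenizer settings.

```python
def ngrams(term, min_gram=2, max_gram=3):
    """Return all substrings of term with length between min_gram and max_gram."""
    term = term.lower()
    return [
        term[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(term) - n + 1)
    ]


grams = ngrams("data")
print(grams)           # ['da', 'at', 'ta', 'dat', 'ata']
print("ata" in grams)  # True: no leading wildcard needed to find "ata"
```

The trade-off is index size: every indexed term expands into many grams, so the gram lengths should be kept as narrow as the use case allows.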

Fetch by Multiple IDs

GET users/_search
{
  "query": {
    "ids": {
      "values": ["abc123", "def456"]
    }
  }
}

Fuzzy Search

GET users/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "programer"
      }
    }
  }
}

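With the default fuzziness of AUTO, the allowed edit distance depends on term length: 0 for terms of 1 to 2 characters, 1 for terms of 3 to 5 characters, and 2 for longer terms.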
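
As a rough illustration of why "programer" can match "programmer", the sketch below pairs the default AUTO thresholds with a plain Levenshtein distance. Both function names are hypothetical, and plain Levenshtein is a simplification: by default Elasticsearch also counts a transposition of two adjacent characters as a single edit.

```python
def auto_fuzziness(term):
    """Maximum edits allowed by the default AUTO setting for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


query_term = "programer"
print(auto_fuzziness(query_term))                # 2
print(edit_distance(query_term, "programmer"))   # 1, within the allowed 2
```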

Boolean Compound Query

GET users/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "years_of_experience": { "gte": 10 } } }
      ],
      "must_not": [
        { "wildcard": { "biography": "arch?" } }
      ]
    }
  },
  "sort": [
    { "years_of_experience": { "order": "desc" } }
  ]
}

Result Highlighting

GET users/_search
{
  "query": { "term": { "biography": "redis" } },
  "highlight": {
    "fields": { "*": {} }
  }
}

Customize tags:

GET users/_search
{
  "query": { "term": { "biography": "spring" } },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": { "*": {} }
  }
}

Multi-Field Match

GET users/_search
{
  "query": {
    "multi_match": {
      "query": "innovation",
      "fields": ["full_name", "biography"]
    }
  }
}

Query String Syntax

GET users/_search
{
  "query": {
    "query_string": {
      "query": "global innovation",
      "analyzer": "ik_max_word",
      "fields": ["full_name", "biography"]
    }
  }
}

Exact Phrase Match

GET users/_search
{
  "query": {
    "match_phrase": {
      "biography": "machine learning"
    }
  }
}

Filtering for Performance

Filters do not calculate relevance scores and are cacheable. Combine filters with queries inside a bool context.

GET users/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ],
      "filter": [
        { "range": { "years_of_experience": { "gte": 10 } } }
      ]
    }
  }
}

Filter types include term, terms, range, exists, and ids, mirroring query syntax.
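
Application code that assembles such requests often separates scoring clauses from filter clauses. A minimal sketch, assuming a hypothetical filtered_query helper that only builds the request body:

```python
import json


def filtered_query(filters, must=None):
    """Build a bool query: `must` clauses are scored, `filter` clauses are not."""
    return {
        "query": {
            "bool": {
                "must": must or [{"match_all": {}}],
                "filter": list(filters),
            }
        }
    }


body = filtered_query(
    filters=[
        {"range": {"years_of_experience": {"gte": 10}}},
        {"term": {"city.keyword": "Shanghai"}},
    ]
)
print(json.dumps(body, indent=2))
```

Conditions that only include or exclude documents (ranges, exact terms, existence checks) belong in the filter list; anything that should influence ranking goes in must.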

Sorting

PUT events
{
  "mappings": {
    "properties": {
      "event_time": { "type": "date" },
      "type": { "type": "keyword" },
      "class": { "type": "keyword" },
      "severity": { "type": "integer" }
    }
  }
}

GET events/_search
{
  "sort": [
    { "event_time": { "order": "asc" } },
    "type",
    { "severity": { "order": "desc" } },
    "_score"
  ],
  "query": { "term": { "type": "error" } }
}

Nested field sorting requires nested context and a mode like avg or max.

Aggregation Frames

Store product data in an index named inventory:

PUT inventory
{
  "mappings": {
    "properties": {
      "cost": { "type": "long" },
      "color": { "type": "keyword" },
      "manufacturer": { "type": "keyword" },
      "purchase_date": { "type": "date" }
    }
  }
}

Terms Aggregation

GET inventory/_search
{
  "aggs": {
    "by_manufacturer": {
      "terms": {
        "field": "manufacturer",
        "size": 3,
        "order": { "_count": "asc" },
        "min_doc_count": 2,
        "include": ["Nike", "Adidas"]
      }
    }
  }
}

Filter Aggregation

GET inventory/_search
{
  "aggs": {
    "nike_products": {
      "filter": { "term": { "manufacturer": "Nike" } },
      "aggs": {
        "avg_price": { "avg": { "field": "cost" } }
      }
    }
  }
}

Filters Aggregation

GET inventory/_search
{
  "aggs": {
    "selective": {
      "filters": {
        "filters": {
          "red_items": { "match": { "color": "red" } },
          "nike_brand": { "match": { "manufacturer": "Nike" } }
        }
      }
    }
  }
}

Histogram

GET inventory/_search
{
  "aggs": {
    "price_distribution": {
      "histogram": {
        "field": "cost",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}
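
How a value lands in a histogram bucket can be sketched in Python: the key is the value rounded down to a multiple of the interval. The sample costs below are made up for illustration, and bucket_key is a hypothetical name.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0):
    """Round value down to the start of its bucket."""
    return math.floor((value - offset) / interval) * interval + offset


costs = [1200, 4800, 12000, 12500, 21000]  # made-up sample data
buckets = Counter(bucket_key(c, 5000) for c in costs)
print(sorted(buckets.items()))  # [(0, 2), (10000, 2), (20000, 1)]
```

The buckets starting at 5000 and 15000 hold no documents, so with min_doc_count set to 1 they are left out of the response, just as they are absent from the counter here.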

Range

GET inventory/_search
{
  "aggs": {
    "clusters": {
      "range": {
        "field": "cost",
        "ranges": [
          { "key": "budget", "to": 50 },
          { "key": "mid", "from": 50, "to": 150 },
          { "key": "premium", "from": 150 }
        ]
      }
    }
  }
}
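
In a range aggregation each bucket includes its from value and excludes its to value, which matters for values that sit exactly on a boundary. A minimal Python sketch of that rule, with matching_buckets as a hypothetical helper:

```python
RANGES = [
    {"key": "budget", "to": 50},
    {"key": "mid", "from": 50, "to": 150},
    {"key": "premium", "from": 150},
]


def matching_buckets(value, ranges=RANGES):
    """Return the keys of every range whose [from, to) interval contains value."""
    return [
        r["key"]
        for r in ranges
        if r.get("from", float("-inf")) <= value < r.get("to", float("inf"))
    ]


print(matching_buckets(49.99))  # ['budget']
print(matching_buckets(50))     # ['mid']
print(matching_buckets(150))    # ['premium']
```

Because the intervals are half-open, adjacent ranges that share a boundary never count the same document twice.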

Date Histogram

GET inventory/_search
{
  "aggs": {
    "monthly_activity": {
      "date_histogram": {
        "field": "purchase_date",
        "interval": "month",
        "format": "yyyy-MM"
      }
    }
  }
}

Date Range

GET inventory/_search
{
  "aggs": {
    "date_bucket": {
      "date_range": {
        "field": "purchase_date",
        "format": "yyyy",
        "ranges": [
          { "from": "2023", "to": "2025" }
        ]
      }
    }
  }
}

Search Templates

Search templates allow parameterized query definitions, promoting reusability and reducing redundancy across teams.

Inline Parameterization

GET blog/_search/template
{
  "inline": {
    "query": {
      "match": {
        "{{column}}": "{{term}}"
      }
    }
  },
  "params": {
    "column": "text",
    "term": "elastic"
  }
}

toJson Helper

GET blog/_search/template
{
  "inline": "{\"query\": {\"match\": {{#toJson}}conditions{{/toJson}}}}",
  "params": {
    "conditions": {
      "text": "tutorial"
    }
  }
}

join Helper

GET blog/_search/template
{
  "inline": {
    "query": {
      "match": {
        "title": "{{#join delimiter=' '}}keywords{{/join delimiter=' '}}"
      }
    }
  },
  "params": {
    "keywords": ["guide", "advanced"]
  }
}

Default Values

GET blog/_search/template
{
  "inline": {
    "query": {
      "range": {
        "visits": {
          "gte": "{{min}}",
          "lte": "{{max}}{{^max}}50{{/max}}"
        }
      }
    }
  },
  "params": {
    "min": 10
  }
}
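
The default-value trick relies on a Mustache inverted section: {{^max}}50{{/max}} renders only when max is absent. The Python sketch below imitates just that behavior with regular expressions. It is a simplified stand-in, not a Mustache implementation; render is a hypothetical function, and it treats only a missing or None parameter as absent.

```python
import re


def render(template, params):
    """Render {{name}} variables and {{^name}}...{{/name}} inverted sections."""

    def inverted(match):
        name, body = match.group(1), match.group(2)
        # Keep the section body only when the parameter is absent.
        return body if params.get(name) is None else ""

    out = re.sub(r"\{\{\^(\w+)\}\}(.*?)\{\{/\1\}\}", inverted, template, flags=re.S)
    # Substitute plain variables; absent ones render as an empty string.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: "" if params.get(m.group(1)) is None else str(params[m.group(1)]),
        out,
    )


template = "{{max}}{{^max}}50{{/max}}"
print(render(template, {"min": 10}))             # 50
print(render(template, {"min": 10, "max": 80}))  # 80
```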

Conditional Blocks (File-Based)

Create a script file scripts/conditional.mustache:

{
  "query": {
    "bool": {
      "must": {
        "match": { "line": "{{text}}" }
      },
      "filter": {
        {{#line_no}}
        "range": {
          "line_no": {
            {{#start}}"gte": "{{start}}"{{#end}},{{/end}}{{/start}}
            {{#end}}"lte": "{{end}}"{{/end}}
          }
        }
        {{/line_no}}
      }
    }
  }
}

Call the template with a file reference:

GET my_index/_search/template
{
  "file": "conditional",
  "params": {
    "text": "search",
    "line_no": true,
    "start": 1,
    "end": 10
  }
}

Search templates simplify complex, reusable queries across teams, reducing duplication and maintenance overhead.

Tags: elasticsearch search indexing query DSL aggregations

Posted on Sat, 16 May 2026 20:45:14 +0000 by zebrax