Elasticsearch provides a JSON-based query DSL (domain-specific language) for executing searches. Understanding it is as essential to working with Elasticsearch as knowing SQL is to working with relational databases.
Query DSL Structure
The query DSL consists of two main types of clauses:
- Leaf Query Clauses: These search for specific values in particular fields, such as match, term, or range queries. They can be used independently.
- Compound Query Clauses: These wrap other leaf or compound queries to combine multiple queries logically (using bool or dis_max) or modify their behavior (using constant_score).
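Query clauses behave differently depending on the context in which they are used. In a query context, a clause answers "how well does this document match?" and contributes to the relevance score (_score). In a filter context, it answers only "does this document match?"; no score is computed, and frequently used filters can be cached.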
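The two contexts can sit side by side in one request. The sketch below builds such a request body as a Python dict so it can be inspected without a running cluster. The helper name search_in_stock is hypothetical; the field names are taken from the products examples in this article. The match clause under must runs in query context, while the clauses under filter only include or exclude documents.

```python
import json

def search_in_stock(text):
    """Build a bool query: `must` is scored, `filter` is a yes/no check."""
    return {
        "query": {
            "bool": {
                # Query context: how well does the document match?
                "must": [{"match": {"name": text}}],
                # Filter context: does the document match at all?
                "filter": [
                    {"term": {"active": True}},
                    {"range": {"stock": {"gt": 0}}},
                ],
            }
        }
    }

body = search_in_stock("wireless mouse")
print(json.dumps(body, indent=2))  # request body for GET products/_search
```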
Basic Operations
Creating Documents
Elasticsearch lets you index a document with or without an explicit ID:
POST products/_doc/1
{
"product_id": "P-9876",
"sku": "ABC12345",
"name": "wireless mouse",
"stock": 50,
"created_at": "2024-01-15 10:30:00"
}
When no ID is specified (POST products/_doc), Elasticsearch generates one automatically. You can inspect an index's settings and mappings with GET products, GET products/_settings, and GET products/_mapping.
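The two cases differ only in the request path. As a rough sketch, the hypothetical helper index_request below returns the method, path, and body for either case; it does not talk to a cluster.

```python
def index_request(index, document, doc_id=None):
    """Return (method, path, body) for indexing one document."""
    if doc_id is None:
        # No ID in the path: Elasticsearch assigns one and returns it as `_id`.
        return ("POST", f"{index}/_doc", document)
    # Explicit ID: creates the document, or replaces an existing one.
    return ("POST", f"{index}/_doc/{doc_id}", document)

doc = {"sku": "ABC12345", "name": "wireless mouse", "stock": 50}
auto = index_request("products", doc)
explicit = index_request("products", doc, doc_id=1)
print(auto[0], auto[1])
print(explicit[0], explicit[1])
```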
Creating Indexes
While Elasticsearch can create an index automatically the first time a document is indexed into it, explicit index creation provides better control over configuration and mapping:
PUT products
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2,
"refresh_interval": "2s"
},
"mappings": {
"properties": {
"product_id": { "type": "keyword" },
"sku": { "type": "keyword" },
"name": { "type": "text" },
"stock": { "type": "integer" },
"price": { "type": "double" },
"active": { "type": "boolean" },
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
Important parameters:
- number_of_shards: Cannot be changed after index creation
- refresh_interval: Adjustable for performance tuning
- number_of_replicas: Should be at least 1 for production
Additional field attributes:
- store: Controls whether the field is stored separately
- doc_values: Enables aggregation and sorting on the field
- index: Determines if the field is searchable
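These attributes are set inside a field's mapping definition. The sketch below builds a properties fragment in Python to show where they go. The field internal_note is hypothetical and is not part of the products mapping shown earlier.

```python
import json

properties = {
    # Searchable, sortable and aggregatable: the defaults for `keyword`.
    "sku": {"type": "keyword"},
    # Kept in _source only: not searchable, sortable or aggregatable
    # (hypothetical field, for illustration).
    "internal_note": {"type": "keyword", "index": False, "doc_values": False},
    # Also stored on its own, so it can be fetched without loading _source.
    "name": {"type": "text", "store": True},
}
print(json.dumps({"properties": properties}, indent=2))
```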
Updating Documents
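Documents can be updated by ID or through query-based updates. Indexing to an existing ID replaces the whole document, so the request carries every field, not just the changed one: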
POST products/_doc/1
{
"product_id": "P-9876",
"sku": "ABC12345",
"name": "wireless mouse",
"stock": 75,
"created_at": "2024-01-15 10:30:00"
}
Query-based update:
POST products/_update_by_query
{
"query": {
"term": {
"sku": "ABC12345"
}
},
"script": {
"source": "ctx._source['stock'] = 100"
}
}
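When the new value changes from call to call, it can be passed through the script's params object instead of being written into the script source, which lets Elasticsearch reuse the compiled script. A minimal sketch, with set_stock_by_sku as a hypothetical helper that only builds the request body:

```python
def set_stock_by_sku(sku, stock):
    """Body for POST products/_update_by_query that sets `stock` for one SKU."""
    return {
        "query": {"term": {"sku": sku}},
        "script": {
            "lang": "painless",
            # The value is read from `params` rather than pasted into the source.
            "source": "ctx._source.stock = params.stock",
            "params": {"stock": stock},
        },
    }

body = set_stock_by_sku("ABC12345", 100)
```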
Deleting Documents
Delete by ID:
DELETE products/_doc/1
Delete by query:
POST products/_delete_by_query
{
"query": {
"term": {
"sku": "ABC12345"
}
}
}
Delete a specific field from documents:
POST products/_update_by_query
{
"script": {
"lang": "painless",
"source": "ctx._source.remove('stock')"
}
}
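Without a query, the request above visits every document in the index. One way to narrow it, sketched here under the assumption that only documents that still contain the field need rewriting, is to add an exists query. The helper name remove_field is hypothetical.

```python
def remove_field(field):
    """Body for POST products/_update_by_query that drops one field."""
    return {
        # Only touch documents that actually contain the field.
        "query": {"exists": {"field": field}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }

body = remove_field("stock")
```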
Query Examples
Match All
Match every document in every index of the cluster (a search returns only the first 10 hits unless size is set):
GET _search
{
"query": {
"match_all": {}
}
}
Query all documents in a specific index:
GET products/_search
Retrieve a specific document by ID:
GET products/_doc/1
Term Query
Use term queries for exact matches on numbers, dates, booleans, or keyword fields:
GET products/_search
{
"query": {
"term": {
"sku": "ABC12345"
}
}
}
Multiple values using terms (similar to SQL IN):
GET products/_search
{
"query": {
"terms": {
"product_id": ["P-9876", "P-5432", "P-1111"]
}
}
}
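Since term takes one value and terms takes a list, application code often chooses between them based on how many values it has. A small sketch, with exact_match as a hypothetical helper:

```python
def exact_match(field, *values):
    """Return a `term` clause for one value, a `terms` clause for several."""
    if not values:
        raise ValueError("at least one value is required")
    if len(values) == 1:
        return {"term": {field: values[0]}}
    return {"terms": {field: list(values)}}

one = exact_match("sku", "ABC12345")
many = exact_match("product_id", "P-9876", "P-5432", "P-1111")
```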
Range Query
For numeric or date range filtering:
GET products/_search
{
"query": {
"range": {
"price": {
"gte": 10,
"lte": 100
}
}
}
}
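The same clause works on date fields, and gt and lt are the exclusive counterparts of gte and lte. The sketch below builds a range on created_at from Python datetime objects, formatting them to match the yyyy-MM-dd HH:mm:ss format declared in the mapping. The helper name created_between is hypothetical.

```python
from datetime import datetime

ES_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss

def created_between(start, end):
    """Range clause on `created_at`: start inclusive, end exclusive."""
    return {
        "range": {
            "created_at": {
                "gte": start.strftime(ES_FORMAT),
                "lt": end.strftime(ES_FORMAT),
            }
        }
    }

january = created_between(datetime(2024, 1, 1), datetime(2024, 2, 1))
```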
Exists Query
Check if a field exists:
GET products/_search
{
"query": {
"exists": {
"field": "price"
}
}
}
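The opposite check, finding documents where a field is missing, has no dedicated query; the usual approach is to negate exists inside a bool query. A minimal sketch, with field_missing as a hypothetical helper name:

```python
def field_missing(field):
    """Clause matching documents that have no indexed value for `field`."""
    return {"bool": {"must_not": [{"exists": {"field": field}}]}}

no_price = {"query": field_missing("price")}
```

Note that a field whose value is null or an empty array also counts as missing.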
Bool Query
Combine multiple conditions using boolean logic:
GET products/_search
{
"query": {
"bool": {
"must": {
"term": {
"active": true
}
},
"must_not": {
"term": {
"stock": 0
}
},
"should": [
{
"term": {
"name": "mouse"
}
},
{
"term": {
"name": "keyboard"
}
}
]
}
}
}
Boolean operators:
- must: All conditions must match (AND)
- must_not: None of the conditions should match (NOT)
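- should: Optional conditions that raise the score of matching documents; at least one must match (OR) only when the bool query has no must or filter clause, or when minimum_should_match is set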
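How the operators combine is easiest to see on a toy model. The sketch below is a simplified, in-memory illustration and not how Elasticsearch evaluates queries: it handles only term clauses, compares values by plain equality, and accepts minimum_should_match only as an integer. All function names are hypothetical.

```python
def term_matches(doc, clause):
    """True if a {"term": {field: value}} clause equals the document's value."""
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value

def as_list(clauses):
    """Accept a single clause or a list of clauses, as the DSL does."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]

def bool_matches(doc, bool_query):
    required = as_list(bool_query.get("must")) + as_list(bool_query.get("filter"))
    must_not = as_list(bool_query.get("must_not"))
    should = as_list(bool_query.get("should"))
    # Default: 1 when there are should clauses but no must/filter, else 0.
    default_min = 1 if should and not required else 0
    minimum = bool_query.get("minimum_should_match", default_min)
    return (
        all(term_matches(doc, c) for c in required)
        and not any(term_matches(doc, c) for c in must_not)
        and sum(term_matches(doc, c) for c in should) >= minimum
    )

doc = {"active": True, "stock": 50, "sku": "ABC12345"}
query = {
    "must": {"term": {"active": True}},
    "must_not": {"term": {"stock": 0}},
    "should": [{"term": {"sku": "NOPE-1"}}, {"term": {"sku": "NOPE-2"}}],
}
print(bool_matches(doc, query))                                 # True
print(bool_matches(doc, {**query, "minimum_should_match": 1}))  # False
```

The first call succeeds even though neither should clause matches, because must is present; the second fails once minimum_should_match requires one of them.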
Wildcard Query
Pattern matching similar to SQL LIKE:
GET products/_search
{
"query": {
"wildcard": {
"name": "*wire*"
}
}
}
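In a wildcard pattern, * stands for any run of characters (including none) and ? for exactly one character, and the pattern has to cover an entire indexed term. Because name is a text field, "wireless mouse" is indexed as the separate terms wireless and mouse, and the pattern is compared against each of them. The sketch below models only these pattern rules locally with Python's re module; it ignores escaping and is not Elasticsearch's implementation. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate `*` (any run of characters) and `?` (one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_matches(pattern, term):
    # The pattern must cover the whole term, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(wildcard_matches("*wire*", "wireless"))  # True
print(wildcard_matches("wire", "wireless"))    # False
```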
Regexp Query
Regular expression pattern matching:
GET products/_search
{
"query": {
"regexp": {
"sku": "ABC[0-9]+"
}
}
}
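A regexp pattern is anchored: it has to match the whole term, so ABC[0-9]+ does not match a value that merely contains such a sequence. For a simple pattern like this one, Python's re.fullmatch behaves the same way and can serve as a quick local check. This is only an approximation, since Lucene's regular expression syntax differs from Python's in other respects; regexp_matches is a hypothetical helper.

```python
import re

def regexp_matches(pattern, term):
    # fullmatch mirrors the anchoring: partial matches do not count.
    return re.fullmatch(pattern, term) is not None

print(regexp_matches("ABC[0-9]+", "ABC12345"))   # True
print(regexp_matches("ABC[0-9]+", "XABC12345"))  # False
```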
Data Types Reference
Core types:
- text, keyword (string)
- long, integer, short, byte (integers)
- double, float (decimals)
- boolean
- date
- binary
Complex types:
- object (JSON objects)
- nested (arrays of objects)
Special types:
- geo_point (latitude/longitude)
- geo_shape (complex shapes)
- ip (IPv4/IPv6)
- join (parent-child relationships)
For exact matching, sorting, and aggregation, the keyword type is preferred over text. A text field is tokenized at index time, which suits full-text search but consumes more storage.
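When a field needs both behaviors, a multi-field mapping can index the same value twice. The sketch below, written as Python dicts, keeps name as text for full-text search and adds a keyword sub-field for sorting. The sub-field name raw is an arbitrary choice made for this example.

```python
name_mapping = {
    "name": {
        "type": "text",  # analyzed, for full-text search
        "fields": {
            # Exact, untokenized copy of the value, addressed as name.raw.
            "raw": {"type": "keyword"}
        },
    }
}

# Search the analyzed field, sort on the exact one.
sorted_search = {
    "query": {"match": {"name": "mouse"}},
    "sort": [{"name.raw": "asc"}],
}
```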