Effective handling of time-series data is critical for monitoring systems, logging infrastructure, and IoT telemetry. As data volumes grow, storage costs and administrative overhead tend to escalate. Rollup addresses these challenges by transforming fine-grained raw data into aggregated, high-level summaries, which shrinks the storage footprint and speeds up queries.
Core Advantages
- Storage Efficiency: Compressing historical data reduces the disk space required for long-term retention.
- Query Performance: Aggregated datasets enable faster retrieval for time-based analytics.
- Operational Transparency: The system allows queries directly against the original index, ensuring that business applications remain unaffected by the backend transition to rolled-up data.
- Automated Lifecycle: Rollup tasks automate the creation of new indices, minimizing manual maintenance.
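To make the storage argument concrete, the sketch below estimates document counts before and after a rollup. The numbers are purely hypothetical (1,000 series reporting every 10 seconds, rolled up into 1-hour buckets), and the estimate counts documents only. Actual disk savings also depend on how many metrics and attributes each rollup document stores.

```python
# Back-of-the-envelope estimate of how many documents a rollup keeps.
# All inputs are hypothetical illustration values, not measurements.


def raw_doc_count(series: int, sample_period_s: int, window_s: int) -> int:
    """Documents written when each series reports once per sample period."""
    return series * (window_s // sample_period_s)


def rollup_doc_count(series: int, interval_s: int, window_s: int) -> int:
    """One rollup document per series per bucket, assuming every bucket has data."""
    return series * (window_s // interval_s)


if __name__ == "__main__":
    day = 24 * 3600
    raw = raw_doc_count(series=1000, sample_period_s=10, window_s=day)
    rolled = rollup_doc_count(series=1000, interval_s=3600, window_s=day)
    print(raw, rolled, raw // rolled)  # prints: 8640000 24000 360
```

Under these assumptions, one day of raw data shrinks from 8.64 million documents to 24,000, a 360x reduction in document count.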
Prerequisite Requirements
To implement Rollup in Easysearch, ensure the following conditions are met:
- Lifecycle Management Plugin: The Index Lifecycle Management (ILM) module must be active.
- Temporal Fields: Source indices must contain a field of type date to support time-based bucket aggregation.
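The date field matters because each document must land in exactly one time bucket. The sketch below shows fixed-interval bucketing: a timestamp is floored to the start of its bucket. It assumes buckets are aligned to the Unix epoch in UTC and supports only fixed-length units (s, m, h, d). The helper names are hypothetical, and Easysearch's own alignment and time-zone handling may differ.

```python
import math
from datetime import datetime, timezone

# Hypothetical helper table: only fixed-length units (no months or years).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def interval_seconds(interval: str) -> int:
    """Convert an interval string such as '5m', '1h' or '1d' into seconds."""
    value, unit = interval[:-1], interval[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit() or int(value) == 0:
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(value) * UNIT_SECONDS[unit]


def bucket_start(ts: datetime, interval: str) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket.

    Assumption: buckets are aligned to the Unix epoch in UTC.
    """
    step = interval_seconds(interval)
    epoch = math.floor(ts.timestamp())
    # Python's % is non-negative for a positive step, so this floors correctly.
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)


if __name__ == "__main__":
    t = datetime(2024, 5, 1, 13, 47, 12, tzinfo=timezone.utc)
    for iv in ("5m", "1h", "1d"):
        print(iv, bucket_start(t, iv).isoformat())
    # 5m 2024-05-01T13:45:00+00:00
    # 1h 2024-05-01T13:00:00+00:00
    # 1d 2024-05-01T00:00:00+00:00
```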
Configuring Rollup Parameters
Defining a Rollup job involves several key configuration settings:
- metrics: Numeric fields to aggregate, summarized with functions such as sum, avg, min, max, and value_count.
- attributes: Fields copied into the rollup without any mathematical reduction.
- exclude: A filter for fields that should be omitted entirely from the target index.
- filter: Restricts the subset of source documents the job processes.
- identity: The dimension fields used for grouping. Each unique combination of their values forms a separate bucket for the aggregated metrics.
- interval: The granularity of the time-based buckets (e.g., 5m, 1h, 1d).
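The interaction of these settings can be illustrated with a small in-memory simulation. This is a minimal sketch, not Easysearch's implementation. It assumes epoch-second timestamps and epoch-aligned buckets, keeps the first value seen for each attribute, and uses a hypothetical output naming scheme (such as cpu.usage.sum). Fields not listed in identity, metrics, or attributes are simply dropped, so exclude is not modeled separately.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def rollup(
    docs: Iterable[Doc],
    timestamp: str,
    interval_s: int,
    identity: List[str],
    metrics: List[str],
    attributes: List[str],
    doc_filter: Callable[[Doc], bool] = lambda doc: True,
) -> List[Doc]:
    """Group docs by (time bucket, identity values) and summarize each metric."""
    buckets: Dict[Tuple[Any, ...], Doc] = {}
    for doc in docs:
        if not doc_filter(doc):  # 'filter': skip documents outside the subset
            continue
        start = doc[timestamp] - doc[timestamp] % interval_s  # 'interval'
        key = (start, *(doc.get(f) for f in identity))  # 'identity'
        out = buckets.get(key)
        if out is None:
            out = {timestamp: start}
            out.update({f: doc.get(f) for f in identity})
            # 'attributes': copied unchanged; this sketch keeps the first value seen.
            out.update({f: doc.get(f) for f in attributes})
            for m in metrics:
                out.update({f"{m}.sum": 0, f"{m}.min": None,
                            f"{m}.max": None, f"{m}.value_count": 0})
            buckets[key] = out
        for m in metrics:  # 'metrics': numeric reductions
            v = doc.get(m)
            if v is None:
                continue
            out[f"{m}.sum"] += v
            lo, hi = out[f"{m}.min"], out[f"{m}.max"]
            out[f"{m}.min"] = v if lo is None else min(lo, v)
            out[f"{m}.max"] = v if hi is None else max(hi, v)
            out[f"{m}.value_count"] += 1
    for out in buckets.values():
        for m in metrics:
            n = out[f"{m}.value_count"]
            out[f"{m}.avg"] = out[f"{m}.sum"] / n if n else None
    return sorted(
        buckets.values(),
        key=lambda d: (d[timestamp], *(str(d.get(f)) for f in identity)),
    )
```

With identity set to host.id and region and a 1-hour interval, every host/region pair gets one summary document per hour, regardless of how many raw samples it reported.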
Advanced Features in Version 1.10.0+
Recent releases have introduced significant improvements to Rollup flexibility:
- Custom Range Aggregations: Support for date_range allows flexible, non-uniform time buckets.
- Wildcard Management: Jobs can now be controlled in bulk using wildcard patterns, for example:
  POST _rollup/jobs/metrics_rollup_*/_start
- Automated Rolling: Index rotation can be driven by document count through the rollup.max_docs cluster setting.
- Concurrency Optimization: The rollup.search.max_count parameter caps the number of concurrent shard requests allowed during job execution, preventing resource exhaustion on the cluster.
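To show what non-uniform buckets mean in practice, the sketch below assigns timestamps to custom date ranges. The semantics are an assumption borrowed from Elasticsearch-style range aggregations: "from" is inclusive, "to" is exclusive, either bound may be open, and ranges may overlap, so a document can be counted in more than one range. Bounds are epoch seconds, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, from, to) in epoch seconds; None leaves that side unbounded.
# Assumed semantics: 'from' inclusive, 'to' exclusive, overlaps allowed.
DateRange = Tuple[str, Optional[int], Optional[int]]


def matching_ranges(ts: int, ranges: List[DateRange]) -> List[str]:
    """Keys of every range that contains ts."""
    return [
        key
        for key, start, end in ranges
        if (start is None or ts >= start) and (end is None or ts < end)
    ]


def count_per_range(timestamps: Iterable[int], ranges: List[DateRange]) -> Dict[str, int]:
    """Document count per range; a timestamp in overlapping ranges counts in each."""
    counts = {key: 0 for key, _, _ in ranges}
    for ts in timestamps:
        for key in matching_ranges(ts, ranges):
            counts[key] += 1
    return counts


if __name__ == "__main__":
    day = 86400
    ranges = [
        ("last_day", 9 * day, 10 * day),   # narrow, recent window
        ("last_week", 3 * day, 10 * day),  # overlaps last_day
        ("older", None, 3 * day),          # open-ended history
    ]
    print(count_per_range([0, 2 * day, 5 * day, 9 * day, 9 * day + 10], ranges))
    # {'last_day': 2, 'last_week': 3, 'older': 2}
```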
Implementation Example
The request below creates a job that summarizes system_metrics_raw into hourly buckets for each combination of host.id and region:
PUT _rollup/jobs/performance_summary
{
  "rollup": {
    "source_index": "system_metrics_raw",
    "target_index": "summary_system_{{ctx.source_index}}",
    "timestamp": "observed_at",
    "cron": "0 * * * * ?",
    "interval": "1h",
    "identity": ["host.id", "region"],
    "metrics": ["cpu.usage", "memory.bytes"],
    "attributes": ["environment", "service_tier"]
  }
}
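The same job can be submitted from a script. The sketch below uses only Python's standard library to prepare the PUT request without sending it. The endpoint http://localhost:9200 and the function name are hypothetical, and a secured cluster would also need TLS settings and credentials, which are not handled here.

```python
import json
import urllib.request


def build_rollup_job_request(base_url: str, job_id: str, rollup: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT _rollup/jobs/<job_id> request."""
    body = json.dumps({"rollup": rollup}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_rollup/jobs/{job_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Job definition mirroring the REST example; the template string is sent verbatim.
JOB = {
    "source_index": "system_metrics_raw",
    "target_index": "summary_system_{{ctx.source_index}}",
    "timestamp": "observed_at",
    "cron": "0 * * * * ?",
    "interval": "1h",
    "identity": ["host.id", "region"],
    "metrics": ["cpu.usage", "memory.bytes"],
    "attributes": ["environment", "service_tier"],
}

if __name__ == "__main__":
    # Hypothetical endpoint; adjust scheme, host, port, and auth for your cluster.
    req = build_rollup_job_request("http://localhost:9200", "performance_summary", JOB)
    print(req.get_method(), req.full_url)
    # To submit it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the job definition testable without a running cluster.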
To enable searching across rolled-up data, activate the search feature at the cluster level:
PUT /_cluster/settings
{
  "transient": {
    "rollup.search.enabled": true
  }
}
Once this setting is enabled, standard search and aggregation requests sent to the target index are automatically answered from the rolled-up data.
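Answering an aggregation from rolled-up data depends on keeping mergeable statistics. For example, a daily average cannot be computed by averaging 24 hourly averages, but it can be recomputed exactly from their sums and value counts. The sketch below illustrates that principle with hypothetical numbers. It is not Easysearch's internal query rewriting.

```python
from typing import Dict, List

# Statistics for one metric in one rolled-up bucket.
Summary = Dict[str, float]


def merge_summaries(parts: List[Summary]) -> Summary:
    """Combine several rolled-up buckets (e.g. hourly ones) into a coarser one."""
    if not parts:
        raise ValueError("nothing to merge")
    total = sum(p["sum"] for p in parts)
    count = sum(p["value_count"] for p in parts)
    return {
        "sum": total,
        "min": min(p["min"] for p in parts),
        "max": max(p["max"] for p in parts),
        "value_count": count,
        # Recompute avg from sum and count instead of averaging the averages.
        "avg": total / count if count else float("nan"),
    }


if __name__ == "__main__":
    hour_1 = {"sum": 40, "min": 10, "max": 30, "value_count": 2}  # avg 20
    hour_2 = {"sum": 90, "min": 5, "max": 20, "value_count": 9}   # avg 10
    merged = merge_summaries([hour_1, hour_2])
    print(round(merged["avg"], 2))  # 11.82, i.e. 130 / 11
    print((20 + 10) / 2)            # 15.0, the misleading average of averages
```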