Getting Started with InfluxDB: Data Ingestion and Management

To interact with the InfluxDB CLI, authenticate using your credentials:

influx -username root -password 123456

Create a new database named mydb:

CREATE DATABASE mydb

To view all available databases:

SHOW DATABASES

The _internal database is reserved for InfluxDB's internal monitoring data. Most InfluxQL operations require a specific database context. While you can specify the database in each query, the CLI offers a convenient USE <db-name> command to set the current database for subsequent commands. For example:

USE mydb

All subsequent operations will now target the mydb database.

Understanding Time Series Data

InfluxDB stores data as time series, which are collections of data points, each representing a metric value (e.g., CPU load, temperature). Each data point consists of:

time: A timestamp.
measurement: The name of the metric (e.g., cpu_load).
field: At least one key-value pair representing the metric's value (e.g., value=0.64).
tag: Zero or more key-value pairs providing metadata about the metric (e.g., host=server01, region=EMEA).

Conceptually, a measurement is analogous to an SQL table whose primary index is always the timestamp. Tags and fields act as columns in that table: tags are indexed, while fields are not. Unlike traditional databases, InfluxDB supports millions of measurements without requiring a predefined schema, and it does not store null values.

Data is ingested following this line protocol:

<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]

Here are examples of data points:

cpu,host=serverA,region=us_west value=0.64
payment,device=mobile,product=Notepad,method=credit billed=33,licenses=3i 1434067467100293230
stock,symbol=AAPL bid=127.46,ask=127.48
temperature,machine=unit42,type=assembly external=25,internal=37 1434067467000000000
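
As a minimal sketch of the format above, the following Python helper assembles a line from its parts. The name `to_line` and its parameters are hypothetical and not part of any InfluxDB client library. The escaping and the `i` suffix for integers follow the line protocol rules as understood for InfluxDB 1.x, so treat them as assumptions to verify against your version.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters in text."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # bool must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, fields, tags=None, timestamp_ns=None):
    """Build one line: measurement, optional tags, at least one field, optional timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags or {}):  # tags sorted by key (assumed to be the preferred order)
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value) for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"
    return line


print(to_line("cpu", {"value": 0.64}, {"host": "serverA", "region": "us_west"}))
```

The call at the end reproduces the first example point, without a timestamp.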

To insert a single data point using the CLI:

INSERT cpu,host=serverA,region=us_west value=0.64

This command writes a data point to the cpu measurement with tags host=serverA and region=us_west, and a field value=0.64. If the timestamp is omitted, InfluxDB assigns the current server time.
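
When a client should supply its own timestamp instead of relying on server time, the value has to be a Unix timestamp in nanoseconds, as in the format shown earlier. The sketch below uses only the Python standard library; the helper names `now_ns` and `ns_to_utc` are hypothetical and exist only for illustration.

```python
import time
from datetime import datetime, timezone


def now_ns():
    """Current Unix time in nanoseconds (requires Python 3.7+)."""
    return time.time_ns()


def ns_to_utc(timestamp_ns):
    """Convert a nanosecond Unix timestamp to a UTC datetime.

    datetime only has microsecond resolution, so the last three digits are dropped.
    """
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1000)


# Decode the timestamp used in the payment example above.
print(ns_to_utc(1434067467100293230).isoformat())
```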

To retrieve this data:

SELECT * FROM cpu

Inserting data with multiple fields:

INSERT temperature,machine=unit42,type=assembly external=25,internal=37

To select all fields and tags:

SELECT * FROM temperature

InfluxQL supports advanced features, including Go-style regular expressions:

Select the first record from all measurements:
```
SELECT * FROM /.*/ LIMIT 1
```
Select all fields from a specific measurement:
```
SELECT * FROM "cpu_load_short"
```
Filter data within a measurement based on a condition:
```
SELECT * FROM "cpu_load_short" WHERE "value" > 0.9
```

Data Sampling and Retention

InfluxDB can process hundreds of thousands of data points per second. Long-term storage of high-resolution data can be resource-intensive. Data sampling allows for storing raw, high-resolution data for shorter periods and aggregated, lower-resolution data for longer durations.

InfluxDB provides two key features for managing data lifecycle: Continuous Queries (CQs) for data aggregation and Retention Policies (RPs) for data expiration.

Continuous Query (CQ): An InfluxQL query that runs automatically and periodically within the database. CQs must include an aggregation function and a GROUP BY time() clause.

Retention Policy (RP): A component of InfluxDB's data schema that defines how long data is kept. InfluxDB compares data timestamps against the DURATION defined in an RP and removes older data. A database can have multiple RPs, but each RP is unique to its database.
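
The timestamp comparison behind an RP can be sketched in a few lines of Python. This is a conceptual illustration only: the function `split_by_retention` is hypothetical, the sample values are made up, and InfluxDB itself is understood to drop expired data in larger groups rather than point by point.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(points, duration, now):
    """Split (timestamp, value) pairs into those within the duration and those past it."""
    cutoff = now - duration
    kept = [p for p in points if p[0] >= cutoff]
    expired = [p for p in points if p[0] < cutoff]
    return kept, expired


# Illustrative values: two points on either side of a 2-hour cutoff.
now = datetime(2016, 5, 14, 1, 0, 0, tzinfo=timezone.utc)
points = [
    (datetime(2016, 5, 13, 22, 59, 50, tzinfo=timezone.utc), 9),
    (datetime(2016, 5, 13, 23, 0, 0, tzinfo=timezone.utc), 10),
]
kept, expired = split_by_retention(points, timedelta(hours=2), now)
print(len(kept), "kept,", len(expired), "expired")
```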

Data Aggregation Example

Consider tracking the number of restaurant orders placed by phone and through a website at 10-second intervals. The data is stored in the food_data database, in the orders measurement, with the fields phone and website.

Objective:

Aggregate 10-second interval data into 30-minute intervals.
Automatically delete raw 10-second data older than two hours.
Automatically delete 30-minute aggregated data older than 52 weeks.
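
To see why these objectives save space, it helps to count how many points per field each policy holds at most. The arithmetic below is a rough sketch that assumes exactly one point per interval with no gaps; the variable names are illustrative.

```python
RAW_INTERVAL_S = 10                      # raw data: one point every 10 seconds
DOWNSAMPLED_INTERVAL_S = 30 * 60         # aggregated data: one point every 30 minutes
TWO_HOURS_S = 2 * 60 * 60
FIFTY_TWO_WEEKS_S = 52 * 7 * 24 * 60 * 60

# Points held at any one time under each retention period.
raw_points_kept = TWO_HOURS_S // RAW_INTERVAL_S
downsampled_points_kept = FIFTY_TWO_WEEKS_S // DOWNSAMPLED_INTERVAL_S

# For comparison: raw 10-second data kept for the full 52 weeks.
raw_points_if_kept_52_weeks = FIFTY_TWO_WEEKS_S // RAW_INTERVAL_S

print(raw_points_kept)               # 720
print(downsampled_points_kept)       # 17472
print(raw_points_if_kept_52_weeks)   # 3144960
```

Each 30-minute aggregate stands in for 180 raw points, which is where the reduction comes from.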

Database Preparation

Set up CQs and RPs before writing data. A CQ only processes recent data, that is, points no older than now() minus the FOR clause of the CQ, or now() minus the GROUP BY time() interval if the CQ has no FOR clause.

Create Database:
```
CREATE DATABASE "food_data"
```
Create a Default 2-Hour RP: This RP will be used for raw data. If no RP is specified during data ingestion, InfluxDB uses the default.
```
CREATE RETENTION POLICY "two_hours" ON "food_data" DURATION 2h REPLICATION 1 DEFAULT
```
This creates an RP named two_hours for food_data that retains data for 2 hours and is set as the default. The replication factor (REPLICATION 1) is a required parameter and must be set to 1 on single-node instances. Note: Upon database creation, InfluxDB automatically generates an RP named autogen with infinite retention, which becomes the default. Because the statement above includes DEFAULT, two_hours replaces autogen as the default RP for food_data.

Create a 52-Week Retention Policy

Create another RP to store aggregated data for 52 weeks. This will not be the default RP.

CREATE RETENTION POLICY "a_year" ON "food_data" DURATION 52w REPLICATION 1

This statement creates an RP named a_year for food_data with a 52-week retention period. Omitting DEFAULT ensures that two_hours remains the default RP.

Create a Continuous Query

Now, create a CQ to sample 10-second data into 30-minute intervals and store it under a different RP.

CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
  SELECT mean("website") AS "mean_website", mean("phone") AS "mean_phone"
  INTO "a_year"."downsampled_orders"
  FROM "orders"
  GROUP BY time(30m)
END

This CQ, named cq_30m, operates on food_data. Every 30 minutes, it calculates the mean of the website and phone fields from the orders measurement (read from the default two_hours RP) and writes the results to the downsampled_orders measurement in the a_year RP, as the fields mean_website and mean_phone. Each run covers the preceding 30 minutes of data.
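
What the mean() and GROUP BY time(30m) combination computes can be illustrated outside the database. The Python sketch below groups points into 30-minute buckets and averages each field; the function `downsample_mean` is hypothetical, and aligning buckets to the Unix epoch is an assumption about how the interval boundaries are chosen. It does not reproduce how InfluxDB schedules or executes a CQ.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def downsample_mean(points, interval):
    """Average each field per time bucket.

    points: iterable of (timestamp, {"field": value}) pairs.
    Returns {bucket_start: {"mean_<field>": mean}} ordered by bucket start.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for ts, fields in points:
        bucket = ts - (ts - EPOCH) % interval  # round down to the bucket boundary
        for name, value in fields.items():
            sums[bucket][name] += value
            counts[bucket][name] += 1
    return {
        bucket: {
            f"mean_{name}": sums[bucket][name] / counts[bucket][name]
            for name in sums[bucket]
        }
        for bucket in sorted(sums)
    }


# Five illustrative raw rows, 10 seconds apart, all within one 30-minute bucket.
start = datetime(2016, 5, 13, 23, 0, 0, tzinfo=timezone.utc)
phone = [10, 12, 11, 8, 17]
website = [30, 39, 56, 34, 32]
rows = [
    (start + timedelta(seconds=10 * i), {"phone": p, "website": w})
    for i, (p, w) in enumerate(zip(phone, website))
]
result = downsample_mean(rows, timedelta(minutes=30))
for bucket, means in result.items():
    print(bucket.isoformat(), means)
```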

Note: The INTO "<retention_policy>"."<measurement>" syntax is used when writing to a non-default RP.

Observing the Results

With the CQ and RPs in place, food_data is ready to receive data. After data has been written for some time, the database contains two measurements: orders and downsampled_orders.

> SELECT * FROM "orders" LIMIT 5
name: orders
---------
time phone website
2016-05-13T23:00:00Z      10     30
2016-05-13T23:00:10Z      12     39
2016-05-13T23:00:20Z      11     56
2016-05-13T23:00:30Z      8      34
2016-05-13T23:00:40Z      17     32

> SELECT * FROM "a_year"."downsampled_orders" LIMIT 5
name: downsampled_orders
---------------------
time mean_phone mean_website
2016-05-13T15:00:00Z      12           23
2016-05-13T15:30:00Z      13           32
2016-05-13T16:00:00Z      19           21
2016-05-13T16:30:00Z      3            26
2016-05-13T17:00:00Z      4            23

The orders measurement contains 10-second interval raw data retained for 2 hours. The downsampled_orders measurement holds 30-minute aggregated data retained for 52 weeks.

Note: The first timestamps in downsampled_orders are earlier than the first timestamps in orders because InfluxDB has already deleted data from orders that is older than the current time minus 2 hours. Data in downsampled_orders will be removed once it is older than 52 weeks.

Note: When querying measurements stored in non-default RPs, you must explicitly specify the RP using the "<retention_policy>"."<measurement>" format.
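
For scripts that assemble queries, the qualified form can be built with a small helper. This is a sketch under assumptions: the functions `quote_ident` and `qualified_measurement` are hypothetical, and escaping an embedded double quote as `\"` reflects InfluxQL identifier quoting as understood for InfluxDB 1.x.

```python
def quote_ident(name):
    """Wrap an identifier in double quotes, escaping any embedded double quote."""
    return '"' + name.replace('"', '\\"') + '"'


def qualified_measurement(measurement, retention_policy=None, database=None):
    """Build "<rp>"."<measurement>", optionally prefixed with "<database>".

    With a database but no RP, the RP slot is left empty ("db".."measurement"),
    which is assumed to select that database's default RP.
    """
    parts = []
    if database is not None:
        parts.append(quote_ident(database))
        parts.append(quote_ident(retention_policy) if retention_policy else "")
    elif retention_policy is not None:
        parts.append(quote_ident(retention_policy))
    parts.append(quote_ident(measurement))
    return ".".join(parts)


query = "SELECT * FROM " + qualified_measurement("downsampled_orders", "a_year") + " LIMIT 5"
print(query)
```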

InfluxDB checks RPs periodically (defaulting to every 30 minutes). Data exceeding retention limits might persist between these checks. The check interval can be configured. By combining RPs and CQs, you can effectively manage data retention, keeping high-resolution data for short periods and aggregated data for extended durations.

Tags: InfluxDB, Time Series Data, Data Ingestion, Continuous Queries, Retention Policies

Posted on Mon, 11 May 2026 00:32:22 +0000 by discomatt

Freaks City