Time Series Analysis with Pandas: Essential Techniques for Temporal Data Processing

Time Series Creation in Pandas

Pandas offers robust functionality for creating time series data through two primary approaches:

  • Using the built-in date_range function to generate time sequences with specified start/end dates and intervals
  • Converting existing date strings to DatetimeIndex objects using the to_datetime function

Creating Time Sequences with date_range

The freq parameter of date_range accepts frequency codes (offset aliases), including:

  • S: seconds
  • T: minutes
  • H: hours
  • D: days
  • B: business days
  • W: weeks
  • M: months
  • Q: quarters
  • Y: years

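A number can precede any of these codes to define a custom interval, such as 2H for two hours. Note that pandas 2.2 and later deprecate several of these spellings in favor of s, min, h, ME, QE, and YE.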
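
As a brief illustration of multiplied frequency codes, the sketch below builds two short sequences. It uses the lowercase h spelling for hours, which recent pandas releases prefer over H; the variable names are illustrative only.

```python
import pandas as pd

# Four timestamps spaced two hours apart
every_two_hours = pd.date_range(start="2021-01-01", periods=4, freq="2h")
print(every_two_hours)

# Four timestamps spaced three days apart
every_three_days = pd.date_range(start="2021-01-01", periods=4, freq="3D")
print(every_three_days)
```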

import pandas as pd

# Generate sequence with default daily frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-01-04')
print(time_sequence)

# Monthly frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-05-07', freq='M')
print(time_sequence)

# Quarterly frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-11-07', freq='Q')
print(time_sequence)
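
The M and Q codes label each period by its last calendar day. One way to see this without depending on a particular alias spelling, sketched here on the assumption that offset objects are an acceptable substitute for string codes, is to pass pd.offsets.MonthEnd() and pd.offsets.QuarterEnd() as freq:

```python
import pandas as pd

# Month-end dates between the two endpoints
month_ends = pd.date_range(
    start="2021-01-01", end="2021-05-07", freq=pd.offsets.MonthEnd()
)
print(month_ends)

# Quarter-end dates between the two endpoints
quarter_ends = pd.date_range(
    start="2021-01-01", end="2021-11-07", freq=pd.offsets.QuarterEnd()
)
print(quarter_ends)
```

Because the end date 2021-05-07 falls before the end of May, the monthly sequence stops at the end of April.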

The function provides additional customization options:

# Start of month frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-04-07', freq='MS')
print(time_sequence)

# Weekly frequency starting Monday
time_sequence = pd.date_range(start='2021-01-01', end='2021-01-15', freq='W-MON')
print(time_sequence)

When only one endpoint is known, combine it with periods:

time_sequence = pd.date_range(start='2021-01-01', freq='D', periods=7)
print(time_sequence)

time_sequence = pd.date_range(end='2021-01-07', freq='W', periods=7)
print(time_sequence)
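
To make the direction of counting explicit, here is a small sketch with illustrative variable names. With end and periods, the sequence is built backwards from the end date. With start, end, and periods but no freq, pandas spaces the points evenly between the endpoints.

```python
import pandas as pd

# Three daily timestamps that finish on the given end date
last_three_days = pd.date_range(end="2021-01-07", periods=3, freq="D")
print(last_three_days)

# Five evenly spaced timestamps between the two endpoints (no freq given)
evenly_spaced = pd.date_range(start="2021-01-01", end="2021-01-09", periods=5)
print(evenly_spaced)
```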

Additional useful parameters include:

  • tz: timezone specification (default None)
  • normalize: normalize to midnight (default False)
  • name: assign name to generated sequence
  • inclusive: control inclusion of endpoints ('left', 'right', 'both', 'neither')

time_sequence = pd.date_range('2021-01-01', '2021-01-05', freq='D', tz='Asia/Tokyo')
print(time_sequence)

# Convert to UTC
print(time_sequence.tz_convert('UTC'))
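
The remaining parameters from the list can be sketched in the same way. This example assumes pandas 1.4 or later, where the inclusive parameter is available; the index name obs_date is made up for illustration.

```python
import pandas as pd

# inclusive="left" keeps the start date and drops the end date
left_closed = pd.date_range("2021-01-01", "2021-01-05", freq="D", inclusive="left")
print(left_closed)

# normalize=True resets the time of day to midnight; name labels the index
named = pd.date_range(
    "2021-01-01 09:30", periods=3, freq="D", normalize=True, name="obs_date"
)
print(named)
```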

Converting from Other Formats

Convert string-based dates to datetime objects:

# Build sample climate data with date strings as the index
data_frame = pd.DataFrame({
    'temp_moscow': [12.5, 13.2, 11.8, 14.1],
    'temp_istanbul': [18.3, 19.1, 17.9, 20.2]
}, index=['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'])

# Convert index to datetime
print(data_frame.index)
data_frame.index = pd.to_datetime(data_frame.index)
print(data_frame.index)

Direct conversion examples:

print(pd.to_datetime('2021-01'))
print(pd.to_datetime(['2021-01-01', '2021-01-02']))

For non-standard formats, use the format parameter:

# Format codes: %Y (year), %m (month), %d (day), %H (hour), %M (minute), %S (second)
print(pd.to_datetime('202101', format='%Y%m'))
print(pd.to_datetime(['20211', '20212', '202112'], format='%Y%m'))
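
The format codes listed in the comment can be combined with literal separators. As a hedged sketch, the input strings below are invented to show a day-first date and a compact date-time layout:

```python
import pandas as pd

# Day-first date with slashes: 7 January 2021, not 1 July
day_first = pd.to_datetime("07/01/2021", format="%d/%m/%Y")
print(day_first)

# Compact date and time separated by a space
stamps = pd.to_datetime(["20210107 1530", "20210108 0645"], format="%Y%m%d %H%M")
print(stamps)
```

Giving an explicit format removes the ambiguity between day-first and month-first readings of the same string.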

Time Series Shifting

A DatetimeIndex can be shifted with its shift method or by adding a timedelta offset:

time_sequence = pd.date_range('2021-01-01', '2021-01-05', freq='D')
print(time_sequence)
print(time_sequence.shift(1))          # Shift by current frequency unit
print(time_sequence.shift(2, freq='M')) # Shift by specified frequency

# Add timedelta offsets
delta_time = pd.to_timedelta(3, unit='D')
print(time_sequence + delta_time)
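
The shift method also exists on Series and DataFrame, where its effect depends on whether freq is given. The example below, with made-up values, is a sketch of that distinction rather than part of the original walkthrough:

```python
import pandas as pd

values = pd.Series(
    [10, 20, 30], index=pd.date_range("2021-01-01", periods=3, freq="D")
)

# Without freq: values move down one row, the index is unchanged,
# and the first row becomes NaN
lagged = values.shift(1)
print(lagged)

# With freq: the index moves forward one day and every value is kept
moved = values.shift(1, freq="D")
print(moved)
```

The first form is the usual way to build lagged features; the second relabels the observations in time.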

Time Series Data Processing

Working with temporal data requires special handling because of its periodic nature and the varying lengths of months. Consider this example with synthetic weather data:

import numpy as np

# Create daily temperature dataset
temp_data = pd.DataFrame(index=pd.date_range('2024-01-01', '2024-12-31', freq='D'))
temp_data['berlin'] = np.random.rand(len(temp_data)) * 12 - 10
temp_data['madrid'] = np.random.rand(len(temp_data)) * 12 + 8
temp_data['athens'] = np.random.rand(len(temp_data)) * 12 + 20
temp_data['london'] = np.random.rand(len(temp_data)) * 12 + 5
temp_data['paris'] = np.random.rand(len(temp_data)) * 12 + 3

print(temp_data.head())

Basic statistical operations:

print(temp_data.mean())  # Overall averages
print(temp_data.min())   # Minimum values
print(temp_data.max())   # Maximum values

Resampling provides different temporal aggregations:

print(temp_data.resample('M').mean())     # Monthly averages
print(temp_data.resample('QS-DEC').mean()) # Seasonal averages
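
Because the data above is random, the aggregation is easier to verify on deterministic input. In this sketch each value equals the day of the month, and the MS (month start) code is used so that each bin is labeled by the first day of its month; the series name toy is illustrative.

```python
import pandas as pd

days = pd.date_range("2024-01-01", "2024-12-31", freq="D")
# Toy data: each value equals the day of the month
toy = pd.Series(days.day.to_numpy(), index=days, dtype="float64")

# Monthly means, labeled by the first day of each month
monthly_mean = toy.resample("MS").mean()
print(monthly_mean.head(3))

# Quarters that start in December, March, June, and September
seasonal_mean = toy.resample("QS-DEC").mean()
print(seasonal_mean.index)
```

The January mean is the average of 1 through 31, and the seasonal bins line up with meteorological seasons such as December to February.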

For precipitation-like data where totals matter more than averages:

precip_data = temp_data.copy() / 8
precip_data[precip_data < 0] = 0

print(precip_data.resample('Y').sum())  # Annual totals
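
A quick sanity check for summed totals is to feed in a constant. The sketch below assumes a hypothetical 1 mm of rain every day and uses the YS (year start) code, which labels each bin by 1 January instead of 31 December:

```python
import pandas as pd

days = pd.date_range("2023-01-01", "2024-12-31", freq="D")
rain = pd.Series(1.0, index=days)  # hypothetical 1 mm of rain every day

# Annual totals equal the number of days in each year
annual_total = rain.resample("YS").sum()
print(annual_total)
```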

When monthly values represent daily means, multiply them by the number of days in each month to obtain monthly totals:

monthly_precip = pd.DataFrame(index=pd.date_range('2024-01-01', periods=12, freq='M'))
monthly_precip['seoul'] = np.random.rand(monthly_precip.shape[0]) * 18
monthly_precip['tokyo'] = np.random.rand(monthly_precip.shape[0]) * 18
monthly_precip['beijing'] = np.random.rand(monthly_precip.shape[0]) * 18

# Convert daily means to monthly totals
print(monthly_precip.multiply(monthly_precip.index.days_in_month, axis=0))
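
The same conversion can be checked on constant input. This sketch assumes a hypothetical daily mean of 2 mm in every month of 2024 and indexes the months by their first day:

```python
import pandas as pd

months = pd.date_range("2024-01-01", periods=12, freq="MS")
daily_mean = pd.Series(2.0, index=months)  # hypothetical 2 mm per day

# days_in_month accounts for 28-, 29-, 30-, and 31-day months
monthly_total = daily_mean * months.days_in_month.to_numpy()
print(monthly_total)
```

February 2024 has 29 days, so its total is 58 mm, while January's is 62 mm.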

Rolling Window Operations

Smoothing and filtering operations use rolling windows. The main parameters of rolling are:

  • window: size of moving window (typically odd numbers)
  • min_periods: minimum observations required
  • center: center alignment option
  • axis: axis along which to roll

print(temp_data.rolling(window=7).mean())                    # 7-day moving average
print(temp_data.rolling(window=7, center=True).mean())       # Centered window
print(temp_data.rolling(window=7, min_periods=1, center=True).mean())  # With minimal periods
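
The effect of center and min_periods is easiest to see on a short series. The following sketch uses five made-up values and a window of 3:

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Trailing window: the first two rows lack three observations
trailing = s.rolling(window=3).mean()

# Centered window: the first and last rows lack a neighbor
centered = s.rolling(window=3, center=True).mean()

# min_periods=1 averages whatever observations are available at the edges
padded = s.rolling(window=3, center=True, min_periods=1).mean()

print(pd.DataFrame({"trailing": trailing, "centered": centered, "padded": padded}))
```

Rows without enough observations come out as NaN unless min_periods lowers the requirement.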

Calculate cumulative metrics over time windows:

# 3-day precipitation totals
print(precip_data.rolling(window=3, center=True, min_periods=1).sum())

Tags: Pandas time-series data-analysis datetime Resampling

Posted on Sun, 28 Jun 2026 17:18:07 +0000 by inni