Time Series Creation in Pandas
Pandas offers robust functionality for creating time series data through two primary approaches:
- Using the built-in date_range function to generate time sequences with specified start/end dates and intervals
- Converting existing date strings to DatetimeIndex objects using the to_datetime function
Creating Time Sequences with date_range
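The date_range function accepts start and end dates together with a freq parameter. Common frequency codes include:
- S: seconds
- T: minutes
- H: hours
- D: days
- B: business days
- W: weeks
- M: months
- Q: quarters
- Y: years
A number can precede a code to define a custom interval, such as 2H for two hours. Note that pandas 2.2 and later deprecate some of these spellings in favor of ME, QE, and YE for month, quarter, and year ends, and h, min, and s for hours, minutes, and seconds.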
import pandas as pd
# Generate sequence with default daily frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-01-04')
print(time_sequence)
# Monthly frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-05-07', freq='M')
print(time_sequence)
# Quarterly frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-11-07', freq='Q')
print(time_sequence)
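The numeric prefix mentioned above can be sketched as follows. This is a minimal illustration rather than part of the original walkthrough: the variable names are made up for the example, and the two-hour interval is passed as a pd.Timedelta instead of a string code, an assumption made here so the snippet does not depend on how a given pandas version spells the hourly alias.

```python
import pandas as pd

# Every second day: a number in front of the D code multiplies the interval
every_other_day = pd.date_range(start='2021-01-01', periods=4, freq='2D')
print(every_other_day)

# Two-hour steps, expressed as a Timedelta instead of a string alias
two_hourly = pd.date_range(start='2021-01-01', periods=4, freq=pd.Timedelta(hours=2))
print(two_hourly)
```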
The function provides additional customization options:
# Start of month frequency
time_sequence = pd.date_range(start='2021-01-01', end='2021-04-07', freq='MS')
print(time_sequence)
# Weekly frequency starting Monday
time_sequence = pd.date_range(start='2021-01-01', end='2021-01-15', freq='W-MON')
print(time_sequence)
When only one endpoint is known, combine it with periods to set the length of the sequence:
time_sequence = pd.date_range(start='2021-01-01', freq='D', periods=7)
print(time_sequence)
time_sequence = pd.date_range(end='2021-01-07', freq='W', periods=7)
print(time_sequence)
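As a related aside, date_range can also be given start, end, and periods with no freq at all, in which case pandas spaces the timestamps evenly between the two endpoints. The sketch below uses an illustrative variable name and is not taken from the original examples.

```python
import pandas as pd

# Three evenly spaced timestamps between two known endpoints (no freq given)
evenly_spaced = pd.date_range(start='2021-01-01', end='2021-01-07', periods=3)
print(evenly_spaced)
```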
Additional useful parameters include:
- tz: timezone specification (default None)
- normalize: normalize to midnight (default False)
- name: assign name to generated sequence
- inclusive: control inclusion of endpoints ('left', 'right', 'both', 'neither')
time_sequence = pd.date_range('2021-01-01', '2021-01-05', freq='D', tz='Asia/Tokyo')
print(time_sequence)
# Convert to UTC
print(time_sequence.tz_convert('UTC'))
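The inclusive, normalize, and name parameters listed above might be used roughly as follows. This is a sketch with illustrative variable names; it assumes a pandas version recent enough to support inclusive (older releases used a closed parameter instead).

```python
import pandas as pd

# inclusive='left' keeps the start date but drops the end date
left_closed = pd.date_range('2021-01-01', '2021-01-05', freq='D', inclusive='left')
print(left_closed)

# normalize=True resets the time of day to midnight; name labels the index
named = pd.date_range(start='2021-01-01 15:30', periods=3, freq='D',
                      normalize=True, name='day')
print(named)
```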
Converting from Other Formats
Convert string-based dates to datetime objects:
# Build sample climate data with date strings as the index
data_frame = pd.DataFrame({
'temp_moscow': [12.5, 13.2, 11.8, 14.1],
'temp_istanbul': [18.3, 19.1, 17.9, 20.2]
}, index=['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'])
# Convert index to datetime
print(data_frame.index)
data_frame.index = pd.to_datetime(data_frame.index)
print(data_frame.index)
Direct conversion examples:
print(pd.to_datetime('2021-01'))
print(pd.to_datetime(['2021-01-01', '2021-01-02']))
For non-standard formats, use the format parameter:
# Format codes: %Y (year), %m (month), %d (day), %H (hour), %M (minute), %S (second)
print(pd.to_datetime('202101', format='%Y%m'))
print(pd.to_datetime(['20211', '20212', '202112'], format='%Y%m'))
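The same format codes apply to other layouts. The sketch below parses day-first dates separated by slashes; the sample strings are invented for illustration, and the errors='coerce' argument is an addition not covered in the text above, shown here as one possible way to handle entries that do not match the format.

```python
import pandas as pd

# Day-first dates with slashes, described by an explicit format string
parsed = pd.to_datetime(['31/12/2021', '01/02/2022'], format='%d/%m/%Y')
print(parsed)

# errors='coerce' turns unparseable entries into NaT instead of raising
coerced = pd.to_datetime(['31/12/2021', 'not a date'], format='%d/%m/%Y', errors='coerce')
print(coerced)
```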
Time Series Shifting
A DatetimeIndex can be shifted with its shift method or by adding a timedelta offset:
time_sequence = pd.date_range('2021-01-01', '2021-01-05', freq='D')
print(time_sequence)
print(time_sequence.shift(1)) # Shift by current frequency unit
print(time_sequence.shift(2, freq='M')) # Shift by specified frequency
# Add timedelta offsets
delta_time = pd.to_timedelta(3, unit='D')
print(time_sequence + delta_time)
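The examples above shift a bare DatetimeIndex. When the index is attached to data, shift behaves differently depending on whether freq is given. The following sketch, using a small made-up Series, is meant only to illustrate that difference.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0],
                   index=pd.date_range('2021-01-01', periods=3, freq='D'))

# Without freq: values move down one row, the index stays, NaN appears at the top
shifted_values = series.shift(1)
print(shifted_values)

# With freq: the index moves forward one day, the values stay in their rows
shifted_index = series.shift(1, freq='D')
print(shifted_index)
```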
Time Series Data Processing
Working with temporal data requires special handling because of its periodic nature and varying month lengths. Consider this example with synthetic weather data:
import numpy as np
# Create daily temperature dataset
temp_data = pd.DataFrame(index=pd.date_range('2024-01-01', '2024-12-31', freq='D'))
temp_data['berlin'] = np.random.rand(len(temp_data)) * 12 - 10
temp_data['madrid'] = np.random.rand(len(temp_data)) * 12 + 8
temp_data['athens'] = np.random.rand(len(temp_data)) * 12 + 20
temp_data['london'] = np.random.rand(len(temp_data)) * 12 + 5
temp_data['paris'] = np.random.rand(len(temp_data)) * 12 + 3
print(temp_data.head())
Basic statistical operations:
print(temp_data.mean()) # Overall averages
print(temp_data.min()) # Minimum values
print(temp_data.max()) # Maximum values
Resampling provides different temporal aggregations:
print(temp_data.resample('M').mean()) # Monthly averages
print(temp_data.resample('QS-DEC').mean()) # Seasonal averages
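Because the data above is random, the effect of resampling is easier to check on deterministic values. The sketch below uses a made-up series (1.0 on every January day, 3.0 otherwise) and the month-start code MS rather than M; both choices are assumptions made for illustration.

```python
import pandas as pd

# Daily series for 2024: 1.0 on every January day, 3.0 on every other day
days = pd.date_range('2024-01-01', '2024-12-31', freq='D')
demo = pd.Series(3.0, index=days)
demo[demo.index.month == 1] = 1.0

# Month-start labels: each value is the mean of that calendar month
monthly_mean = demo.resample('MS').mean()
print(monthly_mean.head(3))

# QS-DEC bins start in December, March, June, and September
seasonal_mean = demo.resample('QS-DEC').mean()
print(seasonal_mean)
```

The first seasonal bin is labelled with the preceding December but contains only January and February of 2024, and the last bin contains only December 2024, so both winter averages are based on partial seasons.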
For precipitation-like data where totals matter more than averages:
precip_data = temp_data.copy() / 8
precip_data[precip_data < 0] = 0
print(precip_data.resample('Y').sum()) # Annual totals
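When monthly values are stored as daily means, the varying day counts matter: multiply each value by the number of days in its month to get a monthly total: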
monthly_precip = pd.DataFrame(index=pd.date_range('2024-01-01', periods=12, freq='M'))
monthly_precip['seoul'] = np.random.rand(monthly_precip.shape[0]) * 18
monthly_precip['tokyo'] = np.random.rand(monthly_precip.shape[0]) * 18
monthly_precip['beijing'] = np.random.rand(monthly_precip.shape[0]) * 18
# Convert daily means to monthly totals
print(monthly_precip.multiply(monthly_precip.index.days_in_month, axis=0))
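The conversion can be verified with a constant rate. In the sketch below, the rate_per_day column and its value of 1.0 are hypothetical, and the index uses month-start labels (MS) as an assumption; the resulting totals should simply equal the number of days in each month of the leap year 2024.

```python
import pandas as pd

# Hypothetical constant rate of 1.0 unit per day for each month of 2024
rates = pd.DataFrame({'rate_per_day': 1.0},
                     index=pd.date_range('2024-01-01', periods=12, freq='MS'))

# days_in_month gives 31, 29, 31, ... for the leap year 2024
totals = rates.multiply(rates.index.days_in_month, axis=0)
print(totals.head(3))
```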
Rolling Window Operations
Smoothing and filtering operations use rolling windows. The main parameters are:
- window: size of moving window (typically odd numbers)
- min_periods: minimum observations required
- center: center alignment option
- axis: axis along which to roll
print(temp_data.rolling(window=7).mean()) # 7-day moving average
print(temp_data.rolling(window=7, center=True).mean()) # Centered window
print(temp_data.rolling(window=7, min_periods=1, center=True).mean()) # With minimal periods
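The effect of center and min_periods on the edges of a series is easiest to see on a few known values. The sketch below uses a made-up five-element Series and a window of 3; the variable names are illustrative.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Trailing window: the first two rows lack three observations, so they are NaN
trailing = values.rolling(window=3).mean()

# Centered window: one NaN at each end instead of two at the start
centered = values.rolling(window=3, center=True).mean()

# min_periods=1 averages whatever is available, so the edges are filled
filled = values.rolling(window=3, center=True, min_periods=1).mean()

print(pd.DataFrame({'trailing': trailing, 'centered': centered, 'filled': filled}))
```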
Calculate cumulative metrics over time windows:
# 3-day precipitation totals
print(precip_data.rolling(window=3, center=True, min_periods=1).sum())