Variable Assignment and Series Arithmetic
Build a DataFrame from a dictionary of columns, then derive a new column from an existing one:
# Imports used by every snippet in this section
import numpy as np
import pandas as pd
# Initialize dataframe with location metadata
location_data = {
    'region': ['Alpha', 'Beta', 'Gamma'],
    'population': [15000, 24000, 37000],
    'area_km2': [120, 95, 140]
}
df_locations = pd.DataFrame(location_data, index=['CityA', 'CityB', 'CityC'])
# Compute running totals using cumulative aggregation
df_locations['total_pop'] = df_locations['population'].cumsum()
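As a quick check of the running total, the sketch below rebuilds the same frame and compares `cumsum()` against values worked out by hand. The `density` column is a hypothetical extra, not part of the original example, included only to show that a derived column can combine several existing ones.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["Alpha", "Beta", "Gamma"],
        "population": [15000, 24000, 37000],
        "area_km2": [120, 95, 140],
    },
    index=["CityA", "CityB", "CityC"],
)

# cumsum() adds each value to the sum of all rows above it
df["total_pop"] = df["population"].cumsum()
running_totals = df["total_pop"].tolist()  # [15000, 39000, 76000]

# Hypothetical derived column: element-wise division of two Series
df["density"] = df["population"] / df["area_km2"]
```

Both operations are aligned on the row index, so the new columns line up with CityA, CityB, and CityC without any explicit loop.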
Sorting and Group Aggregations
Arranging records by largest area first and calculating grouped metrics:
# Arrange records by largest area first
df_sorted = df_locations.sort_values(by='area_km2', ascending=False)
# Aggregate statistics per geographic region
grouped_count = df_sorted.groupby('region').count()
grouped_sum = df_sorted.groupby('region').sum()
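In the frame above every region occurs exactly once, so each group holds a single row and the aggregates are not very informative. A minimal sketch with a hypothetical frame (the `sales` data and its column names are assumptions made for illustration) shows how `count()` and `sum()` differ once a region repeats:

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["Alpha", "Beta", "Alpha", "Beta", "Gamma"],
        "units": [10, 20, 30, 40, 50],
    }
)

# Sort descending by units; groupby still orders the group keys alphabetically
sales_sorted = sales.sort_values(by="units", ascending=False)

per_region_count = sales_sorted.groupby("region")["units"].count()
per_region_sum = sales_sorted.groupby("region")["units"].sum()

count_dict = per_region_count.to_dict()  # {'Alpha': 2, 'Beta': 2, 'Gamma': 1}
sum_dict = per_region_sum.to_dict()      # {'Alpha': 40, 'Beta': 60, 'Gamma': 50}
```

Selecting the `units` column before aggregating keeps the result a Series; aggregating the whole grouped frame, as in the original snippet, returns one column per remaining field.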
Constructing DataFrames from Dictionaries
Nested dictionaries automatically map outer keys to columns and inner keys to the index:
# Yearly values keyed by country, then by year
metric_history = {
    'Switzerland': {'2020': 3.2, '2021': 4.1, '2022': 5.8},
    'France': {'2020': 4.5, '2021': 5.0, '2022': 6.2},
    'Japan': {'2020': 2.9, '2021': 3.5, '2022': 4.7}
}
hist_df = pd.DataFrame(metric_history)
# Transpose so countries become rows, then reorder them with reindex;
# 'Italy' is not in the data, so reindex adds it as a row filled with NaN
transposed_hist = hist_df.T.reindex(['Switzerland', 'Italy', 'France', 'Japan'])
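The orientation rule and the effect of reindexing with an unknown label can be verified on a smaller frame. This is a sketch under the assumption that a two-country, two-year subset behaves like the full example; the variable names are illustrative.

```python
import pandas as pd

history = {
    "Switzerland": {"2020": 3.2, "2021": 4.1},
    "France": {"2020": 4.5, "2021": 5.0},
}
frame = pd.DataFrame(history)

columns_before = list(frame.columns)  # ['Switzerland', 'France']
index_before = list(frame.index)      # ['2020', '2021']

# After transposing, countries are the row labels; 'Italy' has no data
by_country = frame.T.reindex(["Switzerland", "Italy", "France"])
italy_is_empty = bool(by_country.loc["Italy"].isna().all())  # True
```

`reindex` never raises on a missing label; it inserts the label and leaves the values missing, which is convenient for aligning frames but can hide typos in label names.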
Random Data Population and CSV Persistence
Filling frames with scaled stochastic values and managing file I/O:
# Populate frame with scaled Gaussian noise
personnel = ['Alice', 'Bob', 'Charlie']
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
noise_df = pd.DataFrame(np.random.randn(5, 3) * 1000, columns=personnel, index=months)
# Persistence operations
# Export to flat file
noise_df.to_csv('exported_metrics.csv', index=True)
# Load with explicit index handling
loaded_explicit = pd.read_csv('exported_metrics.csv', index_col=0)
# Load a file that has no header row; 'raw_export.csv' is assumed to exist
# already (it is not written by the code above)
loaded_raw = pd.read_csv('raw_export.csv', header=None)
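To see what `index_col=0` and `header=None` actually change, the round trip can be run against an in-memory buffer instead of a file on disk. The sketch below assumes a seeded NumPy generator for reproducibility and a smaller 2x2 frame; neither is part of the original example.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
frame = pd.DataFrame(
    rng.standard_normal((2, 2)) * 1000,
    columns=["Alice", "Bob"],
    index=["Jan", "Feb"],
)

# With no path argument, to_csv returns the CSV text as a string
csv_text = frame.to_csv(index=True)

# index_col=0 restores the month labels as the index
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

# header=None treats the header line as data and numbers the columns 0, 1, 2
raw = pd.read_csv(io.StringIO(csv_text), header=None)
```

Reading a file that does have a header with `header=None` yields one extra row and integer column names, so that option is best reserved for files that are known to contain data only.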
Combining DataFrames: Concatenation and Relational Merging
Appending datasets vertically/horizontally or joining based on shared keys:
base_records = pd.DataFrame({
    'id': ['X01', 'X02'],
    'category': ['tech', 'finance'],
    'revenue': [1200, 850]
}, index=[10, 11])
supplementary = pd.DataFrame({
    'id': ['Y01', 'Y02'],
    'category': ['media', 'logistics'],
    'revenue': [920, 1100]
}, index=[10, 11])
# Vertical stacking keeps the original indices, so labels 10 and 11 each appear twice
combined_v = pd.concat([base_records, supplementary])
# Discard the original labels and generate a fresh integer index (0..3)
combined_auto_idx = pd.concat([base_records, supplementary], ignore_index=True)
# Horizontal stacking aligns rows on the index; the column names repeat in the result
combined_h = pd.concat([base_records, supplementary], axis=1)
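The three concatenation modes differ mainly in what happens to labels. A minimal sketch, using trimmed-down copies of the two frames (the names `left` and `right` are illustrative), makes the resulting indices and columns explicit:

```python
import pandas as pd

left = pd.DataFrame({"id": ["X01", "X02"], "revenue": [1200, 850]}, index=[10, 11])
right = pd.DataFrame({"id": ["Y01", "Y02"], "revenue": [920, 1100]}, index=[10, 11])

stacked = pd.concat([left, right])
stacked_index = list(stacked.index)  # [10, 11, 10, 11]
# A duplicated label selects every matching row
rows_for_10 = stacked.loc[10, "id"].tolist()  # ['X01', 'Y01']

renumbered = pd.concat([left, right], ignore_index=True)
renumbered_index = list(renumbered.index)  # [0, 1, 2, 3]

side_by_side = pd.concat([left, right], axis=1)
side_columns = list(side_by_side.columns)  # ['id', 'revenue', 'id', 'revenue']
```

Duplicate labels are legal in pandas but make label-based lookups return several rows or columns, which is why `ignore_index=True` is often the safer choice when the original index carries no meaning.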
# Relational merge operation
master_table = pd.DataFrame({
    'pk': ['K0', 'K1', 'K2'],
    'val_A': ['A0', 'A1', 'A2'],
    'val_B': ['B0', 'B1', 'B2']
})
lookup_table = pd.DataFrame({
    'pk': ['K0', 'K1', 'K2'],
    'val_C': ['C0', 'C1', 'C2'],
    'val_D': ['D0', 'D1', 'D2']
})
joined_result = pd.merge(master_table, lookup_table, how='left', on='pk')
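In the example above both tables contain exactly the same keys, so `how='left'` returns the same rows an inner join would. The difference only shows up when keys are missing on one side; the sketch below uses hypothetical tables (`orders` and `lookup` are illustrative names) to demonstrate it.

```python
import pandas as pd

orders = pd.DataFrame({"pk": ["K0", "K1", "K2"], "val_A": ["A0", "A1", "A2"]})
lookup = pd.DataFrame({"pk": ["K0", "K2", "K3"], "val_C": ["C0", "C2", "C3"]})

# Left join: keep every row of orders, fill unmatched lookup columns with NaN
left_join = pd.merge(orders, lookup, how="left", on="pk")
left_keys = left_join["pk"].tolist()  # ['K0', 'K1', 'K2']
k1_missing = bool(left_join.loc[left_join["pk"] == "K1", "val_C"].isna().all())

# Inner join: keep only keys present in both tables
inner_join = pd.merge(orders, lookup, how="inner", on="pk")
inner_keys = inner_join["pk"].tolist()  # ['K0', 'K2']
```

Key `K3` exists only in the right-hand table, so it appears in neither result; a `how='outer'` merge would be needed to keep it.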
Practical Workflow: Exploration, Indexing, and Statistical Computations
Generating a synthetic regional dataset and performing analytical operations on it:
# Synthetic regional dataset generation
regional_data = {
    'Pacific': [6.1, 5.8, 4.9, 4.2, 6.3, 5.4, 4.8, 7.9, 9.4, 8.2, 6.5],
    'Atlantic': [4.8, 4.5, 3.7, 4.1, 5.9, 5.8, 5.1, 8.3, 8.9, 7.6, 5.4],
    'Central': [5.5, 5.3, 4.3, 4.1, 5.9, 5.4, 4.5, 7.7, 9.2, 7.8, 5.6],
    'Baseline': [5.3, 5.1, 4.1, 4.0, 5.6, 5.2, 4.4, 7.5, 8.8, 7.9, 5.5]
}
time_points = list(range(1995, 2017, 2))
stats_df = pd.DataFrame(regional_data, index=time_points)
# Inspect record boundaries
stats_df.head()
stats_df.tail()
# Visualize temporal trends across regions (plotting requires matplotlib)
stats_df.plot()
# Targeted label-based extraction
single_val = stats_df.loc[1995, 'Pacific']
multi_selection = stats_df.loc[[1995, 2005], ['Pacific', 'Baseline']]
full_column_slice = stats_df['Atlantic']
# Element-wise arithmetic, aggregation, and correlation
normalized_rates = stats_df['Pacific'] / 100
peak_value = stats_df['Central'].max()
regional_difference = stats_df['Pacific'] - stats_df['Atlantic']
pairwise_correlation = stats_df['Pacific'].corr(stats_df['Atlantic'])
full_correlation_matrix = stats_df.corr()
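The selection and correlation calls above return different kinds of objects depending on how they are invoked. The following sketch uses a small hypothetical frame, chosen so that one column is exactly half of the other, to make the return types and the correlation value easy to check by hand.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Pacific": [6.0, 5.0, 4.0], "Atlantic": [3.0, 2.5, 2.0]},
    index=[1995, 1997, 1999],
)

scalar = frame.loc[1995, "Pacific"]               # single cell -> scalar
sub_frame = frame.loc[[1995, 1999], ["Pacific"]]  # lists of labels -> DataFrame
column = frame["Atlantic"]                        # one column -> Series

# Subtraction is aligned on the index, row by row
difference = frame["Pacific"] - frame["Atlantic"]

# Atlantic is exactly half of Pacific here, so the Pearson correlation is 1
r = frame["Pacific"].corr(frame["Atlantic"])
matrix = frame.corr()  # symmetric matrix with one row and column per field
```

`Series.corr` computes a single coefficient, while `DataFrame.corr` computes it for every pair of numeric columns; the off-diagonal entry of `matrix` matches `r`.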