Effective data visualization is essential for exploratory data analysis and communicating insights. This article covers common visualization methods using pandas and matplotlib.
Plot Types with pandas DataFrame
The pandas DataFrame provides built-in plotting methods that wrap matplotlib functionality. These methods accept a kind parameter to specify the chart type.
Available Chart Types
area: Area plot for showing cumulative totals (columns are stacked by default)
pie: Pie chart for proportional representation
scatter: Scatter plot; requires the x and y column names
hexbin: Hexbin plot, a two-dimensional histogram with hexagonal bins; also requires the x and y column names
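As a rough sketch of how these kind values are passed, the following builds a small made-up DataFrame (the north and south columns and their values are hypothetical) and draws each chart type. The Agg backend is an assumption so the sketch runs without a display; drop that line and call plt.show() to open windows.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data, invented purely for illustration
rng = np.random.default_rng(0)
sales = pd.DataFrame({
    "north": rng.integers(1, 10, size=12),
    "south": rng.integers(1, 10, size=12),
})

# area: columns are stacked, so the top edge traces the running total
area_ax = sales.plot(kind="area")

# pie: one wedge per index entry of a Series (here, the column totals)
pie_ax = sales.sum().plot(kind="pie")

# scatter and hexbin both need explicit x and y column names
scatter_ax = sales.plot(kind="scatter", x="north", y="south")
hexbin_ax = sales.plot(kind="hexbin", x="north", y="south", gridsize=5)
```

Each call without an ax argument opens a new figure and returns the matplotlib Axes it drew on, which is convenient for further customization.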
Line Charts with Subplots
The following example demonstrates creating multiple subplots in a grid layout:
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('dataset.csv')
# Create a 2x2 grid of line charts, one per numeric column
# (a 2x2 layout holds at most four subplots)
data.plot(
    kind='line',
    subplots=True,
    layout=(2, 2),
    sharex=False,
    sharey=False
)
plt.show()
Parameters explained:
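subplots=True: Generates a separate subplot for each column
layout=(2, 2): Arranges the subplots in a 2-row by 2-column grid
sharex and sharey: Booleans that control whether the subplots share their x- or y-axis. (The string options 'none', 'all', 'row', and 'col' belong to matplotlib's plt.subplots(), not to DataFrame.plot.)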
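To see how subplots and layout interact without needing dataset.csv, here is a minimal sketch on a made-up DataFrame with exactly four numeric columns (the column names a to d and the random-walk values are hypothetical). When layout is given, pandas returns the Axes as a NumPy array shaped like the layout.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four numeric columns, one per cell of the 2x2 grid
index = pd.date_range("2024-01-01", periods=30, freq="D")
data = pd.DataFrame(
    np.cumsum(np.random.default_rng(1).normal(size=(30, 4)), axis=0),
    index=index,
    columns=["a", "b", "c", "d"],
)

# subplots=True draws each column on its own Axes; layout arranges them
axes = data.plot(kind="line", subplots=True, layout=(2, 2),
                 sharex=False, sharey=False)
print(axes.shape)  # (2, 2)
```

Indexing the returned array, for example axes[0, 1], gives direct access to an individual subplot. With five or more numeric columns, a 2x2 layout has too few cells and pandas raises a ValueError.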
Correlation Matrix Heatmap
Visualizing correlations between variables helps identify relationships in datasets:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
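# Load sample data; names= assumes the CSV has no header row
# (if it has one, also pass header=0 so it is replaced rather than read as data)
file_path = 'C:/data/student_scores.csv'
columns = ['chinese', 'math', 'english']
df = pd.read_csv(file_path, names=columns, sep=',')
# Calculate the pairwise correlation matrix
correlation_matrix = df.corr()
# Create the figure and a single Axes for the heatmap
fig = plt.figure()
ax = fig.add_subplot(111)
# Display the matrix with colors mapped to the full range [-1, 1]
im = ax.matshow(correlation_matrix, vmin=-1, vmax=1)
fig.colorbar(im)
# Place one tick per column instead of hard-coding the count
tick_positions = np.arange(0, len(columns), 1)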
ax.set_xticks(tick_positions)
ax.set_yticks(tick_positions)
ax.set_xticklabels(columns)
ax.set_yticklabels(columns)
plt.show()
Key functions:
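corr(): Computes pairwise correlation coefficients between columns (Pearson by default). Values range from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship)
matshow(): Displays a matrix as an image with color-coded values and places the x-axis ticks along the top
vmin and vmax: Set the data range the colormap spans; -1 to 1 covers every possible correlation coefficient
colorbar(): Adds a color scale legend to the figure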
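A small worked check of these functions, using hypothetical columns built to have known relationships: double_x is an exact positive linear function of x and neg_x an exact negative one, so their Pearson coefficients with x should be 1 and -1. The Agg backend is again an assumption for headless execution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical columns with known linear relationships
x = np.arange(10, dtype=float)
df_demo = pd.DataFrame({
    "x": x,
    "double_x": 2 * x + 1,  # exact positive linear relationship
    "neg_x": -x,            # exact negative linear relationship
})

corr = df_demo.corr()  # Pearson correlation by default
print(corr.round(2))

fig, ax = plt.subplots()
# vmin/vmax pin the colormap ends to -1 and 1 regardless of the data
im = ax.matshow(corr, vmin=-1, vmax=1)
fig.colorbar(im)
```

Pinning vmin and vmax keeps the colors comparable across heatmaps; without them, matplotlib would stretch the colormap to whatever minimum and maximum the current matrix happens to contain.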
Configuring ticks:
set_xticks(): Positions tick marks along the x-axis
set_yticks(): Positions tick marks along the y-axis
set_xticklabels(): Assigns custom labels to x-axis ticks
set_yticklabels(): Assigns custom labels to y-axis ticks
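Since matplotlib 3.5, set_xticks() and set_yticks() also accept a labels argument, so positions and labels can be set in one call. The sketch below assumes matplotlib 3.5 or newer and uses an identity matrix as placeholder content; on older versions, keep the separate set_*ticklabels() calls.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

columns = ["chinese", "math", "english"]
matrix = np.eye(3)  # placeholder matrix, just to give the Axes some content

fig, ax = plt.subplots()
ax.matshow(matrix)
positions = np.arange(len(columns))
# labels= (matplotlib >= 3.5) replaces the separate set_*ticklabels calls
ax.set_xticks(positions, labels=columns)
ax.set_yticks(positions, labels=columns)
```

Setting positions and labels together avoids the mismatch that can occur when labels are assigned before the tick positions are fixed.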
Using np.arange()
np.arange(start, stop, step)
start: First value (default 0)
stop: End value, excluded from the output
step: Spacing between consecutive values (default 1)
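A few quick calls make the defaults and the excluded stop value concrete; the first one produces the three tick positions used in the heatmap example.

```python
import numpy as np

ticks = np.arange(0, 3, 1)    # explicit start, stop, step
print(ticks)                  # [0 1 2]

defaults = np.arange(3)       # start and step fall back to 0 and 1
print(defaults)               # [0 1 2]

stepped = np.arange(1, 10, 3) # stop=10 is never reached
print(stepped)                # [1 4 7]
```

Because stop is excluded, np.arange(len(columns)) yields exactly one position per column.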
Scatter Matrix Plot
The scatter matrix shows pairwise relationships between all numeric columns:
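import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
# Reuses df from the correlation example; diagonal='kde' requires SciPy
scatter_matrix(
    df,
    diagonal='kde',
    alpha=0.5,
    figsize=(10, 10)
)
plt.show()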
This creates a grid with kernel density estimates on the diagonal (pass diagonal='hist' to get histograms instead) and a scatter plot for every pair of variables, which makes correlations and patterns across several variables quick to spot.
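A self-contained sketch of the same call, using made-up student scores (the values, and the dependence of english on math, are invented for illustration). It switches to diagonal='hist' so it runs without SciPy, and it shows that scatter_matrix returns an n-by-n array of Axes, one per variable pair.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical scores; english is built to depend partly on math
rng = np.random.default_rng(2)
math_scores = rng.normal(70, 10, size=50)
scores = pd.DataFrame({
    "chinese": rng.normal(75, 8, size=50),
    "math": math_scores,
    "english": 0.5 * math_scores + rng.normal(35, 5, size=50),
})

# diagonal="hist" avoids the SciPy dependency that diagonal="kde" needs
axes = scatter_matrix(scores, diagonal="hist", alpha=0.5, figsize=(6, 6))
print(axes.shape)  # (3, 3)
```

In the resulting grid, the math/english panels should show an upward trend, while the panels involving chinese look like unstructured clouds.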