Python Data Visualization with Pandas and Matplotlib

Effective data visualization is essential for exploratory data analysis and communicating insights. This article covers common visualization methods using pandas and matplotlib.

Plot Types with pandas DataFrame

The pandas DataFrame provides built-in plotting methods that wrap matplotlib functionality. DataFrame.plot() accepts a kind parameter that selects the chart type; the default is 'line'.

Available Chart Types

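  • line: Line plot (the default)
  • bar and barh: Vertical and horizontal bar charts
  • hist: Histogram
  • box: Box plot
  • kde (alias density): Kernel density estimate plot
  • area: Area plot; columns are stacked by default, so the top edge shows the cumulative total
  • pie: Pie chart for proportional representation
  • scatter: Scatter plot; requires the x and y arguments to name the columns to plot
  • hexbin: Hexbin plot, a two-dimensional histogram with hexagonal bins; also requires x and y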
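
As a rough illustration of how the kind parameter is used, the sketch below plots synthetic data. The DataFrame, its column names (height, weight) and the share Series are invented for this example and are not part of any dataset in this article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data; the column names are made up for this sketch
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(65, 12, 200),
})

# scatter and hexbin need explicit x and y column names
ax_scatter = df_demo.plot(kind="scatter", x="height", y="weight")
ax_hexbin = df_demo.plot(kind="hexbin", x="height", y="weight", gridsize=15)

# area stacks the columns by default, so all values must share one sign
ax_area = df_demo.abs().plot(kind="area")

# pie works on a single Series (or on a DataFrame with subplots=True)
shares = pd.Series([3, 5, 2], index=["a", "b", "c"], name="share")
ax_pie = shares.plot(kind="pie")
```

Each call returns a matplotlib Axes object, so titles, labels and limits can be adjusted afterwards with the usual matplotlib methods. With an interactive backend, plt.show() displays the figures.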

Line Charts with Subplots

The following example demonstrates creating multiple subplots in a grid layout:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
data = pd.read_csv('dataset.csv')

# Create a 2x2 grid of line charts, one per numeric column
# (the grid must have at least as many cells as there are columns)
data.plot(
    kind='line',
    subplots=True,
    layout=(2, 2),
    sharex=False,
    sharey=False
)

plt.show()

Parameters explained:

  • subplots=True: Generates a separate subplot for each column
  • layout=(2, 2): Arranges subplots in a 2-row by 2-column grid
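  • sharex and sharey: Control whether the subplots share an x-axis or y-axis. DataFrame.plot() expects booleans here; the string options 'none', 'all', 'row', and 'col' belong to matplotlib's plt.subplots()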
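
Since dataset.csv is not available here, the following sketch reproduces the same call on synthetic data. The four column names and the random-walk values are assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four synthetic random-walk series (hypothetical column names)
rng = np.random.default_rng(42)
df_demo = pd.DataFrame(
    rng.normal(size=(50, 4)).cumsum(axis=0),
    columns=["a", "b", "c", "d"],
)

# One subplot per column, arranged in a 2x2 grid with independent axes
axes = df_demo.plot(
    kind="line",
    subplots=True,
    layout=(2, 2),
    sharex=False,
    sharey=False,
    figsize=(8, 6),
)

# With subplots=True and a layout, the result is a 2-D array of Axes
axes[0, 0].set_title("Series a")
plt.tight_layout()
```

Because the return value is an array shaped like the layout, individual subplots can be customized by indexing into it, as done for the top-left plot here.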

Correlation Matrix Heatmap

Visualizing correlations between variables helps identify relationships in datasets:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load sample data (passing names= assumes the CSV file has no header row)
file_path = 'C:/data/student_scores.csv'
columns = ['chinese', 'math', 'english']
df = pd.read_csv(file_path, names=columns, sep=',')

# Calculate correlation matrix
correlation_matrix = df.corr()

# Create heatmap visualization
fig = plt.figure()
ax = fig.add_subplot(111)

# Display matrix with color mapping
im = ax.matshow(correlation_matrix, vmin=-1, vmax=1)
fig.colorbar(im)

# Configure axis ticks
tick_positions = np.arange(0, len(columns), 1)  # one tick per column: 0, 1, 2
ax.set_xticks(tick_positions)
ax.set_yticks(tick_positions)
ax.set_xticklabels(columns)
ax.set_yticklabels(columns)

plt.show()

Key functions:

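  • corr(): Computes pairwise correlation coefficients between columns (Pearson by default). Values range from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship); values near 0 indicate little or no linear relationship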
  • matshow(): Displays a matrix as an image with color-coded values
  • vmin and vmax: Define the color scale boundaries for the colormap
  • colorbar(): Adds a color scale legend to the figure
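
To make the range of corr() concrete, here is a small sketch on a hand-made DataFrame whose columns are exact linear functions of each other. The column names and values are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: double_x rises with x, neg_x falls with x
scores = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "double_x": [2.0, 4.0, 6.0, 8.0],
    "neg_x": [-1.0, -2.0, -3.0, -4.0],
})

corr = scores.corr()  # Pearson correlation by default
print(corr.round(2))

# Fixing vmin/vmax keeps the color scale comparable between datasets
fig, ax = plt.subplots()
im = ax.matshow(corr, vmin=-1, vmax=1)
fig.colorbar(im)
```

The x/double_x entry comes out as 1 and the x/neg_x entry as -1, up to floating-point rounding, which are the two extremes of the color scale.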

Configuring ticks:

  • set_xticks(): Positions tick marks along the x-axis
  • set_yticks(): Positions tick marks along the y-axis
  • set_xticklabels(): Assigns custom labels to x-axis ticks
  • set_yticklabels(): Assigns custom labels to y-axis ticks
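
The tick functions can be tried in isolation with a placeholder matrix. In this sketch the identity matrix stands in for a real correlation matrix, and the rotation and alignment settings are optional styling choices rather than anything the original example requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

labels = ["chinese", "math", "english"]
matrix = np.eye(len(labels))  # placeholder 3x3 matrix, not real correlations

fig, ax = plt.subplots()
ax.matshow(matrix, vmin=-1, vmax=1)

# Set tick positions first, then attach one label to each position
positions = np.arange(len(labels))
ax.set_xticks(positions)
ax.set_yticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="left")  # rotate long labels
ax.set_yticklabels(labels)
```

Setting the positions before the labels matters: the labels are assigned to whatever ticks exist at that moment, so labeling the default ticks would put the names in the wrong places.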

Using np.arange()

np.arange(start, stop, step)
  • start: Beginning value (default 0)
  • stop: End value (excluded from output)
  • step: Increment between values (default 1)
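
A few quick calls show how the three arguments interact. The variable names are arbitrary.

```python
import numpy as np

ticks = np.arange(0, 3, 1)     # start=0, stop=3 (excluded), step=1
print(ticks)                   # [0 1 2]

short_form = np.arange(3)      # start and step fall back to their defaults
print(short_form)              # [0 1 2]

stepped = np.arange(1, 10, 3)  # 1, 4, 7; 10 is never reached
print(stepped)                 # [1 4 7]
```

For tick positions, np.arange(len(columns)) is a common shorthand, since it yields one integer position per column name.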

Scatter Matrix Plot

The scatter matrix shows the pairwise relationships between all numeric columns:

import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# df is the DataFrame loaded in the correlation example above

scatter_matrix(
    df,
    diagonal='kde',
    alpha=0.5,
    figsize=(10, 10)
)

plt.show()

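With diagonal='kde', this creates a grid showing a kernel density estimate for each variable on the diagonal (pass diagonal='hist' for histograms instead; the KDE option requires SciPy) and scatter plots for each variable pair off the diagonal, allowing quick identification of correlations and patterns across multiple dimensions.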
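
For a version that runs without the CSV file, the sketch below feeds scatter_matrix synthetic scores. The random values are placeholders, and diagonal='hist' is used here only to avoid the SciPy dependency of the KDE option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic scores; the values are random placeholders, not real data
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(
    rng.normal(loc=70, scale=10, size=(100, 3)),
    columns=["chinese", "math", "english"],
)

# Histograms on the diagonal, scatter plots everywhere else
axes = scatter_matrix(df_demo, diagonal="hist", alpha=0.5, figsize=(8, 8))
```

The function returns an n-by-n array of Axes for n numeric columns, so a single panel such as axes[0, 1] can be styled separately if needed.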


Posted on Sat, 09 May 2026 20:40:05 +0000 by K3nnnn