Data Visualization with Python - Working with matplotlib and Pygal

Chapter 15: Generating and Visualizing Data

This chapter explores how to use matplotlib and Pygal to generate data and create practical visualizations. We'll cover the fundamentals of data visualization, which involves exploring data through visual representations, and data mining, which uses code to examine patterns and relationships within datasets.

15.1 Introduction to Visualization Libraries

matplotlib is a mathematical plotting library that enables creation of simple charts such as line charts and scatter plots. It provides extensive formatting options for customizing the appearance of visualizations.

Pygal focuses on generating charts optimized for digital devices. Its charts scale to fit different screen sizes and include interactive features, such as highlighting a data point when the user hovers over it.

15.2 Creating Line Charts and Scatter Plots

To begin creating visualizations, import the pyplot module from matplotlib, commonly aliased as plt:

import matplotlib.pyplot as plt

Create a figure with a single subplot using the subplots() function:

fig, ax = plt.subplots(figsize=(15, 9), dpi=128)

This returns the figure object (fig) representing the entire chart and the axes object (ax) for plotting. The dpi parameter controls resolution, while figsize specifies dimensions in inches as a tuple.
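As a rough check on how figsize and dpi interact, the pixel dimensions of a rendered figure are approximately the size in inches multiplied by the dpi, before any cropping. The helper below is a hypothetical illustration, not a matplotlib function, and uses only plain arithmetic:

```python
def pixel_size(figsize, dpi):
    """Return the approximate (width, height) in pixels for a figure.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


# The values used in the subplots() call above
print(pixel_size((15, 9), 128))  # (1920, 1152)
```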

Drawing Line Charts

Use plot() to create line charts:

ax.plot(x_data, y_data, color='green', marker='o', linestyle='-', 
        label='Data Series', linewidth=2, alpha=0.8)

Parameters include:

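  • First argument: x-coordinates (iterable; if only one sequence is passed, it is treated as the y-values and x defaults to 0, 1, 2...)
  • Second argument: y-coordinates
  • Third argument (optional): format string combining color, marker, and line style, such as 'go-'; the example above sets these with the color, marker, and linestyle keywords instead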
  • label: legend text (requires calling legend())
  • linewidth: line thickness
  • alpha: transparency (0 to 1)

Color options include: g (green), b (blue), r (red), c (cyan), m (magenta). Custom colors can use RGB tuples like (0.02, 0.31, 0.62).

Marker styles: . (point), o (circle), ^ (triangle), v (inverted triangle), * (star), + (plus).

Line styles: - (solid), -- (dashed), -. (dash-dot), : (dotted).
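The color, marker, and line style codes above can also be combined into a single format string passed as the third positional argument. The following is a minimal sketch, assuming matplotlib is installed; the data values are made up for illustration, and the non-interactive Agg backend is selected so the script runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt

# Made-up data for illustration
x_data = [1, 2, 3, 4]
y_data = [2, 3, 5, 8]

fig, ax = plt.subplots()
# 'go-' = green (g), circle markers (o), solid line (-)
line, = ax.plot(x_data, y_data, 'go-', label='format string')
# Equivalent settings spelled out as keywords, dashed and red for contrast
ax.plot(x_data, [y + 1 for y in y_data], color='r', marker='^',
        linestyle='--', label='keywords')
ax.legend()

# Render to an in-memory PNG; pass a file name instead to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

The format string is compact for quick plots, while keyword arguments tend to read better once custom colors or widths are involved.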

Creating Scatter Plots

The scatter() function creates scatter plots:

ax.scatter(x_values, y_values, edgecolor='black', c='blue', s=50)

Parameters include x and y coordinates (both required), edgecolor for outline color (use 'none' to remove outlines), c for point color (accepts color names or RGB tuples), and s for point size.

Pass a sequence of values to c and a color map to cmap to shade each point along the map, from one end of its color range to the other, according to the point's value:

ax.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Blues, s=40)

Chart Formatting

Add titles and axis labels:

plt.title('Chart Title', fontsize=24)
plt.xlabel('X Axis Label', fontsize=14)
plt.ylabel('Y Axis Label', fontsize=14)

Configure tick parameters:

plt.tick_params(axis='both', labelsize=14, which='major')

The axis parameter accepts 'x', 'y', or 'both'. The which parameter specifies 'major', 'minor', or 'both' tick marks.

Rotate x-axis labels to prevent overlapping:

fig.autofmt_xdate()

Display the legend:

ax.legend()
# Or set labels programmatically
line, = ax.plot([1, 2, 3])
line.set_label('My Label')
ax.legend()

Set axis ranges:

plt.axis([xmin, xmax, ymin, ymax])

Hide axes:

ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

Adjust figure size and resolution. plt.figure() creates a new figure, so call it before any plotting commands:

plt.figure(dpi=128, figsize=(10, 6))

Save the chart to a file:

plt.savefig('chart.png', bbox_inches='tight')

The bbox_inches='tight' parameter trims excess whitespace.

Complete Example: Line Chart

import matplotlib.pyplot as plt

x_data = [1, 2, 3, 4, 5]
y_data = [1, 4, 9, 16, 25]
plt.plot(x_data, y_data, linewidth=5, label='Squared Values')
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
plt.legend()
plt.tick_params(axis='both', labelsize=14)
plt.show()

Complete Example: Scatter Plot

import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, edgecolor='none', c=(0, 0, 0.8), s=40, label='square')
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
plt.legend()
plt.axis([0, 1100, 0, 1100000])
plt.tick_params(axis='both', labelsize=14)
plt.show()

Using Built-in Styles

matplotlib provides various built-in styles that configure background colors, grid lines, line widths, fonts, and sizes:

import matplotlib.pyplot as plt
print(plt.style.available)  # List available styles

plt.style.use('seaborn-v0_8')  # Apply a style; this one was named 'seaborn' before matplotlib 3.6
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], linewidth=3)
plt.show()

15.3 Random Walks

A random walk is a path determined by successive random decisions, with no clear direction. This concept is useful for simulating various natural phenomena.
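To make the idea concrete before building the full class, here is a minimal one-dimensional sketch. It assumes a fixed seed so the run is repeatable; the function name walk_1d and its parameters are hypothetical and not part of the chapter's code:

```python
from random import Random


def walk_1d(num_steps, seed=None):
    """Return the positions visited by a 1-D walk starting at 0."""
    rng = Random(seed)
    positions = [0]
    for _ in range(num_steps):
        step = rng.choice([-1, 1])  # each step is an independent coin flip
        positions.append(positions[-1] + step)
    return positions


path = walk_1d(10, seed=42)
print(path)
```

The two-dimensional walk applies the same idea to the x and y coordinates independently, with variable step distances.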

Create a class to generate random walk data:

from random import choice

class RandomWalk:
    """A class to generate random walk data points."""
    
    def __init__(self, num_points=5000):
        """Initialize random walk attributes."""
        self.num_points = num_points
        self.x_values = [0]
        self.y_values = [0]
    
    def fill_walk(self):
        """Calculate all points in the random walk."""
        while len(self.x_values) < self.num_points:
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance
            
            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance
            
            if x_step == 0 and y_step == 0:
                continue
            
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step
            
            self.x_values.append(next_x)
            self.y_values.append(next_y)

Visualize the random walk:

import matplotlib.pyplot as plt
from random_walk import RandomWalk

rw = RandomWalk()
rw.fill_walk()
plt.scatter(rw.x_values, rw.y_values, s=5)
plt.show()

Simulate multiple random walks with color mapping:

import matplotlib.pyplot as plt
from random_walk import RandomWalk

while True:
    rw = RandomWalk()
    rw.fill_walk()
    
    # Create the figure first so the scatter calls below draw on it
    plt.figure(figsize=(10, 6))
    
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, s=5, c=point_numbers, 
                cmap=plt.cm.Blues, edgecolors='none')
    
    # Highlight start and end points
    plt.scatter(rw.x_values[0], rw.y_values[0], c='green', edgecolors='none', s=50)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none', s=50)
    
    # Hide both axes on the current axes object
    ax = plt.gca()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    plt.show()
    
    keep_running = input("Generate another walk? (y/n): ")
    if keep_running == 'n':
        break

15.4 Simulating Dice Rolls with Pygal

Pygal creates interactive, scalable vector graphics ideal for charts that must display across different screen sizes. Charts render as SVG files that can be opened in web browsers with interactive features.

A histogram is a bar chart showing the frequency of different outcomes.
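Before any charting, a histogram is just a tally of how often each outcome occurs. A small sketch using the standard library's collections.Counter, with made-up sample rolls:

```python
from collections import Counter

# Made-up sample outcomes, for illustration only
rolls = [3, 1, 6, 3, 2, 3, 6, 1]

counts = Counter(rolls)
# List every face from 1 to 6 so outcomes that never occurred show as 0
frequencies = [counts[face] for face in range(1, 7)]
print(frequencies)  # [2, 1, 3, 0, 0, 2]
```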

Creating Bar Charts with Pygal

Pygal offers several bar chart types:

import pygal
basic_chart = pygal.Bar()           # Standard bar chart
stacked_chart = pygal.StackedBar()  # Stacked bar chart
horizontal_chart = pygal.HorizontalBar()  # Horizontal bar chart

Customize chart colors using style classes:

import pygal
from pygal.style import LightenStyle as LS, LightColorizedStyle as LCS

# Single color base
my_style = LS('#333366')
chart = pygal.Bar(style=my_style)

# Lightened color scheme
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style)

Configure chart appearance with parameters or a Config object:

import pygal

# Direct parameters
chart = pygal.Bar(x_label_rotation=45, show_legend=False)

# Using Config for multiple settings
my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000
chart = pygal.Bar(my_config)

Set chart properties and add data:

chart.title = "Chart Title"
chart.x_labels = ['Label1', 'Label2', 'Label3']
chart.x_title = "X Axis Title"
chart.y_title = "Y Axis Title"
chart.add('Data Series', [value1, value2, value3])
chart.render_to_file('chart.svg')

Dice Simulation

Create a Die class:

from random import randint

class Die:
    """Represents a single die."""
    
    def __init__(self, num_sides=6):
        """Default to 6-sided die."""
        self.num_sides = num_sides
    
    def roll(self):
        """Return random value between 1 and number of sides."""
        return randint(1, self.num_sides)

Simulate rolling a six-sided die 1000 times:

import pygal
from die import Die

die = Die()
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)

frequencies = []
for value in range(1, die.num_sides + 1):
    frequency = results.count(value)
    frequencies.append(frequency)

hist = pygal.Bar()
hist.title = "Rolling 1000 Times with D6"
hist.x_labels = ['1', '2', '3', '4', '5', '6']
hist.x_title = "Result"
hist.y_title = "Frequency"
hist.add('D6', frequencies)
hist.render_to_file('die_visual.svg')

Simulate rolling two different dice:

import pygal
from die import Die

die_1 = Die()
die_2 = Die(10)  # 10-sided die

results = []
for roll_num in range(50000):
    result = die_1.roll() + die_2.roll()
    results.append(result)

frequencies = []
for value in range(2, die_1.num_sides + die_2.num_sides + 1):
    frequency = results.count(value)
    frequencies.append(frequency)

hist = pygal.Bar()
hist.title = "Rolling D6 and D10 50000 Times"
hist.x_labels = [str(i) for i in range(2, 17)]
hist.x_title = "Result"
hist.y_title = "Frequency"
hist.add('D6 + D10', frequencies)
hist.render_to_file('dice_combination.svg')
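One way to sanity-check the simulated frequencies is to enumerate every possible combination of the two dice and count how many produce each sum. This sketch uses only the standard library; the variable names are illustrative:

```python
from collections import Counter
from itertools import product

d6_faces = range(1, 7)
d10_faces = range(1, 11)

# Count how many of the equally likely (d6, d10) pairs give each sum
combos = Counter(a + b for a, b in product(d6_faces, d10_faces))
total = sum(combos.values())

for result in range(2, 17):
    print(f"{result:2d}: {combos[result]} of {total}")
```

The sums 7 through 11 each arise from 6 of the 60 combinations, so the simulated histogram should show a flat plateau across those results rather than a single peak.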

Chapter 16: Downloading Data

Data is commonly stored in CSV (comma-separated values) and JSON formats. This chapter covers processing weather data from CSV files and population data from JSON files.

16.1 Working with CSV Files

CSV files contain data as comma-separated values. Use Python's built-in csv module to process these files.

Read and parse CSV files:

import csv

filename = 'weather_data.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    for index, column_header in enumerate(header_row):
        print(index, column_header)
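To try the header-inspection loop without a data file, the same reader can be pointed at an in-memory string. The column names and values below are invented for illustration and do not come from the chapter's weather files:

```python
import csv
import io

# Invented sample data standing in for a real CSV file
sample = io.StringIO(
    "Date,Max Temp,Mean Temp,Min Temp\n"
    "2014-7-1,64,56,50\n"
    "2014-7-2,71,62,55\n"
)

reader = csv.reader(sample)
header_row = next(reader)  # consume the first line as the header
column_index = {name: i for i, name in enumerate(header_row)}
rows = list(reader)        # remaining lines are data rows

print(column_index)
print(rows[0][column_index["Max Temp"]])  # 64
```

Every field comes back as a string, which is why the later examples convert temperatures with int() before plotting.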

Parse dates using datetime:

from datetime import datetime
date_obj = datetime.strptime('2014-7-1', '%Y-%m-%d')

Format codes: %Y (4-digit year), %y (2-digit year), %m (month 01-12), %d (day 01-31), %A (weekday), %B (month name), %H (24-hour), %I (12-hour), %M (minutes), %S (seconds), %p (am/pm).
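As a short illustration of these codes, strptime() parses text into a datetime object and strftime() formats one back into text. The timestamp below is an arbitrary example:

```python
from datetime import datetime

# Parse a date and time; the input string is an arbitrary example
moment = datetime.strptime("2014-07-01 14:05:09", "%Y-%m-%d %H:%M:%S")

print(moment.year, moment.month, moment.day)  # 2014 7 1
print(moment.strftime("%d/%m/%y"))            # 01/07/14
print(moment.strftime("%I:%M %p"))            # 12-hour clock; AM/PM text depends on locale
```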

Fill area between lines:

plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

Complete example parsing weather data:

import csv
from datetime import datetime
import matplotlib.pyplot as plt

filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    
    dates, highs, lows = [], [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        highs.append(int(row[1]))
        lows.append(int(row[3]))
    
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')
    plt.plot(dates, lows, c='blue')
    plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
    
    plt.title("Daily high and low temperatures - 2014", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)
    
    plt.show()

Error Handling in Data Processing

Handle missing or invalid data using try-except blocks:

import csv
from datetime import datetime
import matplotlib.pyplot as plt

filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    
    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Print the raw date text; current_date may be unset or stale here
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')
    plt.plot(dates, lows, c='blue')
    plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
    
    plt.title("Daily high and low temperatures - 2014\nDeath Valley, CA", fontsize=24)
    plt.xlabel('', fontsize=16)
    fig.autofmt_xdate()
    plt.ylabel("Temperature (F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)
    
    plt.show()
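The try-except-else pattern can be exercised on its own with a few in-memory rows. In this sketch the rows are invented, with empty temperature fields standing in for a missing reading:

```python
from datetime import datetime

# Invented rows: date, high, mean, low; the second row has missing readings
rows = [
    ["2014-2-15", "75", "64", "52"],
    ["2014-2-16", "", "", ""],
    ["2014-2-17", "80", "67", "54"],
]

dates, highs, lows, skipped = [], [], [], []
for row in rows:
    try:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])  # int("") raises ValueError
        low = int(row[3])
    except ValueError:
        skipped.append(row[0])  # record the raw date text of the bad row
    else:
        # Runs only when every conversion succeeded
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)  # [75, 80] [52, 54] ['2014-2-16']
```

Because the append calls sit in the else branch, the three data lists stay the same length even when a row is skipped.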

16.2 Creating World Population Maps with JSON

JSON (JavaScript Object Notation) is a common format for storing structured data. This section covers processing population data and creating world maps using Pygal.

Processing JSON Data

Load JSON data:

import json

filename = 'population_data.json'
with open(filename) as f:
    pop_data = json.load(f)

for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        print(f"{country_name}: {population}")
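The int(float(...)) conversion is presumably there because population values can be stored as strings containing a decimal point, which int() cannot parse directly. A small sketch with an invented JSON record (the country name and figure are placeholders, not real data):

```python
import json

# Invented record in the same shape as the entries processed above
raw = '[{"Country Name": "Exampleland", "Year": "2010", "Value": "1234567.89"}]'
pop_data = json.loads(raw)
value = pop_data[0]["Value"]

try:
    int(value)  # int() rejects strings that contain a decimal point
except ValueError as error:
    print(f"int() failed: {error}")

population = int(float(value))  # float() parses it, int() truncates
print(population)  # 1234567
```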

Get country codes for Pygal maps:

from pygal_maps_world.i18n import COUNTRIES

def get_country_code(country_name):
    """Return the two-letter country code for a given country name."""
    for code, name in COUNTRIES.items():
        if name == country_name:
            return code
    return None
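The function above scans the whole COUNTRIES dictionary on every call. When many names need converting, one alternative is to invert the mapping once and use direct lookups. The sketch below uses a tiny stand-in dictionary so that it runs without pygal_maps_world installed; the real COUNTRIES mapping is much larger and its exact names may differ:

```python
# Stand-in for pygal_maps_world.i18n.COUNTRIES (assumption: code -> name)
COUNTRIES = {
    "ca": "Canada",
    "mx": "Mexico",
    "us": "United States",
}

# Invert once: name -> code
CODES_BY_NAME = {name: code for code, name in COUNTRIES.items()}


def get_country_code(country_name):
    """Return the two-letter code for a country name, or None if unknown."""
    return CODES_BY_NAME.get(country_name)


print(get_country_code("Canada"))    # ca
print(get_country_code("Atlantis"))  # None
```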

Create world maps with Pygal:

from pygal_maps_world.maps import World
from pygal.style import RotateStyle, LightColorizedStyle

wm_style = RotateStyle('#336699', base_style=LightColorizedStyle)
wm = World(style=wm_style)
wm.title = "World Population"
wm.add('North America', {'ca': 34126000, 'us': 309349000, 'mx': 113423000})
wm.render_to_file('world_map.svg')

Complete population map example:

import json
from pygal_maps_world.i18n import COUNTRIES
from pygal_maps_world.maps import World
from pygal.style import RotateStyle, LightColorizedStyle

def get_country_code(country_name):
    for code, name in COUNTRIES.items():
        if name == country_name:
            return code
    return None

filename = 'population_data.json'
with open(filename) as f:
    pop_data = json.load(f)

cc_populations = {}
for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        population = int(float(pop_dict['Value']))
        country_code = get_country_code(country_name)
        if country_code:
            cc_populations[country_code] = population

cc_pops_1, cc_pops_2, cc_pops_3 = {}, {}, {}
for cc, pop in cc_populations.items():
    if pop < 10_000_000:
        cc_pops_1[cc] = pop
    elif pop < 1_000_000_000:
        cc_pops_2[cc] = pop
    else:
        cc_pops_3[cc] = pop

wm_style = RotateStyle('#336699', base_style=LightColorizedStyle)
wm = World(style=wm_style)
wm.title = 'World Population 2010'
wm.add('0-10M', cc_pops_1)
wm.add('10M-1B', cc_pops_2)
wm.add('1B+', cc_pops_3)
wm.render_to_file('world_population.svg')

Chapter 17: Using Web APIs

Web APIs (Application Programming Interfaces) allow programs to request specific information from websites rather than retrieving entire pages. This enables creating applications that always use the most current data.

17.1 Making API Requests

APIs return data in easily processed formats like JSON or CSV. GitHub's API provides access to repository information.

Example API call to GitHub:

https://api.github.com/search/repositories?q=language:python&sort=stars

This returns Python repositories sorted by star count.
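The pieces of this URL can be pulled apart with the standard library to see what is being requested. A brief sketch using urllib.parse:

```python
from urllib.parse import parse_qs, urlsplit

url = "https://api.github.com/search/repositories?q=language:python&sort=stars"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each value is a list, since keys may repeat

print(parts.netloc)  # api.github.com
print(parts.path)    # /search/repositories
print(params)        # {'q': ['language:python'], 'sort': ['stars']}
```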

Make API requests using the requests library:

import requests

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
response = requests.get(url, headers=headers)
print(f"Status code: {response.status_code}")

response_dict = response.json()
print(response_dict.keys())

The response contains total_count, incomplete_results, and items (list of repositories).

Process repository data:

repo_dicts = response_dict['items']
print(f"Total repositories: {response_dict['total_count']}")
print(f"Returned items: {len(repo_dicts)}")

for repo_dict in repo_dicts:
    print(f"\nName: {repo_dict['name']}")
    print(f"Owner: {repo_dict['owner']['login']}")
    print(f"Stars: {repo_dict['stargazers_count']}")
    print(f"URL: {repo_dict['html_url']}")
    print(f"Description: {repo_dict['description']}")

API Rate Limits

Most APIs impose rate limits restricting requests per time period. Check GitHub's rate limits:

https://api.github.com/rate_limit

The response shows limits for different API categories. For unauthenticated requests, the search API typically allows 10 requests per minute.
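The relevant part of the rate-limit response can be read like any other dictionary. The sketch below works on a hand-written sample rather than a live response. The nested resources and search keys, with limit, remaining, and reset (a Unix timestamp), are an assumption about the response layout that should be checked against an actual response, and the numbers are invented:

```python
from datetime import datetime, timezone

# Hand-written sample; values are invented and the layout is an assumption
sample_response = {
    "resources": {
        "search": {"limit": 10, "remaining": 3, "reset": 1400000000},
    }
}

search = sample_response["resources"]["search"]
# Convert the Unix timestamp to a timezone-aware UTC datetime
reset_time = datetime.fromtimestamp(search["reset"], tz=timezone.utc)

print(f"{search['remaining']} of {search['limit']} search requests left")
print(f"Window resets at {reset_time.isoformat()}")
if search["remaining"] == 0:
    print("Wait until the reset time before sending another search request.")
```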

17.2 Visualizing API Data with Pygal

Create interactive visualizations from API data:

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
response = requests.get(url, headers=headers)

response_dict = response.json()
repo_dicts = response_dict['items']

names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most Popular Python Projects on GitHub'
chart.x_labels = names
chart.add('', stars)
chart.render_to_file('python_repos.svg')

Use Config object for detailed customization:

my_config = pygal.Config()
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000
chart = pygal.Bar(my_config, style=my_style)

Adding Custom Tooltips

Create custom tooltips showing additional information:

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': str(repo_dict['description']),
    }
    plot_dicts.append(plot_dict)

chart.add('', plot_dicts)

The label key provides the tooltip text. Converting to string prevents errors with None values.
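Note that str(None) produces the literal text 'None', which then appears in the tooltip. If a friendlier placeholder is preferred, one option is to substitute a default with the or operator. The repository dictionaries and placeholder text below are invented for illustration:

```python
# Invented stand-ins for entries in repo_dicts
repo_dicts = [
    {"name": "alpha", "stargazers_count": 120, "description": "A sample project"},
    {"name": "beta", "stargazers_count": 45, "description": None},
]

plot_dicts = []
for repo_dict in repo_dicts:
    plot_dicts.append({
        "value": repo_dict["stargazers_count"],
        # Falls back to the placeholder when description is None or empty
        "label": repo_dict["description"] or "No description provided",
    })

print(str(None))               # None
print(plot_dicts[1]["label"])  # No description provided
```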

Adding Clickable Links

Make bars clickable by adding xlink:

plot_dict = {
    'value': repo_dict['stargazers_count'],
    'label': str(repo_dict['description']),
    'xlink': repo_dict['html_url'],
}

Clicking any bar opens the project's GitHub page in a new browser tab.

Additional Resources

For more information on matplotlib and Pygal, consult the official documentation and styling guides. These tools provide extensive customization options for creating professional visualizations.

Tags: matplotlib pygal python data-visualization charts

Posted on Thu, 21 May 2026 18:32:22 +0000 by railgun