Aggregating Census Population Data from Excel Using Python

Processing a 2010 US Census spreadsheet containing thousands of rows—where each row represents a single census tract—requires calculating the total population and tract count for every county. Performing this manually across thousands of entries is inefficient and time-consuming. A Python script can automate these calculations in seconds by reading the Excel data, aggregating the statistics per county, and exporting the results.

The automation workflow involves:

  • Loading the Excel file using the openpyxl library.
  • Iterating through each row to extract state, county, and population figures.
  • Accumulating tract counts and population totals into a nested dictionary.
  • Writing the aggregated dictionary to a Python module file using pprint for easy reuse.
import openpyxl
import pprint

def process_census_data(filepath):
    print(f'Loading workbook: {filepath}')
    workbook = openpyxl.load_workbook(filepath)
    active_sheet = workbook.active

    population_stats = {}

    for row in active_sheet.iter_rows(min_row=2, values_only=True):
        state_name = row[1]   # Column B
        county_name = row[2]  # Column C
        tract_pop = row[3]    # Column D

        # Initialize state and county entries if they don't exist
        population_stats.setdefault(state_name, {})
        population_stats[state_name].setdefault(county_name, {'total_pop': 0, 'tract_count': 0})

        # Increment the tract count and add to the county population
        population_stats[state_name][county_name]['tract_count'] += 1
        population_stats[state_name][county_name]['total_pop'] += int(tract_pop)

    print('Writing aggregated data to module...')
    with open('census_aggregation.py', 'w') as output_file:
        output_file.write('aggregated_data = ' + pprint.pformat(population_stats))

    return population_stats

process_census_data('censuspopdata.xlsx')

Exporting the data structure as a formatted Python dictionary into a .py file using pprint.pformat creates a persistent module. This approach avoids reading the Excel file again in future computations. The generated file can be imported directly into other scripts:

import census_aggregation

print(census_aggregation.aggregated_data['AK'])

Tags: python openpyxl Data Aggregation Census Data Excel Processing

Posted on Thu, 14 May 2026 00:35:33 +0000 by rfrid