Excel Structure Basics
An Excel file with the .xlsx extension represents a workbook. Within a workbook, you can have multiple worksheets, and the currently active worksheet is accessed via the active property. Each workshete contains rows (numbered 1, 2, 3...) and columns (lettered A, B, C...). The intersection of a specific row and column forms a cell.
Reading Workflow
The process of extracting data from Excel files follows this pattern:
- Import the
openpyxlmodule - Load the file using
load_workbook('filename.xlsx')to get a workbook object - Access the worksheet via
wb.activeorwb.get_sheet_by_name('SheetName') - Retrieve cell data using direct indexing or the
cell()method - Leverage cell properties like
value,row,column, andcoordinate - Use dimension properties
max_rowandmax_columnto determine data boundaries - Convert between column letters and numbers using utility functions
Workbook → Worksheet → Rows/Columns → Cells
Working with Worksheets
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
print(wb.sheetnames)
new_ws = wb.create_sheet('Report')
print(wb.sheetnames)
target = wb.get_sheet_by_name('Report')
alternate = wb['Report']
for ws in wb:
print(ws.title)
Accessing Cell Data
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
print(ws['A1'])
print(ws['A1'].value)
c = ws['B1']
print(f'Row {c.row}, Column {c.column}: {c.value}')
print(f'Coordinate {c.coordinate}: {c.value}')
Using the cell() Method
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
print(ws.cell(row=1, column=2))
print(ws.cell(row=1, column=2).value)
for i in range(1, 8, 2):
print(i, ws.cell(row=i, column=2).value)
Retrieving Rows and Columns
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
column_c = ws['C']
print(column_c)
row_five = ws[5]
print(row_five, type(row_five))
Iterating Through Data
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
col_range = ws['B:C']
row_range = ws[2:6]
for col in col_range:
for cell in col:
print(cell.value)
for row in row_range:
for cell in row:
print(cell.value)
for row in ws.iter_rows(min_row=1, max_row=2, max_col=2):
for cell in row:
print(cell.value)
Determining Dimensions
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
print(f'Total rows: {ws.max_row}, Total columns: {ws.max_column}')
Converting Between Letter and Number Formats
import openpyxl
from openpyxl.utils import get_column_letter, column_index_from_string
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
print(get_column_letter(2))
print(get_column_letter(47))
print(get_column_letter(900))
print(column_index_from_string('AAH'))