Batch Traversal and Clipping of GeoTIFF Files in Python

File Discovery via Pathlib

To locate all GeoTIFF datasets scattered across nested directories, the pathlib module provides an efficient recursive search mechanism. By leveraging the rglob method, we can aggregate paths for every .tif file contained within the root directory and its descendants.

from pathlib import Path
from osgeo import gdal
import numpy as np

SOURCE_DIR = Path('/data/source_imagery')
DEST_DIR = Path('/data/cropped_output')
MAX_PIXELS = 7680
BAND_COUNT = 3

# Discover all TIF files recursively
target_images = list(SOURCE_DIR.rglob('*.tif'))

Raster Extraction and Subsetting

Clipping a raster subset manually requires reading the pixel matrix from the source dataset, initializing a new GeoTIFF, and transferring the spatial metadata alongside the clipped array. This operation is executed per file to extract a square region up to the defined maximum dimension.

Dataset Access and Array Reading

Open the source dataset to extract spatial properties—such as the affine transformation matrix and coordinate reference system—along with the band data type. Calculate the target dimensions by constraining the original size to the maximum clipping threshold, then pull the upper-left rectangular subset using ReadAsArray.

def process_geotiff(src_filepath):
    src_ds = gdal.Open(str(src_filepath))
    if not src_ds:
        return

    affine_matrix = src_ds.GetGeoTransform()
    crs_info = src_ds.GetProjection()
    x_pixels = src_ds.RasterXSize
    y_pixels = src_ds.RasterYSize
    band_dtype = src_ds.GetRasterBand(1).DataType

    # Determine crop boundaries
    crop_x = min(x_pixels, MAX_PIXELS)
    crop_y = min(y_pixels, MAX_PIXELS)

    # Extract pixel block from origin (0,0)
    pixel_block = src_ds.ReadAsArray(0, 0, crop_x, crop_y)

Destination Dataset Generation

Instantiate the output GeoTIFF using the GTiff driver, ensuring the created file matches the clipped width, height, band count, and original data type. It is critical to apply the unchanged affine matrix and projection to preserve geographic alignment.

    # Initialize destination raster
    dest_filename = src_filepath.name
    dest_filepath = DEST_DIR / dest_filename
    dest_ds = gdal.GetDriverByName('GTiff').Create(
        str(dest_filepath), crop_x, crop_y, BAND_COUNT, band_dtype
    )

    # Preserve spatial referencing
    dest_ds.SetGeoTransform(affine_matrix)
    dest_ds.SetProjection(crs_info)

Band Data Transfer

Iterate through the designated number of bands to write the sliced array data into the newly generated dataset. Dimensions of the injected array must precisely correspond to the destination file's resolution to avoid runtime exceptions. Concluding the flush process and nullifying the dataset variable ensures data is written to disk.

    # Commit band data
    for band_idx in range(1, BAND_COUNT + 1):
        dest_ds.GetRasterBand(band_idx).WriteArray(pixel_block[band_idx - 1, :, :])

    dest_ds.FlushCache()
    dest_ds = None

# Execute batch cropping
for img_path in target_images:
    process_geotiff(img_path)

Tags: python gdal GeoTIFF Raster Processing Pathlib

Posted on Wed, 20 May 2026 04:45:00 +0000 by ericdorman