Surface Crack Detection with CNN on Kaggle

1. Dataset Acquisition

Concrete surface cracks are a primary defect in civil structures. Building inspection is performed to evaluate stiffness and tensile strength, and crack detection plays a vital role in building health assessment.

The dataset contains images of various concrete surfaces with and without cracks. The image data is divided into two classes: negative (no crack) and positive (crack present), stored in separate folders. Each class contains 20,000 images, totaling 40,000 RGB images of size 227×227 pixels. The dataset is derived from 458 high-resolution images (4032×3024 pixels) using the method proposed by [Zhang et al 2016]. The high-resolution images exhibit high variability in surface treatment and lighting conditions. No data augmentation such as random rotation, flipping, or tilting was applied.

The dataset used in this article is a public dataset on Kaggle, listed there as "Surface Crack Detection Dataset".

Download the file named archive.zip and extract it to get the Positive and Negative folders, each holding the 20,000 images of its class.

[Image: dataset folder structure]

[Image: sample images from both classes]
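A quick sanity check after extraction can confirm the per-class counts. The following is a minimal sketch, assuming the archive was extracted to ./archive and that the images are .jpg files (the extension is an assumption; adjust the pattern if your copy differs). The helper name count_images is hypothetical.

```python
from pathlib import Path

def count_images(root, pattern="*.jpg"):
    """Return {class_folder_name: number_of_matching_files} for each subfolder of root."""
    root = Path(root)
    return {
        folder.name: sum(1 for _ in folder.glob(pattern))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }

archive = Path("./archive")  # assumed extraction location
if archive.is_dir():
    for name, n in count_images(archive).items():
        print(f"{name}: {n} images")
```

If either folder reports a count other than 20,000, the extraction was probably incomplete or the file extension differs from the assumed one.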

2. Environment Setup

Create a virtual environment:

python -m venv mykaggle

Install PyTorch and torchvision:

pip install torch -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple/

Directory structure:

├─archive
│  ├─Negative
│  └─Positive
├─mykaggle
└─main.ipynb

3. Data Loading

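The core class for data loading in PyTorch is torch.utils.data.DataLoader, an iterable that yields batches from a dataset. Its constructor signature is shown below (defaults can differ between PyTorch versions; for example, newer releases default prefetch_factor to None):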

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2,
           persistent_workers=False)

3.1 Define Image Transformations

Using the torchvision.transforms module:

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.6953, 0.6752, 0.6424], std=[0.0941, 0.0914, 0.0880])
])

transforms.Compose([...]): This class combines multiple transform operations into a pipeline. It accepts a list of operations and applies them sequentially.
transforms.Resize((224, 224)): Resizes the image to the specified size (224x224 pixels), ensuring all input images have the same dimensions. Because both height and width are given, the image is stretched to exactly this size; the aspect ratio is not preserved and no padding is added. The 227×227 source images are already square, so no distortion occurs here.
transforms.ToTensor(): Converts a PIL Image or NumPy ndarray to a PyTorch tensor. It scales the pixel values from the range [0, 255] to [0.0, 1.0]. For RGB images, it returns a tensor of shape (3, H, W), where 3 represents the color channels (red, green, blue).
transforms.Normalize(mean=[0.6953, 0.6752, 0.6424], std=[0.0941, 0.0914, 0.0880]): This operation standardizes each channel of the image. The mean and std parameters are the per-channel (RGB) means and standard deviations. The normalization formula is: output = (input - mean) / std. The values used here were computed specifically for this dataset. You can use these directly or compute them yourself; code for computing is provided below. Normalization helps make the numerical range consistent across different images, facilitating neural network training and convergence.
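To make the ToTensor scaling and the Normalize formula concrete, here is a minimal sketch that reproduces both steps by hand with plain tensor operations. It is an illustration of the arithmetic, not torchvision's implementation; the function name to_tensor_and_normalize and the synthetic white image are assumptions for the example.

```python
import torch

# Per-channel statistics quoted in the article
mean = torch.tensor([0.6953, 0.6752, 0.6424])
std = torch.tensor([0.0941, 0.0914, 0.0880])

def to_tensor_and_normalize(image_hwc_uint8):
    """Mimic ToTensor + Normalize for a uint8 image tensor of shape (H, W, 3)."""
    # ToTensor step: (H, W, C) uint8 in [0, 255] -> (C, H, W) float in [0.0, 1.0]
    x = image_hwc_uint8.permute(2, 0, 1).float() / 255.0
    # Normalize step: output = (input - mean) / std, broadcast over H and W
    return (x - mean[:, None, None]) / std[:, None, None]

# A synthetic 2x2 image that is pure white (every channel is 255)
white = torch.full((2, 2, 3), 255, dtype=torch.uint8)
out = to_tensor_and_normalize(white)
print(out.shape)     # shape is (3, 2, 2): channels first
print(out[:, 0, 0])  # per-channel values of (1.0 - mean) / std
```

Because the standard deviations are small (about 0.09), normalized values can be several units away from zero, which is expected.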

import torch
from torchvision import transforms, datasets
from torch.utils.data import DataLoader

def calculate_dataset_stats(dataset_path, batch_size=64, num_workers=4):
    # Resize and convert to a tensor only; no normalization
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor()
    ])

    # Load the dataset
    dataset = datasets.ImageFolder(dataset_path, transform=transform)
    dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False)

    # Initialize variables
    total_mean = torch.zeros(3)
    total_var = torch.zeros(3)
    total_images = 0

    # Compute mean and variance
    for images, _ in dataloader:
        batch_samples = images.size(0)
        images = images.view(batch_samples, images.size(1), -1)
        total_mean += images.mean(2).sum(0)
        total_var += images.var(2).sum(0)
        total_images += batch_samples

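    # Compute the final mean; the std is the square root of the average per-image
    # variance, which approximates the global std (it ignores variation between image means)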
    mean = total_mean / total_images
    std = torch.sqrt(total_var / total_images)

    return mean, std

# Usage
dataset_path = './archive'
mean, std = calculate_dataset_stats(dataset_path)

print(f"Computed mean: {mean}")
print(f"Computed std: {std}")
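The function above averages per-image variances, which ignores how much the image means differ from each other. If the exact per-channel standard deviation over all pixels is wanted, one option is to accumulate the sum and the sum of squares. The sketch below shows that alternative on synthetic tensors; global_channel_stats is a hypothetical helper, and its results will generally differ from the values quoted in this article.

```python
import torch

def global_channel_stats(batches):
    """Exact per-channel mean/std over all pixels of all images.

    `batches` is any iterable of float tensors shaped (N, C, H, W).
    """
    pixel_count = 0
    channel_sum = None
    channel_sq_sum = None
    for images in batches:
        images = images.double()  # accumulate in float64 to limit rounding error
        if channel_sum is None:
            channels = images.size(1)
            channel_sum = torch.zeros(channels, dtype=torch.float64)
            channel_sq_sum = torch.zeros(channels, dtype=torch.float64)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        pixel_count += images.size(0) * images.size(2) * images.size(3)
    mean = channel_sum / pixel_count
    # Var[X] = E[X^2] - E[X]^2 (population variance)
    std = torch.sqrt(channel_sq_sum / pixel_count - mean ** 2)
    return mean, std

# Synthetic stand-in for a DataLoader: two batches of random "images"
torch.manual_seed(0)
fake_batches = [torch.rand(4, 3, 8, 8), torch.rand(2, 3, 8, 8)]
mean, std = global_channel_stats(fake_batches)
print(mean, std)
```

With a real DataLoader that yields (images, labels) pairs, the call would look like global_channel_stats(images for images, _ in dataloader).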

3.2 Load Data and Add Labels

Use torchvision.datasets.ImageFolder to directly load the dataset, which automatically assigns labels based on the folder names:

from torchvision.datasets import ImageFolder

data_dir = './archive'
dataset = ImageFolder(data_dir, transform=transform)

3.3 Split the Dataset

Split the dataset into training and test sets with an 80:20 ratio using torch.utils.data.random_split():

from torch.utils.data import random_split

total_size = len(dataset)
train_size = int(0.8 * total_size)
test_size = total_size - train_size

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])
print(f"Dataset total size: {total_size}")
print(f"Training set size: {len(train_dataset)}")
print(f"Test set size: {len(test_dataset)}")
# Print class information
print("Class mapping:", dataset.class_to_idx)

Output:

Dataset total size: 40000
Training set size: 32000
Test set size: 8000
Class mapping: {'Negative': 0, 'Positive': 1}
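random_split shuffles differently on every run unless it is given a seeded generator. The following sketch shows one way to make the 80:20 split repeatable, using a tiny synthetic TensorDataset in place of the image folder. The helper name split_80_20 and the seed value 42 are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Tiny synthetic dataset standing in for the 40,000-image ImageFolder
toy_dataset = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10) % 2)

def split_80_20(dataset, seed=42):
    """Split a dataset 80:20 with a fixed seed so the split is repeatable."""
    train_size = int(0.8 * len(dataset))
    test_size = len(dataset) - train_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, test_size], generator=generator)

train_a, test_a = split_80_20(toy_dataset)
train_b, test_b = split_80_20(toy_dataset)
print(len(train_a), len(test_a))                     # 8 2
print(list(test_a.indices) == list(test_b.indices))  # True
```

A fixed split matters when comparing runs: without it, test images from one run may have been training images in another.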

3.4 Create DataLoaders

Create DataLoaders for training and test sets:

batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

Iterating over a loader, for example train_loader, yields one batch of images and labels at a time:

for images, labels in train_loader:
    print(f'Image shape: {images.shape}')
    print('Label shape:', labels.shape)
    break

The output shows the shape of one batch:

Image shape: torch.Size([128, 3, 224, 224])
Label shape: torch.Size([128])

4. Network Construction

4.1 Network Architecture

Build a CNN with 3 convolutional layers, 3 pooling layers, and 2 fully connected layers.

Since calculating the image dimensions after convolution and pooling can be tricky, you can use an online convolution/pooling formula calculator: http://www.sqflash.com/cal.html

[Image: convolution calculation example]

[Image: pooling calculation example]
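The same calculation can be done in a few lines instead of using an online tool. This is a minimal sketch of the standard output-size formula, floor((n + 2p - k) / s) + 1, applied to three rounds of 3x3 convolution (padding 1) followed by 2x2 max pooling (stride 2), which is the configuration used by the network in this section. The function name conv_output_size is hypothetical.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 224
for layer in range(1, 4):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv, padding 1
    conv_size = size
    size = conv_output_size(size, kernel=2, stride=2)             # 2x2 max pool, stride 2
    print(f"conv{layer}: {conv_size}x{conv_size} -> pool{layer}: {size}x{size}")

flattened = 64 * size * size  # 64 channels after the third convolution
print(flattened)              # 50176
```

The final value, 50176, is the input size the first fully connected layer has to accept.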

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)  # Input channels 3, output channels 16, kernel 3x3, padding=1, output 224x224x16
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 28 * 28, 512)
        self.fc2 = nn.Linear(512, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))     # 224x224x3  -> (conv1)224x224x16 -> (pool1)112x112x16
        x = self.pool(self.relu(self.conv2(x)))     # 112x112x16 -> (conv2)112x112x32 -> (pool2) 56x56x32
        x = self.pool(self.relu(self.conv3(x)))     # 56x56x32  -> (conv3) 56x56x64  -> (pool3) 28x28x64
        x = x.view(-1, 64 * 28 * 28)                # 28x28x64  -> (view)  50176
        x = self.relu(self.fc1(x))                  #   50176   -> (fc1)    512
        x = self.fc2(x)                             #    512    -> (fc2)     2
        return x

Network Architecture Explanation:

Conv1: Convolution with a 3x3 kernel, output channels 16, padding 1 (preserves spatial dimensions). Output size: 224x224x16.
Pool1: 2x2 max pooling, halves the spatial dimensions. Output size: 112x112x16.
Conv2: Convolution with a 3x3 kernel, output channels 32, padding 1. Output size: 112x112x32.
Pool2: 2x2 max pooling. Output size: 56x56x32.
Conv3: Convolution with a 3x3 kernel, output channels 64, padding 1. Output size: 56x56x64.
Pool3: 2x2 max pooling. Output size: 28x28x64.
view: Flatten operation, resulting in a 1D vector of size 50176.
fc1: 50176 -> 512.
fc2: 512 -> 2 (binary classification).
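It can also help to see where the trainable parameters sit. The sketch below counts weights and biases per layer from the shapes listed above, using the usual formulas (in_channels × out_channels × k × k + out_channels for a convolution, in_features × out_features + out_features for a linear layer). The helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel Conv2d layer."""
    return in_channels * out_channels * kernel * kernel + out_channels

def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer."""
    return in_features * out_features + out_features

layer_params = {
    "conv1": conv2d_params(3, 16, 3),
    "conv2": conv2d_params(16, 32, 3),
    "conv3": conv2d_params(32, 64, 3),
    "fc1": linear_params(64 * 28 * 28, 512),
    "fc2": linear_params(512, 2),
}
for name, count in layer_params.items():
    print(f"{name}: {count:,}")

total_params = sum(layer_params.values())
print(f"total: {total_params:,}")  # total: 25,715,234
```

Almost all of the roughly 25.7 million parameters belong to fc1, because it connects the 50176-element flattened feature map to 512 units.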

4.2 Loss Function and Optimizer

Define the necessary loss function and optimizer:

import torch.optim as optim

# Check for available GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model = CNN().to(device)
# Define cross-entropy loss function
criterion = nn.CrossEntropyLoss()
# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

My computer does not have a GPU, so the output is:

Using device: cpu
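For intuition about what the loss measures: nn.CrossEntropyLoss takes the raw outputs of fc2 (logits) and, for each sample, computes the negative log of the softmax probability assigned to the true class, averaged over the batch by default. The following pure-Python sketch works this out for a single sample with two classes; the function name and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

# The model favors class 0 (logit 2.0) over class 1 (logit 0.0)
print(round(cross_entropy([2.0, 0.0], 0), 4))  # 0.1269 (true class is 0: small loss)
print(round(cross_entropy([2.0, 0.0], 1), 4))  # 2.1269 (true class is 1: large loss)
```

This is also why the network's forward pass ends with fc2 and applies no softmax of its own.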

4.3 Model Training

num_epochs = 10
train_loss = []
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    epoch_loss = running_loss / len(train_loader)
    train_loss.append(epoch_loss)
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}')

4.4 Early Stopping

You can add an early stopping mechanism (not used in this article):

import numpy as np

class EarlyStopping:
    def __init__(self, patience=7, verbose=False, delta=0):
        self.patience = patience
        self.verbose = verbose
        self.delta = delta
        self.best_score = None
        self.early_stop = False
        self.counter = 0
        self.best_loss = np.inf  # np.Inf was removed in NumPy 2.0; np.inf works in 1.x and 2.x

    def __call__(self, val_loss, model):
        score = -val_loss
        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            if self.verbose:
                print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        if self.verbose:
            print(f'Validation loss decreased ({self.best_loss:.6f} --> {val_loss:.6f}).  Saving model ...')
        torch.save(model.state_dict(), 'checkpoint.pt')
        self.best_loss = val_loss
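To see how the patience counter behaves without training a model, the sketch below replays the same decision rule as the class above on a list of made-up validation losses. The function name stopping_epoch and the loss values are hypothetical; checkpoint saving is left out.

```python
def stopping_epoch(val_losses, patience=7, delta=0.0):
    """Return the 1-based epoch at which early stopping would trigger, or None."""
    best_score = None
    counter = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        score = -val_loss  # higher score means lower loss
        if best_score is None or score >= best_score + delta:
            best_score = score  # improvement: remember it and reset the counter
            counter = 0
        else:
            counter += 1        # no improvement this epoch
            if counter >= patience:
                return epoch
    return None

losses = [0.50, 0.40, 0.41, 0.42, 0.39, 0.40, 0.41, 0.43]
print(stopping_epoch(losses, patience=3))  # 8
print(stopping_epoch(losses, patience=2))  # 4
```

In a real training loop, the validation loss would be computed on held-out data after each epoch, passed to the EarlyStopping instance, and the loop would break once early_stop becomes True.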

4.5 Evaluation

# torchmetrics is a separate package (pip install torchmetrics)
from torchmetrics.classification import BinaryAccuracy, BinaryPrecision, BinaryRecall, BinaryF1Score

def evaluate_model(model, test_loader, device):
    model.to(device)
    model.eval()

    accuracy_metric = BinaryAccuracy().to(device)
    precision_metric = BinaryPrecision().to(device)
    recall_metric = BinaryRecall().to(device)
    f1_score_metric = BinaryF1Score().to(device)

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            preds = torch.argmax(outputs, dim=1)

            accuracy_metric.update(preds, labels)
            precision_metric.update(preds, labels)
            recall_metric.update(preds, labels)
            f1_score_metric.update(preds, labels)

    accuracy = accuracy_metric.compute()
    precision = precision_metric.compute()
    recall = recall_metric.compute()
    f1_score = f1_score_metric.compute()

    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1_score:.4f}')

evaluate_model(model, test_loader, device)

Since my computer is slow (CPU only), I stopped training after 1 epoch instead of running all 10. The evaluation results are surprisingly good:

Accuracy: 0.9942
Precision: 0.9987
Recall: 0.9897
F1 Score: 0.9942
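As a reminder of how these metrics relate, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below is plain arithmetic, not the torchmetrics implementation; the confusion counts in the first example are made up, and the second call checks that the reported F1 follows from the reported precision and recall.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # 0.9000 0.7500 0.8182

# Consistency check against the values reported above
print(round(f1_from(0.9987, 0.9897), 4))  # 0.9942
```

Here precision is higher than recall, meaning the model's errors lean toward missed cracks rather than false alarms.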

5. Strange Error Messages

A problem that stumped me for hours:

"Numpy" is not available

Strange, because NumPy is definitely installed. Upon investigation, it turned out to be a compatibility issue with the newest NumPy release.

NumPy 2.0, released in June 2024, is a major version that changes the Application Binary Interface (ABI). Such changes affect third-party libraries that are compiled against NumPy, including PyTorch: a PyTorch build compiled against NumPy 1.x has to be updated and recompiled before it can work with NumPy 2.x.

The error message:

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

How to solve it?

Downgrade NumPy to version 1.23:

pip install numpy==1.23

Possible subsequent error:

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.

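Solution: Upgrade the packaging tools and install NumPy from a pre-built wheel so that pip does not try to compile it from source. This only works if a 1.23 wheel exists for your Python version; if not, another NumPy release below 2.0 that provides a wheel for your interpreter should also avoid the ABI error.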

python -m pip install --upgrade pip setuptools wheel
pip install --only-binary=numpy numpy==1.23
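Before and after changing versions, it is worth confirming which NumPy the virtual environment actually sees. The following is a small sketch using the standard library's importlib.metadata; the helper names installed_version and is_numpy_2_or_newer are hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def is_numpy_2_or_newer(version_string):
    """True if the major version number is 2 or higher."""
    return int(version_string.split(".")[0]) >= 2

numpy_version = installed_version("numpy")
if numpy_version is None:
    print("NumPy is not installed in this environment")
elif is_numpy_2_or_newer(numpy_version):
    print(f"NumPy {numpy_version}: may be incompatible with a PyTorch built for NumPy 1.x")
else:
    print(f"NumPy {numpy_version}: 1.x series")
```

Running this inside the notebook also confirms that the kernel uses the mykaggle environment rather than a system-wide Python.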

Tags: CNN pytorch image classification Crack Detection Kaggle

Posted on Thu, 11 Jun 2026 17:53:30 +0000 by Jedi Legend
