VGG16: A Deep Convolutional Neural Network for Image Recognition

VGG16 Theory

Advantages of VGG16

VGG16, proposed by Simonyan and Zisserman, is built around one simple but influential design choice:

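Small Convolutional Kernels: Every convolutional layer uses a 3x3 kernel (stride 1, padding 1) instead of a larger one such as 7x7. Two stacked 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer, and three stacked 3x3 layers cover the same 7x7 receptive field as a single 7x7 layer. This approach offers two main benefits:
- It reduces the number of parameters. With C input and C output channels, three 3x3 layers need 3 x (3 x 3 x C x C) = 27C^2 weights, while one 7x7 layer needs 7 x 7 x C x C = 49C^2.
- It increases the model's non-linearity. Each 3x3 layer is followed by a ReLU, so a stack of three layers applies three non-linearities instead of one, making the decision function more discriminative.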
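The trade-off can be checked with a little arithmetic. The following plain-Python sketch compares three stacked 3x3 layers with one 7x7 layer; the helper names and the channel width C = 256 are illustrative assumptions, and biases are ignored.

```python
# Compare a single 7x7 convolution with a stack of three 3x3 convolutions.
# Assumes C input and C output channels per layer; biases are ignored.

def conv_weights(kernel: int, channels: int, layers: int = 1) -> int:
    """Weight count for `layers` stacked kernel x kernel convs mapping C -> C channels."""
    return layers * kernel * kernel * channels * channels


def receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


C = 256  # illustrative channel width (an assumption)
stacked = conv_weights(3, C, layers=3)  # three 3x3 layers, three ReLUs
single = conv_weights(7, C)             # one 7x7 layer, one ReLU

print(receptive_field(3, 3), receptive_field(7, 1))  # 7 7
print(stacked, single)                               # 1769472 3211264
print(f"{stacked / single:.2f}")                     # 0.55
```

Both options see a 7x7 region of the input, but the stacked version uses about 55% of the weights (27/49) regardless of the value of C.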

Network Architecture

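The VGG16 architecture consists of five convolutional blocks, each ending in a 2x2 max-pooling layer with stride 2, followed by three fully connected layers. Its 13 convolutional layers plus 3 fully connected layers give the 16 weight layers the name refers to. Because every 3x3 convolution uses padding 1, the convolutions keep the spatial size unchanged and only the pooling layers halve it. The progression of feature maps through the network is as follows:

Input: 224x224x3 (RGB image)
After Block 1 (2 conv layers): 224x224x64 -> 112x112x64 (after MaxPool)
After Block 2 (2 conv layers): 112x112x128 -> 56x56x128 (after MaxPool)
After Block 3 (3 conv layers): 56x56x256 -> 28x28x256 (after MaxPool)
After Block 4 (3 conv layers): 28x28x512 -> 14x14x512 (after MaxPool)
After Block 5 (3 conv layers): 14x14x512 -> 7x7x512 (after MaxPool)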
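The shape progression above follows from the standard output-size formula out = floor((in + 2 * padding - kernel) / stride) + 1. A minimal plain-Python sketch that traces it layer by layer (the CFG list and function names are illustrative, not taken from any library):

```python
# VGG16 layer configuration: numbers are conv output channels, "M" is a 2x2 max pool.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size: int = 224, channels: int = 3):
    """Return (height, width, channels) after the input and after every layer in CFG."""
    shapes = [(size, size, channels)]
    for layer in CFG:
        if layer == "M":
            size = out_size(size, kernel=2, stride=2, padding=0)  # halves the size
        else:
            size = out_size(size, kernel=3, stride=1, padding=1)  # size unchanged
            channels = layer
        shapes.append((size, size, channels))
    return shapes


for h, w, c in trace_shapes():
    print(f"{h}x{w}x{c}")
```

The last line printed is 7x7x512. Giving the pooling layers a padding of 1 instead would end at 8x8x512 = 32768 features, which no longer matches the 25088 inputs FC1 expects.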

Following the convolutional blocks, the 7x7x512 feature map is flattened and passed through three fully connected layers (FC1 and FC2 are each followed by a ReLU and dropout):

FC1: Input 7x7x512 (25088), Output 4096
FC2: Input 4096, Output 4096
FC3: Input 4096, Output 1000 (for ImageNet classification)
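A quick count shows where most of VGG16's parameters live. This plain-Python sketch (the helper name is illustrative) counts weights plus biases for each fully connected layer, using the sizes listed above:

```python
# Parameters of a fully connected layer: one weight per input-output pair
# plus one bias per output unit.
def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


fc_layers = {
    "FC1": (7 * 7 * 512, 4096),
    "FC2": (4096, 4096),
    "FC3": (4096, 1000),
}

for name, (n_in, n_out) in fc_layers.items():
    print(f"{name}: {n_in} -> {n_out}, {linear_params(n_in, n_out):,} parameters")

total = sum(linear_params(n_in, n_out) for n_in, n_out in fc_layers.values())
print(f"Total FC parameters: {total:,}")  # Total FC parameters: 123,642,856
```

FC1 alone holds about 103 million parameters. Together the three FC layers account for roughly 124 million of VGG16's roughly 138 million parameters, while the 13 convolutional layers hold under 15 million.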

Data Preprocessing

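Before images reach the network they must be preprocessed: resized and cropped to a fixed input size (224x224 for VGG16), converted to tensors, and normalized per channel. The training set additionally gets random crops and horizontal flips as data augmentation, while the validation set uses a deterministic resize and center crop. The following Python class bundles these transformations for the training and validation phases: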

import torchvision.transforms as transforms

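class ImageTransform:
    """Preprocessing pipelines for the 'train' and 'val' phases.

    resize: output side length in pixels (e.g. 224)
    mean, std: per-channel statistics passed to Normalize
    """
    def __init__(self, resize, mean, std):
        self.data_transform = {
            # Training: random crop covering 50-100% of the image area (with a random
            # aspect ratio), rescaled to resize x resize, then a random horizontal flip
            'train': transforms.Compose([
                transforms.RandomResizedCrop(resize, scale=(0.5, 1.0)),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),  # PIL image -> float tensor (C, H, W) in [0, 1]
                transforms.Normalize(mean, std)  # (x - mean) / std per channel
            ]),
            # Validation: resize the shorter side to `resize`, then take a center crop
            'val': transforms.Compose([
                transforms.Resize(resize),
                transforms.CenterCrop(resize),
                transforms.ToTensor(),
                transforms.Normalize(mean, std)
            ])
        }

    def __call__(self, img, phase='train'):
        # Apply the pipeline for the requested phase ('train' or 'val')
        return self.data_transform[phase](img)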

VGG16 PyTorch Implementation

The following code defines a VGG16 model in PyTorch. Note that this implementation is adapted for a binary classification task (e.g., bees vs. ants) with 2 output classes instead of the original 1000.

import torch
import torch.nn as nn

class VGG16(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Convolutional layers: 13 convs in five blocks. Every conv is 3x3 with
        # padding 1, so it keeps the spatial size. Each block ends with a 2x2 max
        # pool, stride 2, NO padding, which halves the size (224 -> 7 overall).
        # padding=1 on the pools would give 8x8 maps and break FC1's 25088 inputs.
        self.features = nn.Sequential(
            # Block 1: 224x224x3 -> 112x112x64
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 2: 112x112x64 -> 56x56x128
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 3: 56x56x128 -> 28x28x256
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 4: 28x28x256 -> 14x14x512
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Block 5: 14x14x512 -> 7x7x512
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Fully connected layers: 7*7*512 = 25088 -> 4096 -> 4096 -> num_classes
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)     # (N, 512, 7, 7) for a 224x224 input
        x = torch.flatten(x, 1)  # (N, 25088)
        x = self.classifier(x)   # (N, num_classes) raw logits
        return x

# Example usage
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VGG16(num_classes=2).to(device)

# Create a dummy input tensor
dummy_input = torch.randn(4, 3, 224, 224).to(device)
output = model(dummy_input)
print(output.shape)  # Should print: torch.Size([4, 2])
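As a sanity check on the model above, its parameter count can be worked out by hand. This plain-Python sketch (the function name and config list are illustrative, not part of the original code) sums the weights and biases of every conv and linear layer. For num_classes=2 the result should match sum(p.numel() for p in model.parameters()) for the PyTorch model, since ReLU, dropout and pooling layers have no parameters.

```python
# Conv output channels per layer; "M" marks a 2x2 max pool (no parameters).
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(num_classes: int) -> int:
    total = 0
    in_ch = 3
    for layer in CFG:
        if layer == "M":
            continue
        total += 3 * 3 * in_ch * layer + layer  # 3x3 kernel weights + one bias per filter
        in_ch = layer
    for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]:
        total += n_in * n_out + n_out  # Linear weights + biases
    return total


print(f"{vgg16_param_count(1000):,}")  # 138,357,544
print(f"{vgg16_param_count(2):,}")     # 134,268,738
```

Replacing the 1000-way output layer with a 2-way one removes only 4,088,806 parameters; nearly all of the remaining 134 million sit in the convolutional stack and the first two FC layers.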


Posted on Thu, 02 Jul 2026 16:10:49 +0000 by nick1

Freaks City