VGG16 Theory
Advantages of VGG16
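VGG16, proposed by Simonyan and Zisserman, introduced the following key design choice:
- Small convolutional kernels: It primarily uses 3x3 convolutional kernels instead of larger ones such as 7x7. A stack of three 3x3 layers covers the same 7x7 receptive field as a single 7x7 layer, which offers two main benefits:
  - It reduces the number of parameters: with C input and output channels, three 3x3 layers need 27C^2 weights, whereas one 7x7 layer needs 49C^2.
  - It increases the model's non-linearity, because each 3x3 layer is followed by its own activation, making the decision function more discriminative.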
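The parameter saving can be checked with a little arithmetic. The sketch below assumes every layer has the same number of input and output channels and ignores biases; the helper names `conv_weights` and `receptive_field` and the channel width of 256 are illustrative choices, not taken from the original.

```python
def conv_weights(kernel_size: int, channels: int) -> int:
    """Weights in one conv layer with `channels` inputs and outputs (biases ignored)."""
    return kernel_size * kernel_size * channels * channels


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 256  # assumed channel width, chosen only for illustration

stacked_3x3 = 3 * conv_weights(3, channels)  # three stacked 3x3 layers
single_7x7 = conv_weights(7, channels)       # one 7x7 layer

# Both configurations see a 7x7 window of the input
print(receptive_field(3, 3), receptive_field(7, 1))
print(stacked_3x3, single_7x7)
print(f"ratio: {stacked_3x3 / single_7x7:.2f}")
```

Under these assumptions the 3x3 stack needs 27/49, or roughly 55%, of the weights of the single 7x7 layer, while inserting two extra activations.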
Network Architecture
The VGG16 architecture consists of five convolutional blocks, each ending in a 2x2 max-pooling layer, followed by fully connected layers. Every 3x3 convolution uses a padding of 1 and therefore preserves the spatial size, while each pooling layer halves it. The progression of feature maps through the network is as follows:
- Input: 224x224x3 (RGB image)
- Block 1 (2 conv layers): 224x224x64 -> 112x112x64 (after MaxPool)
- Block 2 (2 conv layers): 112x112x128 -> 56x56x128 (after MaxPool)
- Block 3 (3 conv layers): 56x56x256 -> 28x28x256 (after MaxPool)
- Block 4 (3 conv layers): 28x28x512 -> 14x14x512 (after MaxPool)
- Block 5 (3 conv layers): 14x14x512 -> 7x7x512 (after MaxPool)
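The sizes listed above follow from the standard output-size formula floor((size + 2 * padding - kernel) / stride) + 1. The sketch below traces them in plain Python, assuming 3x3 convolutions with padding 1 and unpadded 2x2 max-pooling with stride 2; `out_size`, `BLOCKS`, and `trace_shapes` are hypothetical names used only for this illustration.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (number of 3x3 conv layers, output channels) for the five VGG16 blocks
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(size: int = 224):
    """Return (height, width, channels) after the max-pool of each block."""
    shapes = []
    for num_convs, channels in BLOCKS:
        for _ in range(num_convs):
            # 3x3 conv with padding 1 keeps the spatial size
            size = out_size(size, kernel=3, stride=1, padding=1)
        # 2x2 max-pool with stride 2 and no padding halves the spatial size
        size = out_size(size, kernel=2, stride=2, padding=0)
        shapes.append((size, size, channels))
    return shapes


for index, shape in enumerate(trace_shapes(), start=1):
    print(f"Block {index}: {shape}")
```

With the same helper, `out_size(224, kernel=2, stride=2, padding=1)` gives 113 rather than 112, so the pooling layers have to be unpadded for the final feature map to come out as 7x7x512.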
Following the convolutional blocks, the network has three fully connected layers:
- FC1: Input 7x7x512 (flattened to 25088), Output 4096
- FC2: Input 4096, Output 4096
- FC3: Input 4096, Output 1000 (for ImageNet classification)
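To get a feel for how much of the model sits in these three layers, the sketch below counts their weights and biases from the sizes listed above. The helper `linear_params` is a hypothetical name introduced for this example.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (input size, output size) of FC1, FC2 and FC3 for 1000 ImageNet classes
fc_layers = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
counts = [linear_params(n_in, n_out) for n_in, n_out in fc_layers]

for name, count in zip(("FC1", "FC2", "FC3"), counts):
    print(f"{name}: {count:,}")
print(f"Total: {sum(counts):,}")
```

FC1 alone holds more than 100 million parameters, so the fully connected layers account for most of the roughly 138 million parameters of the full ImageNet model.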
Data Preprocessing
Before training, image data requires preprocessing. This involves resizing, cropping, and normalization. The following Python class demonstrates these transformations for training and validation datasets:
import torchvision.transforms as transforms


class ImageTransform:
    """Preprocessing pipelines for the training and validation phases."""

    def __init__(self, resize, mean, std):
        self.data_transform = {
            # Training: random crop and horizontal flip for augmentation
            'train': transforms.Compose([
                transforms.RandomResizedCrop(resize, scale=(0.5, 1.0)),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize(mean, std)
            ]),
            # Validation: deterministic resize and center crop
            'val': transforms.Compose([
                transforms.Resize(resize),
                transforms.CenterCrop(resize),
                transforms.ToTensor(),
                transforms.Normalize(mean, std)
            ])
        }

    def __call__(self, img, phase='train'):
        # Apply the pipeline that matches the requested phase
        return self.data_transform[phase](img)
VGG16 PyTorch Implementation
The following code defines a VGG16 model in PyTorch. Note that this implementation is adapted for a binary classification task (e.g., bees vs. ants) with 2 output classes instead of the original 1000.
import torch
import torch.nn as nn


class VGG16(nn.Module):
    def __init__(self, num_classes=2):
        super(VGG16, self).__init__()
        # Convolutional layers. The max-pooling layers use no padding so that
        # each one halves the spatial size exactly: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
        self.features = nn.Sequential(
            # Block 1: 224x224x3 -> 112x112x64
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2: 112x112x64 -> 56x56x128
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 3: 56x56x128 -> 28x28x256
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 4: 28x28x256 -> 14x14x512
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5: 14x14x512 -> 7x7x512
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Fully connected layers
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 512, 7, 7) -> (N, 25088)
        x = self.classifier(x)
        return x


# Example usage
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VGG16(num_classes=2).to(device)
# Create a dummy input tensor
dummy_input = torch.randn(4, 3, 224, 224).to(device)
output = model(dummy_input)
print(output.shape)  # Should print: torch.Size([4, 2])
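Because the five blocks repeat the same conv-ReLU pattern, the feature extractor can also be generated from a short configuration list instead of being written out layer by layer, a pattern similar to the one torchvision uses for its VGG models. The following is a minimal sketch of that idea, not the implementation above; `VGG16_CFG` and `make_features` are hypothetical names chosen for this example.

```python
import torch
import torch.nn as nn

# Output channels of each 3x3 conv layer; "M" marks a 2x2 max-pool.
# (Hypothetical config name, introduced for illustration.)
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def make_features(cfg, in_channels: int = 3) -> nn.Sequential:
    """Build the convolutional part of the network from a config list."""
    layers = []
    for item in cfg:
        if item == "M":
            # Unpadded pooling halves the spatial size
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # 3x3 conv with padding 1 keeps the spatial size
            layers.append(nn.Conv2d(in_channels, item, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = item
    return nn.Sequential(*layers)


features = make_features(VGG16_CFG)
num_convs = sum(isinstance(module, nn.Conv2d) for module in features)

# Run one dummy image through the extractor without tracking gradients
with torch.no_grad():
    out = features(torch.randn(1, 3, 224, 224))
print(num_convs, out.shape)
```

The list makes the layout easy to audit: 13 convolutional layers plus the 3 fully connected layers give the 16 weight layers the model is named after, and the extractor output has 512 channels at 7x7 resolution.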