Understanding PyTorch for Deep Learning
PyTorch has emerged as one of the leading frameworks in deep learning, particularly favored in research and academia. Its dynamic computation graph and intuitive design make it a preferred choice for prototyping and experimentation. In contrast to static-graph alternatives, PyTorch builds the graph as the code runs, so developers can change computational operations on the fly and inspect intermediate values with ordinary Python tools, which greatly simplifies debugging and model development.
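As a minimal sketch of what a dynamic graph allows, the example below uses a plain Python `if` to choose the computation at run time. The function name `forward` and the input values are hypothetical, chosen only for illustration.

```python
import torch

def forward(x):
    # Ordinary Python control flow decides the computation at run time;
    # autograd records only the branch that was actually executed.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (-x).sum()

x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
forward(x_pos).backward()
print(x_pos.grad)  # gradient of sum(x^2) is 2x

x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
forward(x_neg).backward()
print(x_neg.grad)  # gradient of sum(-x) is -1 for each element
```

Because the graph is rebuilt on every call, each input can follow a different branch without any special graph-construction API.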
Setting Up the Environment
Before diving into model construction, ensure your environment is properly configured. Using Anaconda streamlines dependency management. Install PyTorch via Conda or Pip depending on your system's CUDA support:
# For systems with CUDA 10.2
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
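# Default pip build (CPU-only on macOS and Windows; the default Linux wheel bundles CUDA)
pip install torch torchvision torchaudio
# Explicit CPU-only build on any platform
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu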
# Using a mirror for faster download (e.g., Tsinghua in China)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch==1.13.1 torchvision==0.14.1
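After installation, a quick check along the following lines can confirm that the package imports and whether a CUDA device is visible. This is a sketch; the variable names `cuda_ok` and `check` are arbitrary.

```python
import torch

print(torch.__version__)             # installed PyTorch version string
cuda_ok = torch.cuda.is_available()  # True only if a CUDA build and a usable device are present
print(cuda_ok)

# A tiny tensor operation confirms the library actually runs.
check = torch.ones(2, 3) + 1
print(check.shape)
```

If `cuda_ok` is False on a machine with an NVIDIA GPU, the installed build is probably CPU-only or the driver does not match the bundled CUDA version.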
Constructing a Basic Neural Network
A fundamental neural network involves several key components: data preparation, learnable parameters, model architecture, loss function, and optimizer. Below is a structured implementation:
import torch
import numpy as np

# Simulate training data: y = 2x + 1 plus Gaussian noise
def generate_data():
    x = np.random.rand(100, 1).astype(np.float32)
    y = 2 * x + 1 + 0.1 * np.random.randn(100, 1).astype(np.float32)
    return torch.from_numpy(x), torch.from_numpy(y)

# Initialize parameters; the weight is a (1, 1) matrix so that
# x @ weight keeps the (100, 1) shape of the targets
def initialize_parameters():
    w = torch.randn(1, 1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return w, b

# Forward pass: linear transformation
def predict(x, weight, bias):
    return x @ weight + bias

# Mean squared error loss (mean rather than sum, so the gradient
# scale does not grow with the number of samples)
def compute_loss(targets, predictions):
    return (predictions - targets).pow(2).mean()

# Gradient-based update step
def step(weight, bias, lr):
    with torch.no_grad():
        weight -= lr * weight.grad
        bias -= lr * bias.grad
    # Clear gradients after update; backward() accumulates otherwise
    weight.grad.zero_()
    bias.grad.zero_()

# Training loop
X, Y = generate_data()
W, B = initialize_parameters()
learning_rate = 0.1  # 0.01 converges too slowly to fit within 500 epochs
for epoch in range(500):
    Y_pred = predict(X, W, B)
    loss = compute_loss(Y, Y_pred)
    loss.backward()  # Compute gradients
    step(W, B, learning_rate)
    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
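To see what `loss.backward()` computes for a linear model, the sketch below compares autograd's gradients with the analytic ones. It assumes the loss is the mean of squared errors, L = mean((xw + b - y)^2), for which dL/dw = 2 * mean((xw + b - y) * x) and dL/db = 2 * mean(xw + b - y). The data and variable names are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.rand(100, 1)
y = 2 * x + 1 + 0.1 * torch.randn(100, 1)

w = torch.randn(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Gradients from autograd
loss = ((x @ w + b - y) ** 2).mean()
loss.backward()

# Analytic gradients of the mean squared error
with torch.no_grad():
    residual = x @ w + b - y
    grad_w = 2 * (residual * x).mean()
    grad_b = 2 * residual.mean()

w_matches = torch.allclose(w.grad, grad_w.reshape(1, 1), atol=1e-5)
b_matches = torch.allclose(b.grad, grad_b.reshape(1), atol=1e-5)
print(w_matches, b_matches)
```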
Data Representation in PyTorch
Tensors are the core data structure in PyTorch, analogous to NumPy arrays but with GPU acceleration and automatic differentiation support.
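The relationship to NumPy can be illustrated with a short sketch: `torch.from_numpy` shares memory with the source array, while `torch.tensor` copies it. The variable names are arbitrary.

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(arr)   # shares memory with arr, no copy is made
arr[0] = 5.0
print(t)                    # the change made through NumPy is visible in the tensor

t_copy = torch.tensor(arr)  # torch.tensor copies the data
arr[1] = 7.0
print(t_copy)               # unaffected by the later change to arr

back = t.numpy()            # CPU tensor back to NumPy, again sharing memory
```

Shared memory avoids copies but means in-place changes on either side are visible on the other, so copy explicitly when the two should be independent.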
Scalar (0-D Tensor)
scalar = torch.tensor(3.14)
print(scalar.size())  # Output: torch.Size([])
Vector (1-D Tensor)
temperatures = torch.tensor([22.1, 23.5, 25.0, 26.7, 28.3])
print(temperatures.size())  # Output: torch.Size([5])
Matrix (2-D Tensor)
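# load_boston was removed in scikit-learn 1.2; load_diabetes is used here
# as another small tabular dataset bundled with scikit-learn
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
data = torch.from_numpy(diabetes.data).float()
print(data.size())  # Output: torch.Size([442, 10])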
Image as 3D Tensor
from PIL import Image
import matplotlib.pyplot as plt
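# convert('RGB') guarantees three channels even for grayscale or RGBA files
img_pil = Image.open('panda.jpg').convert('RGB').resize((224, 224))
img_np = np.array(img_pil)
img_tensor = torch.from_numpy(img_np)
print(img_tensor.size())  # Output: torch.Size([224, 224, 3])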
# Display red channel
plt.imshow(img_tensor[:, :, 0].numpy(), cmap='gray')
plt.show()
Batched Images (4D Tensor)
import glob
file_list = glob.glob('images/cat*.jpg')
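# convert('RGB') keeps every array at (224, 224, 3) so that torch.stack succeeds
images = [np.array(Image.open(f).convert('RGB').resize((224, 224))) for f in file_list[:32]]
batch_tensor = torch.stack([torch.from_numpy(im) for im in images])
print(batch_tensor.size())  # Output: torch.Size([32, 224, 224, 3]) when at least 32 files match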
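Image batches loaded this way are channels-last (N, H, W, C) with uint8 values, whereas PyTorch convolution layers expect channels-first (N, C, H, W) floating-point input. The sketch below shows one way to convert, using a random stand-in batch instead of real image files.

```python
import torch

# Stand-in for a batch of decoded images: uint8, channels-last (N, H, W, C).
batch_hwc = torch.randint(0, 256, (4, 224, 224, 3), dtype=torch.uint8)

# Reorder to channels-first (N, C, H, W) and scale pixel values to [0, 1].
batch_chw = batch_hwc.permute(0, 3, 1, 2).float() / 255.0
print(batch_chw.shape)

# permute only changes strides; contiguous() lays the data out compactly.
batch_chw = batch_chw.contiguous()
```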
Video Data (5D Tensor)
Video sequences can be represented as tensors of shape [batch, frames, height, width, channels], where multiple video clips are processed simultaneously.
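A 5D layout of this kind can be sketched with a placeholder tensor. The clip dimensions below are hypothetical and chosen only to keep the example small.

```python
import torch

# Hypothetical clip dimensions, for illustration only.
batch, frames, height, width, channels = 2, 16, 112, 112, 3
clips = torch.zeros(batch, frames, height, width, channels)
print(clips.dim())    # number of dimensions
print(clips.shape)

first_frame = clips[0, 0]  # first frame of the first clip: (H, W, C)

# 3-D convolution layers such as nn.Conv3d expect (N, C, D, H, W).
clips_ncdhw = clips.permute(0, 4, 1, 2, 3)
print(clips_ncdhw.shape)
```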
Tensor Slicing Examples
sales = torch.tensor([1000., 950., 870., 1100., 1020., 980., 1050.])
print(sales[:4]) # First four elements
print(sales[-3:]) # Last three elements
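Slicing extends to strides, boolean masks, and multiple dimensions. The following sketch shows a few common patterns; the `grid` and `scratch` tensors are made up for illustration.

```python
import torch

sales = torch.tensor([1000., 950., 870., 1100., 1020., 980., 1050.])

every_other = sales[::2]          # elements at indices 0, 2, 4, 6
above_1000 = sales[sales > 1000]  # boolean mask keeps values greater than 1000

grid = torch.arange(12).reshape(3, 4)
second_row = grid[1]              # tensor([4, 5, 6, 7])
last_column = grid[:, -1]         # tensor([3, 7, 11])
block = grid[:2, 1:3]             # rows 0-1, columns 1-2

# Basic slices are views: writing through one changes the original tensor.
scratch = torch.zeros(4)
head = scratch[:2]
head[0] = 1.0
print(scratch)                    # tensor([1., 0., 0., 0.])
```

Boolean-mask indexing, unlike basic slicing, returns a copy, so modifying `above_1000` would leave `sales` unchanged.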
GPU Acceleration
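Moving large tensor operations to a GPU can speed up computation substantially. The timings below are illustrative and depend on hardware; %timeit is an IPython/Jupyter magic, so run this in a notebook or IPython shell:
a = torch.rand(5000, 5000)
b = torch.rand(5000, 5000)
# CPU computation
%timeit a @ b  # Example output: ~1.2 s per loop
# Move to GPU
if torch.cuda.is_available():
    a = a.cuda()
    b = b.cuda()
    # CUDA kernels run asynchronously; synchronize so the timing covers the full multiplication
    %timeit torch.matmul(a, b); torch.cuda.synchronize()  # Example output: ~8 ms per loop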
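Outside a notebook, the same comparison can be written in a device-agnostic way. The sketch below picks the device once with `torch.device` and times the multiplication with `time.perf_counter`; the helper `timed_matmul` and the matrix size are hypothetical choices, not part of PyTorch.

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(512, 512, device=device)
b = torch.rand(512, 512, device=device)

def timed_matmul(a, b, repeats=10):
    # CUDA kernels launch asynchronously, so synchronize before reading the clock.
    if a.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        c = a @ b
    if a.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats, c

seconds, c = timed_matmul(a, b)
print(f"{device}: {seconds * 1e3:.3f} ms per matmul")
```

Creating tensors directly on the target device with `device=` avoids a separate host-to-device copy, and the same script runs unchanged on machines without a GPU.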
Automatic Differentiation with Autograd
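Autograd records the operations applied to tensors created with requires_grad=True and computes gradients when backward() is called. Here y is the mean of 3x over four elements, so each partial derivative is 3/4: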
x = torch.ones(2, 2, requires_grad=True)
y = (x * 3).mean()
y.backward()
print(x.grad) # Gradient: [[0.75, 0.75], [0.75, 0.75]]
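Two autograd behaviors are worth seeing in isolation: gradients accumulate across `backward()` calls until they are cleared, and `torch.no_grad()` disables tracking. A minimal sketch, with arbitrary variable names:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

(x * 3).mean().backward()
first = x.grad.clone()        # 0.75 everywhere

(x * 3).mean().backward()     # a second backward pass adds to x.grad
second = x.grad.clone()       # 1.5 everywhere

x.grad.zero_()                # reset before the next independent computation

with torch.no_grad():
    z = x * 3                 # not tracked: z.requires_grad is False
print(first, second, z.requires_grad)
```

This accumulation is why training loops clear gradients once per iteration.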
Custom Dataset and DataLoader
Use Dataset and DataLoader for efficient data handling:
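from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class ImageLabelDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.images = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = Image.open(self.images[idx]).convert('RGB')
        if self.transform:
            image = self.transform(image)
        label = self.labels[idx]
        return image, label

# Usage
# The default collate function cannot batch PIL images, so convert them
# to equally sized tensors first
to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
# label_list is assumed to hold one integer label per path in file_list
dataset = ImageLabelDataset(file_list, label_list, transform=to_tensor)
# In a script on Windows or macOS, run this under `if __name__ == "__main__":` when num_workers > 0
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
for batch_images, batch_labels in loader:
    # Perform forward pass
    pass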
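The batching behavior of `DataLoader` can be tried without any image files by wrapping in-memory tensors in a `TensorDataset`. The sketch below uses random stand-in data; the shapes and label range are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for images and labels (random values, for illustration only).
images = torch.rand(40, 3, 32, 32)
labels = torch.randint(0, 2, (40,))

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

batch_shapes = []
for batch_images, batch_labels in loader:
    batch_shapes.append((tuple(batch_images.shape), tuple(batch_labels.shape)))

print(len(dataset), len(loader))
print(batch_shapes)
```

With 40 samples and a batch size of 16, the last batch holds the remaining 8 samples; passing `drop_last=True` would discard it instead.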
Built-in Layers and Optimizers
Instead of manual matrix operations, use torch.nn modules:
import torch.nn as nn
import torch.optim as optim
model = nn.Linear(1, 1) # Equivalent to y = wx + b
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
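These three objects slot into a training loop as sketched below. The synthetic data (y = 2x + 1 plus noise), the learning rate of 0.1, and the epoch count are assumptions made so the example runs on its own and converges quickly.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
X = torch.rand(100, 1)
Y = 2 * X + 1 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)   # learns y = wx + b
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1000):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(X), Y)  # forward pass and loss
    loss.backward()                # compute gradients
    optimizer.step()               # update parameters

print(model.weight.item(), model.bias.item(), loss.item())
```

The learned weight and bias should land close to 2 and 1, with the remaining loss reflecting the injected noise.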