Gradient descent is widely adopted for training modern machine learning models because it scales well to large datasets and high-dimensional feature spaces. Unlike closed-form solutions, which become computationally prohibitive as data volume grows, gradient descent updates parameters iteratively using gradients computed on subsets of the data or on the full dataset.
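For contrast, the closed-form least-squares solution can be written directly via the normal equation; it is exact, but the matrix inversion scales poorly with feature dimension. A minimal sketch (the synthetic data mirrors the examples below; the fixed seed is only for reproducibility):

```python
import numpy as np

np.random.seed(0)                    # illustrative seed, not from the original
X = np.random.rand(100, 1)           # 100 samples, one feature
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]    # prepend intercept column

# Normal equation: w = (X^T X)^{-1} X^T y
w_closed = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(w_closed)  # close to the true parameters [4, 3]
```

This costs one inversion of a d×d matrix, which is fine for two parameters but becomes the bottleneck when d is large; gradient descent avoids it entirely.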
Batch Gradient Descent
import numpy as np

# np.random.seed(1)  # uncomment for reproducible results
features = np.random.rand(100, 1)                     # 100 samples, uniform in [0, 1)
targets = 4 + 3 * features + np.random.randn(100, 1)  # y = 4 + 3x + Gaussian noise
design_matrix = np.c_[np.ones((100, 1)), features]    # prepend intercept column

step_size = 0.001
iterations = 10000
weights = np.random.randn(2, 1)                       # random init: [intercept, slope]

for _ in range(iterations):
    error = design_matrix.dot(weights) - targets      # residuals on the full batch
    grad = design_matrix.T.dot(error)                 # (unnormalized) gradient of the squared error
    weights = weights - step_size * grad              # step against the gradient

print(weights)
Library Import
numpy is used for numerical operations and matrix computations.
Data Generation
features = np.random.rand(100, 1) creates 100 random values following a uniform distribution. targets = 4 + 3 * features + np.random.randn(100, 1) generates the corresponding target values with a linear relationship plus Gaussian noise.
Design Matrix Construction
design_matrix = np.c_[np.ones((100, 1)), features] appends a column of ones to the feature matrix to account for the intercept term in the linear model.
Hyperparameter Setup
step_size = 0.001 controls the magnitude of parameter updates. iterations = 10000 defines the total number of update cycles.
Weight Initialization
weights = np.random.randn(2, 1) initializes the model parameters (intercept and slope) with random values.
Gradient Descent Loop
The loop runs for the specified number of iterations. error = design_matrix.dot(weights) - targets computes the difference between predictions and true values. grad = design_matrix.T.dot(error) calculates the gradient via matrix multiplication. weights = weights - step_size * grad updates the parameters in the direction opposite to the gradient.
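To watch convergence in practice, the loop can track the mean-squared error and stop early once it stabilizes. A sketch under the same setup (the `tolerance` threshold and the early-stopping logic are illustrative additions, not part of the original):

```python
import numpy as np

np.random.seed(1)
features = np.random.rand(100, 1)
targets = 4 + 3 * features + np.random.randn(100, 1)
design_matrix = np.c_[np.ones((100, 1)), features]

step_size = 0.001
weights = np.random.randn(2, 1)
tolerance = 1e-10        # illustrative stopping threshold
prev_mse = np.inf

for step in range(10000):
    error = design_matrix.dot(weights) - targets
    mse = float(np.mean(error ** 2))
    if abs(prev_mse - mse) < tolerance:   # loss has plateaued
        break
    prev_mse = mse
    weights -= step_size * design_matrix.T.dot(error)

print(step, weights)
```

Since the data carry noise with unit variance, the MSE levels off near 1 rather than 0; the break fires once further iterations no longer change it measurably.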
Mini-Batch Gradient Descent
import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

step_size = 0.001
epochs = 1000
sample_count = 100
batch_size = 10
batch_num = sample_count // batch_size
weights = np.random.randn(2, 1)

for epoch in range(epochs):
    for _ in range(batch_num):
        # pick a random start so the slice always contains a full batch
        idx = np.random.randint(sample_count - batch_size + 1)
        x_batch = X_b[idx: idx + batch_size]
        y_batch = y[idx: idx + batch_size]
        grad = x_batch.T.dot(x_batch.dot(weights) - y_batch)
        weights = weights - step_size * grad

print(weights)
Hyperparameter Differences
batch_size = 10 specifies the number of samples per mini-batch. batch_num = sample_count // batch_size calculates how many batches the dataset is split into.
Iterative Training
The outer loop runs for the total number of epochs (full passes over the dataset). The inner loop processes one mini-batch per iteration: a random starting index is selected, and a contiguous slice of batch_size samples is extracted to compute the gradient and update the weights. Because each update uses only a fraction of the data, this approach scales much better to large datasets.
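A common alternative to slicing from a random starting index is to sample the batch indices directly with np.random.choice, which guarantees every mini-batch has exactly batch_size distinct samples. An illustrative sketch (the fixed seed is only for reproducibility):

```python
import numpy as np

np.random.seed(0)  # illustrative seed, not from the original
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

step_size = 0.001
weights = np.random.randn(2, 1)

for epoch in range(1000):
    for _ in range(10):
        # draw a full-size mini-batch without replacement
        batch_idx = np.random.choice(100, size=10, replace=False)
        x_batch, y_batch = X_b[batch_idx], y[batch_idx]
        grad = x_batch.T.dot(x_batch.dot(weights) - y_batch)
        weights -= step_size * grad

print(weights)
```

The trade-off is that nothing forces every sample to appear in a given epoch; the shuffling variant below addresses that.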
Shuffled Batch Optimization
for epoch in range(epochs):
    indices = np.arange(len(X_b))
    np.random.shuffle(indices)
    X_b = X_b[indices]
    y = y[indices]
    for i in range(batch_num):
        x_batch = X_b[i * batch_size: (i + 1) * batch_size]
        y_batch = y[i * batch_size: (i + 1) * batch_size]
        grad = x_batch.T.dot(x_batch.dot(weights) - y_batch)
        weights = weights - step_size * grad
Shuffling the dataset at the start of each epoch and then slicing it into consecutive batches guarantees that every sample is used exactly once per epoch, while the random order still provides the stochasticity that benefits mini-batch training. This removes the risk, present with purely random indexing, of some samples being rarely selected, and improves training stability.
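That once-per-epoch guarantee can be checked directly: after a permutation, the consecutive batches partition the dataset, so each sample index appears exactly once. A small illustrative check:

```python
import numpy as np

np.random.seed(0)  # illustrative seed
n, batch_size = 100, 10
indices = np.arange(n)
np.random.shuffle(indices)

seen = []
for i in range(n // batch_size):
    seen.extend(indices[i * batch_size:(i + 1) * batch_size])

# every sample index occurs exactly once per epoch
print(sorted(seen) == list(range(n)))  # True
```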
Adaptive Learning Rate
import numpy as np

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

a, b = 5, 500

def get_learning_rate(t):
    # decaying schedule: large steps early, small steps late
    return a / (t + b)

epochs = 1000
sample_count = 100
batch_size = 10
batch_num = sample_count // batch_size
weights = np.random.randn(2, 1)

for epoch in range(epochs):
    indices = np.arange(len(X_b))
    np.random.shuffle(indices)
    X_b = X_b[indices]
    y = y[indices]
    for i in range(batch_num):
        x_batch = X_b[i * batch_size: (i + 1) * batch_size]
        y_batch = y[i * batch_size: (i + 1) * batch_size]
        grad = x_batch.T.dot(x_batch.dot(weights) - y_batch)
        # t counts completed updates: batch_num per epoch, plus i within this epoch
        step_size = get_learning_rate(epoch * batch_num + i)
        weights = weights - step_size * grad

print(weights)
The get_learning_rate function reduces the step size as training progresses. Larger steps are used early to speed up convergence, while smaller steps in later stages help fine-tune the parameters and avoid overshooting the optimal solution.
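The decay of this schedule is easy to inspect in isolation: with a = 5 and b = 500, the step size starts at 0.01 and shrinks toward zero as the update counter t grows. A quick check of a few values:

```python
a, b = 5, 500

def get_learning_rate(t):
    return a / (t + b)

for t in [0, 500, 2000, 10000]:
    print(t, get_learning_rate(t))
# t = 0     -> 0.01
# t = 500   -> 0.005
# t = 10000 -> ~0.000476
```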