Linear Regression and Its Regularization Techniques

Regression is a supervised learning method that models the relationship between independent variables (features X) and a dependent variable (target Y). The goal is to learn a function that maps the features to a continuous output.

A linear regression model assumes a linear relationship:

h(x) = θ₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ = θᵀx

The objective is to find parameters θ that minimize the difference between predicted and actual values across all training samples. This is typically done using the least squares criterion.

Least Squares and Maximum Likelihood

The error term ε is assumed to be independent and identically distributed, following a Gaussian distribution with zero mean and constant variance. This assumption is commonly justified by the central limit theorem: the error aggregates many small, independent effects.

yⁱ = θᵀxⁱ + εⁱ
p(εⁱ) = (1 / (σ√(2π))) * exp(-εⁱ² / (2σ²))

Maximizing the log-likelihood ℓ(θ) = Σ log p(yⁱ | xⁱ; θ) = m·log(1/(σ√(2π))) − (1/(2σ²)) Σ (yⁱ − θᵀxⁱ)² over the m training samples therefore reduces to minimizing the sum of squared errors:

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)²

The optimal θ can be solved analytically:

θ = (XᵀX)⁻¹ XᵀY

To ensure invertibility and reduce overfitting, a regularization term can be added:

θ = (XᵀX + λI)⁻¹ XᵀY
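
As a quick illustration, here is a minimal NumPy sketch of both closed-form solutions; the toy data and the λ value are made up for the example:

import numpy as np

# Toy design matrix with an intercept column, and targets
X = np.array([[1.0, 10], [1.0, 20], [1.0, 30], [1.0, 40]])
Y = np.array([[0.8], [1.9], [2.9], [4.2]])

# Ordinary least squares: theta = (XᵀX)⁻¹ XᵀY
theta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y

# Regularized solution: theta = (XᵀX + λI)⁻¹ XᵀY
lam = 0.1  # illustrative regularization strength
theta_ridge = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1])) @ X.T @ Y

print(theta_ols.ravel(), theta_ridge.ravel())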

Loss Functions

  • 0-1 loss: 1 if prediction is wrong, 0 otherwise.
  • Perceptron loss: 1 if absolute error exceeds a threshold, 0 otherwise.
  • Squared loss: Σ (h(xⁱ) - yⁱ)²
  • Absolute loss: Σ |h(xⁱ) - yⁱ|
  • Log loss: -Σ yⁱ log h(xⁱ)
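
A minimal NumPy sketch of the squared and absolute losses from the list above; the array values are made up for illustration:

import numpy as np

y_true = np.array([0.8, 1.0, 1.8, 2.0])   # illustrative targets
y_pred = np.array([0.9, 1.1, 1.6, 2.3])   # illustrative predictions

squared_loss = np.sum((y_pred - y_true) ** 2)    # Σ (h(xⁱ) - yⁱ)²
absolute_loss = np.sum(np.abs(y_pred - y_true))  # Σ |h(xⁱ) - yⁱ|

print(squared_loss, absolute_loss)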

Code Example: Simple Linear Regression

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

# Font settings (SimHei) so non-ASCII labels and the minus sign render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Construct data
X1 = np.array([10, 15, 20, 30, 50, 60, 60, 70]).reshape((-1, 1))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))

# Add intercept term
X = np.column_stack((np.ones_like(X1), X1))
X = np.mat(X)
Y = np.mat(Y)

# Solve theta
theta = (X.T * X).I * X.T * Y
print(theta)

# Predict
predict_y = X * theta
print(predict_y)

predict_y = np.asarray(predict_y)
Y = np.asarray(Y)
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))

# Plot
plt.plot(X1, Y, 'bo', label='True')
plt.plot(X1, predict_y, 'r--o', label='Predicted')
plt.legend()
plt.show()

Code Example: Multiple Linear Regression

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.metrics import mean_squared_error, r2_score

# Data with two features
X1 = np.array([
    [10, 1], [15, 1], [20, 1], [30, 1],
    [50, 2], [60, 1], [60, 2], [70, 2]
]).reshape((-1, 2))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))

use_bias = True
if use_bias:
    X = np.column_stack((X1, np.ones((X1.shape[0], 1))))
else:
    X = X1

X = np.mat(X)
Y = np.mat(Y)
theta = (X.T * X).I * X.T * Y
print(theta)

predict_y = np.asarray(X * theta)
Y = np.asarray(Y)
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))

# Plot 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X1[:, 0], X1[:, 1], Y.ravel(), s=40, c='r')

x1_range = np.arange(0, 100)
x2_range = np.arange(0, 4)
x1_grid, x2_grid = np.meshgrid(x1_range, x2_range)

def predict_func(x1, x2, theta, use_bias):
    if use_bias:
        return x1 * theta[0] + x2 * theta[1] + theta[2]
    else:
        return x1 * theta[0] + x2 * theta[1]

z = np.array([predict_func(x1, x2, theta, use_bias) for x1, x2 in zip(x1_grid.flatten(), x2_grid.flatten())])
z = z.reshape(x1_grid.shape)
ax.plot_surface(x1_grid, x2_grid, z, rstride=1, cstride=1, cmap=plt.cm.jet)
ax.set_title('3D Regression')
plt.show()

Custom Linear Regression Class

import numpy as np
from sklearn.metrics import r2_score

class LinearRegressionCustom:
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None
        self.intercept_ = 0

    def fit(self, X, y):
        if self.fit_intercept:
            X = np.column_stack((np.ones((X.shape[0], 1)), X))
        X_mat = np.mat(X)
        y_mat = np.mat(y).reshape((-1, 1))
        theta = (X_mat.T * X_mat).I * X_mat.T * y_mat
        if self.fit_intercept:
            self.intercept_ = theta[0, 0]
            self.coef_ = theta[1:, 0]
        else:
            self.coef_ = theta[:, 0]

    def predict(self, X):
        X_mat = np.mat(X)
        return np.asarray(X_mat * self.coef_ + self.intercept_).flatten()

    def score(self, X, y):
        y_pred = self.predict(X)
        return r2_score(y, y_pred)
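
A short usage sketch of the class above; the toy data mirrors the earlier simple regression example:

import numpy as np

X = np.array([[10], [15], [20], [30], [50], [60], [60], [70]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

model = LinearRegressionCustom(fit_intercept=True)
model.fit(X, y)
print('intercept:', model.intercept_)
print('coef:', np.asarray(model.coef_).ravel())
print('R²:', model.score(X, y))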

Overfitting and Regularization

To prevent overfitting, regularization terms are added to the cost function:

L2 Regularization (Ridge Regression)

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ θⱼ²

L1 Regularization (LASSO Regression)

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ |θⱼ|

Comparison:

  • L2 (Ridge) shrinks coefficients but does not set them to zero, retaining all features.
  • L1 (LASSO) can drive some coefficients to zero, performing feature selection.
  • Elastic Net combines both penalties with a mixing ratio p (see the sketch below):

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ (p Σ θⱼ² + (1-p) Σ |θⱼ|)
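
A minimal scikit-learn sketch of Elastic Net; note that scikit-learn's l1_ratio weights the L1 term, so it corresponds to 1 − p in the formula above, and the data and parameter values here are illustrative:

import numpy as np
from sklearn.linear_model import ElasticNet

X = np.array([[10, 1], [15, 1], [20, 1], [30, 1],
              [50, 2], [60, 1], [60, 2], [70, 2]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

# alpha plays the role of λ, l1_ratio controls the L1/L2 mix
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print('coef:', enet.coef_, 'intercept:', enet.intercept_)
print('R²:', enet.score(X, y))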

Model Evaluation Metrics

  • MSE: Mean Squared Error, lower is better.
  • RMSE: Root Mean Squared Error, the square root of MSE, in the same units as the target.
  • R²: Coefficient of determination, best value is 1. Can be negative for poor models.
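
A minimal sketch computing these metrics with NumPy and scikit-learn; the arrays are made up for illustration:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.8, 1.0, 1.8, 2.0])
y_pred = np.array([0.9, 1.1, 1.6, 2.3])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print('MSE:', mse, 'RMSE:', rmse, 'R²:', r2)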

Hyperparameter Tuning

Cross-validation (e.g., k-fold) is used to find the optimal regularization strength λ and mixing ratio p, as sketched below.
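
A minimal sketch using scikit-learn's built-in cross-validated estimators; the candidate grids and toy data are illustrative:

import numpy as np
from sklearn.linear_model import RidgeCV, ElasticNetCV

X = np.array([[10, 1], [15, 1], [20, 1], [30, 1],
              [50, 2], [60, 1], [60, 2], [70, 2]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

# Ridge: pick λ (alpha) by cross-validation
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=4)
ridge_cv.fit(X, y)
print('best ridge alpha:', ridge_cv.alpha_)

# Elastic Net: pick both alpha and the L1/L2 mixing ratio
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=[0.01, 0.1, 1.0], cv=4)
enet_cv.fit(X, y)
print('best enet alpha:', enet_cv.alpha_, 'l1_ratio:', enet_cv.l1_ratio_)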

Code Example: Scikit-learn Implementations

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures

data = pd.read_csv('boston_housing.data', sep=r'\s+', header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.3, random_state=10)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
print('Linear train R²:', lr.score(X_train, y_train))
print('Linear test R²:', lr.score(X_test, y_test))

# Ridge Regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print('Ridge train R²:', ridge.score(X_train, y_train))
print('Ridge test R²:', ridge.score(X_test, y_test))

# LASSO Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
print('LASSO train R²:', lasso.score(X_train, y_train))
print('LASSO test R²:', lasso.score(X_test, y_test))

# Polynomial Features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

lr_poly = LinearRegression()
lr_poly.fit(X_poly_train, y_train)
print('Poly train R²:', lr_poly.score(X_poly_train, y_train))
print('Poly test R²:', lr_poly.score(X_poly_test, y_test))

Summary

Linear regression finds a linear mapping from features to target. Regularization (Ridge, LASSO, Elastic Net) helps control overfitting. Model performance is assessed via MSE, RMSE, and R². Hyperparameters like λ are tuned using cross-validation.

Tags: Linear Regression, least squares, ridge regression, lasso regression, elastic net

Posted on Wed, 13 May 2026 06:48:11 +0000 by stpra123