Linear Regression and Its Regularization Techniques

Regression is a supervised learning method that models the relationship between independent variables (features X) and a dependent variable (target Y). The goal is to learn a function that maps the features to a continuous output.

A linear regression model assumes a linear relationship:

h(x) = θ₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ = θᵀx

The objective is to find parameters θ that minimize the difference between predicted and actual values across all training samples. This is typically done using the least squares criterion.

Least Squares and Maximum Likelihood

The error term ε is assumed to be independent and identically distributed, following a Gaussian distribution with zero mean and constant variance. This assumption is commonly justified by the central limit theorem: the error aggregates many small, independent effects.

yⁱ = θᵀxⁱ + εⁱ
p(εⁱ) = (1 / (σ√(2π))) * exp(-εⁱ² / (2σ²))

Maximizing the log-likelihood ℓ(θ) = Σ log p(yⁱ | xⁱ; θ) = m·log(1/(σ√(2π))) − (1/(2σ²)) Σ (yⁱ − θᵀxⁱ)² over the m training samples therefore reduces to minimizing the sum of squared errors:

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)²

The optimal θ can be solved analytically:

θ = (XᵀX)⁻¹ XᵀY

To ensure invertibility and reduce overfitting, a regularization term can be added:

θ = (XᵀX + λI)⁻¹ XᵀY
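
As a quick illustration, here is a minimal NumPy sketch of both closed-form solutions; the toy data and the λ value are made up for the example:

import numpy as np

# Toy design matrix with an intercept column, and targets
X = np.array([[1.0, 10], [1.0, 20], [1.0, 30], [1.0, 40]])
Y = np.array([[0.8], [1.9], [2.9], [4.2]])

# Ordinary least squares: theta = (XᵀX)⁻¹ XᵀY
theta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y

# Regularized solution: theta = (XᵀX + λI)⁻¹ XᵀY
lam = 0.1  # illustrative regularization strength
theta_ridge = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1])) @ X.T @ Y

print(theta_ols.ravel(), theta_ridge.ravel())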

Loss Functions

  • 0-1 loss: 1 if prediction is wrong, 0 otherwise.
  • Perceptron loss: 1 if absolute error exceeds a threshold, 0 otherwise.
  • Squared loss: Σ (h(xⁱ) - yⁱ)²
  • Absolute loss: Σ |h(xⁱ) - yⁱ|
  • Log loss: -Σ yⁱ log h(xⁱ)
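
A minimal NumPy sketch of the squared and absolute losses from the list above; the array values are made up for illustration:

import numpy as np

y_true = np.array([0.8, 1.0, 1.8, 2.0])   # illustrative targets
y_pred = np.array([0.9, 1.1, 1.6, 2.3])   # illustrative predictions

squared_loss = np.sum((y_pred - y_true) ** 2)    # Σ (h(xⁱ) - yⁱ)²
absolute_loss = np.sum(np.abs(y_pred - y_true))  # Σ |h(xⁱ) - yⁱ|

print(squared_loss, absolute_loss)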

Code Example: Simple Linear Regression

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

# Font settings (SimHei) so non-ASCII labels and the minus sign render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Construct data
X1 = np.array([10, 15, 20, 30, 50, 60, 60, 70]).reshape((-1, 1))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))

# Add intercept term
X = np.column_stack((np.ones_like(X1), X1))
X = np.mat(X)
Y = np.mat(Y)

# Solve theta
theta = (X.T * X).I * X.T * Y
print(theta)

# Predict
predict_y = X * theta
print(predict_y)

predict_y = np.asarray(predict_y)
Y = np.asarray(Y)
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))

# Plot
plt.plot(X1, Y, 'bo', label='True')
plt.plot(X1, predict_y, 'r--o', label='Predicted')
plt.legend()
plt.show()

Code Example: Multiple Linear Regression

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.metrics import mean_squared_error, r2_score

# Data with two features
X1 = np.array([
    [10, 1], [15, 1], [20, 1], [30, 1],
    [50, 2], [60, 1], [60, 2], [70, 2]
]).reshape((-1, 2))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))

use_bias = True
if use_bias:
    X = np.column_stack((X1, np.ones((X1.shape[0], 1))))
else:
    X = X1

X = np.mat(X)
Y = np.mat(Y)
theta = (X.T * X).I * X.T * Y
print(theta)

predict_y = np.asarray(X * theta)
Y = np.asarray(Y)
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))

# Plot 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X1[:, 0], X1[:, 1], Y.ravel(), s=40, c='r')

x1_range = np.arange(0, 100)
x2_range = np.arange(0, 4)
x1_grid, x2_grid = np.meshgrid(x1_range, x2_range)

def predict_func(x1, x2, theta, use_bias):
    if use_bias:
        return x1 * theta[0] + x2 * theta[1] + theta[2]
    else:
        return x1 * theta[0] + x2 * theta[1]

z = np.array([predict_func(x1, x2, theta, use_bias) for x1, x2 in zip(x1_grid.flatten(), x2_grid.flatten())])
z = z.reshape(x1_grid.shape)
ax.plot_surface(x1_grid, x2_grid, z, rstride=1, cstride=1, cmap=plt.cm.jet)
ax.set_title('3D Regression')
plt.show()

Custom Linear Regression Class

import numpy as np
from sklearn.metrics import r2_score

class LinearRegressionCustom:
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None
        self.intercept_ = 0

    def fit(self, X, y):
        if self.fit_intercept:
            X = np.column_stack((np.ones((X.shape[0], 1)), X))
        X_mat = np.mat(X)
        y_mat = np.mat(y).reshape((-1, 1))
        theta = (X_mat.T * X_mat).I * X_mat.T * y_mat
        if self.fit_intercept:
            self.intercept_ = theta[0, 0]
            self.coef_ = theta[1:, 0]
        else:
            self.coef_ = theta[:, 0]

    def predict(self, X):
        X_mat = np.mat(X)
        return np.asarray(X_mat * self.coef_ + self.intercept_).flatten()

    def score(self, X, y):
        y_pred = self.predict(X)
        return r2_score(y, y_pred)
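
A short usage sketch of the class above; the toy data mirrors the earlier simple regression example:

import numpy as np

X = np.array([[10], [15], [20], [30], [50], [60], [60], [70]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

model = LinearRegressionCustom(fit_intercept=True)
model.fit(X, y)
print('intercept:', model.intercept_)
print('coef:', np.asarray(model.coef_).ravel())
print('R²:', model.score(X, y))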

Overfitting and Regularization

To prevent overfitting, regularization terms are added to the cost function:

L2 Regularization (Ridge Regression)

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ θⱼ²

L1 Regularization (LASSO Regression)

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ |θⱼ|

Comparison:

  • L2 (Ridge) shrinks coefficients but does not set them to zero, retaining all features.
  • L1 (LASSO) can drive some coefficients to zero, performing feature selection.
  • Elastic Net combines both penalties with a mixing ratio p (see the sketch below):

J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ (p Σ θⱼ² + (1-p) Σ |θⱼ|)
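
A minimal scikit-learn sketch of Elastic Net; note that scikit-learn's l1_ratio weights the L1 term, so it corresponds to 1 − p in the formula above, and the data and parameter values here are illustrative:

import numpy as np
from sklearn.linear_model import ElasticNet

X = np.array([[10, 1], [15, 1], [20, 1], [30, 1],
              [50, 2], [60, 1], [60, 2], [70, 2]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

# alpha plays the role of λ, l1_ratio controls the L1/L2 mix
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print('coef:', enet.coef_, 'intercept:', enet.intercept_)
print('R²:', enet.score(X, y))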

Model Evaluation Metrics

  • MSE: Mean Squared Error, lower is better.
  • RMSE: Root Mean Squared Error, the square root of MSE, in the same units as the target.
  • R²: Coefficient of determination, best value is 1. Can be negative for poor models.
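
A minimal sketch computing these metrics with NumPy and scikit-learn; the arrays are made up for illustration:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.8, 1.0, 1.8, 2.0])
y_pred = np.array([0.9, 1.1, 1.6, 2.3])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print('MSE:', mse, 'RMSE:', rmse, 'R²:', r2)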

Hyperparameter Tuning

Cross-validation (e.g., k-fold) is used to find the optimal regularization strength λ and mixing ratio p, as sketched below.
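
A minimal sketch using scikit-learn's built-in cross-validated estimators; the candidate grids and toy data are illustrative:

import numpy as np
from sklearn.linear_model import RidgeCV, ElasticNetCV

X = np.array([[10, 1], [15, 1], [20, 1], [30, 1],
              [50, 2], [60, 1], [60, 2], [70, 2]])
y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5])

# Ridge: pick λ (alpha) by cross-validation
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=4)
ridge_cv.fit(X, y)
print('best ridge alpha:', ridge_cv.alpha_)

# Elastic Net: pick both alpha and the L1/L2 mixing ratio
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=[0.01, 0.1, 1.0], cv=4)
enet_cv.fit(X, y)
print('best enet alpha:', enet_cv.alpha_, 'l1_ratio:', enet_cv.l1_ratio_)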

Code Example: Scikit-learn Implementations

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures

data = pd.read_csv('boston_housing.data', sep=r'\s+', header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.3, random_state=10)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
print('Linear train R²:', lr.score(X_train, y_train))
print('Linear test R²:', lr.score(X_test, y_test))

# Ridge Regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print('Ridge train R²:', ridge.score(X_train, y_train))
print('Ridge test R²:', ridge.score(X_test, y_test))

# LASSO Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
print('LASSO train R²:', lasso.score(X_train, y_train))
print('LASSO test R²:', lasso.score(X_test, y_test))

# Polynomial Features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

lr_poly = LinearRegression()
lr_poly.fit(X_poly_train, y_train)
print('Poly train R²:', lr_poly.score(X_poly_train, y_train))
print('Poly test R²:', lr_poly.score(X_poly_test, y_test))

Summary

Linear regression finds a linear mapping from features to target. Regularization (Ridge, LASSO, Elastic Net) helps control overfitting. Model performance is assessed via MSE, RMSE, and R². Hyperparameters like λ are tuned using cross-validation.

Tags: Linear Regression, least squares, ridge regression, lasso regression, elastic net

Posted on Wed, 13 May 2026 06:48:11 +0000 by stpra123