Regression is a supervised learning method that models the relationship between independent variables (features X) and a dependent variable (target Y). The goal is to learn a function that maps features to a continuous output.
A linear regression model assumes a linear relationship:
h(x) = θ₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ = θᵀx
The objective is to find parameters θ that minimize the difference between predicted and actual values across all training samples. This is typically done using the least squares criterion.
Least Squares and Maximum Likelihood
The error term ε is assumed to be independent and identically distributed, following a Gaussian distribution with zero mean and constant variance. This assumption is motivated by the central limit theorem: the error aggregates many small, independent effects.
yⁱ = θᵀxⁱ + εⁱ
p(εⁱ) = (1 / (σ√(2π))) * exp(-εⁱ² / (2σ²))
Maximizing the likelihood of the observed data under this Gaussian model is equivalent to minimizing the sum of squared errors:
J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)²
The optimal θ can be solved analytically:
θ = (XᵀX)⁻¹ XᵀY
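The closed-form solution can be checked numerically. The sketch below, on made-up toy data, compares the normal equation against NumPy's built-in least-squares solver:

```python
import numpy as np

# Toy data: y = 1 + 2x plus a little Gaussian noise (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = np.column_stack((np.ones_like(x), x))   # design matrix with an intercept column
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=x.shape)

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)                              # close to [1, 2]
print(np.allclose(theta, theta_lstsq))    # both solve the same problem
```

In practice `np.linalg.lstsq` (or a QR/SVD-based solver) is preferred over explicitly inverting XᵀX, which can be numerically unstable for ill-conditioned design matrices.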
To ensure invertibility and reduce overfitting, a regularization term can be added:
θ = (XᵀX + λI)⁻¹ XᵀY
Loss Functions
- 0-1 loss: 1 if prediction is wrong, 0 otherwise.
- Perceptron loss: 1 if absolute error exceeds a threshold, 0 otherwise.
- Squared loss: Σ (h(xⁱ) - yⁱ)²
- Absolute loss: Σ |h(xⁱ) - yⁱ|
- Log loss: -Σ [yⁱ log h(xⁱ) + (1 - yⁱ) log(1 - h(xⁱ))], used for probabilistic classifiers with labels in {0, 1}.
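The regression losses above can be evaluated directly. The snippet below uses small made-up vectors; log loss is shown on a separate {0, 1} classification example, since it applies to probabilistic outputs rather than continuous targets:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.4])

squared_loss = np.sum((y_pred - y_true) ** 2)
absolute_loss = np.sum(np.abs(y_pred - y_true))
print(squared_loss)   # ~0.21  (0.01 + 0.04 + 0.16)
print(absolute_loss)  # ~0.7   (0.1 + 0.2 + 0.4)

# Log loss: binary labels and predicted probabilities
y_cls = np.array([1, 0, 1])
p_pred = np.array([0.9, 0.2, 0.7])
log_loss = -np.sum(y_cls * np.log(p_pred) + (1 - y_cls) * np.log(1 - p_pred))
print(log_loss)       # ~0.685
```

Note how the squared loss penalizes the largest error (0.4) four times more heavily than the absolute loss does relative to the 0.2 error, which is why squared loss is more sensitive to outliers.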
Code Example: Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
# Construct data
X1 = np.array([10, 15, 20, 30, 50, 60, 60, 70]).reshape((-1, 1))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))
# Add intercept term
X = np.column_stack((np.ones_like(X1), X1))
# Solve theta via the normal equation
theta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(theta)
# Predict
predict_y = X @ theta
print(predict_y)
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))
# Plot
plt.plot(X1, Y, 'bo', label='True')
plt.plot(X1, predict_y, 'r--o', label='Predicted')
plt.legend()
plt.show()
Code Example: Multiple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.metrics import mean_squared_error, r2_score
# Data with two features
X1 = np.array([
[10, 1], [15, 1], [20, 1], [30, 1],
[50, 2], [60, 1], [60, 2], [70, 2]
]).reshape((-1, 2))
Y = np.array([0.8, 1.0, 1.8, 2.0, 3.2, 3.0, 3.1, 3.5]).reshape((-1, 1))
use_bias = True
if use_bias:
    X = np.column_stack((X1, np.ones((X1.shape[0], 1))))
else:
    X = X1
# Solve theta via the normal equation
theta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(theta)
predict_y = X @ theta
print('MSE:', mean_squared_error(Y, predict_y))
print('R²:', r2_score(Y, predict_y))
# Plot 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], Y, s=40, c='r')
x1_range = np.arange(0, 100)
x2_range = np.arange(0, 4)
x1_grid, x2_grid = np.meshgrid(x1_range, x2_range)
def predict_func(x1, x2, theta, use_bias):
    if use_bias:
        return x1 * theta[0] + x2 * theta[1] + theta[2]
    else:
        return x1 * theta[0] + x2 * theta[1]
z = np.array([predict_func(x1, x2, theta, use_bias) for x1, x2 in zip(x1_grid.flatten(), x2_grid.flatten())])
z = z.reshape(x1_grid.shape)
ax.plot_surface(x1_grid, x2_grid, z, rstride=1, cstride=1, cmap=plt.cm.jet)
ax.set_title('3D Regression')
plt.show()
Custom Linear Regression Class
import numpy as np
from sklearn.metrics import r2_score
class LinearRegressionCustom:
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept
        self.coef_ = None
        self.intercept_ = 0

    def fit(self, X, y):
        X = np.asarray(X)
        if self.fit_intercept:
            X = np.column_stack((np.ones((X.shape[0], 1)), X))
        y = np.asarray(y).reshape(-1, 1)
        # Normal equation: theta = (X^T X)^{-1} X^T y
        theta = np.linalg.inv(X.T @ X) @ X.T @ y
        if self.fit_intercept:
            self.intercept_ = theta[0, 0]
            self.coef_ = theta[1:, 0]
        else:
            self.coef_ = theta[:, 0]

    def predict(self, X):
        X = np.asarray(X)
        return X @ self.coef_ + self.intercept_

    def score(self, X, y):
        y_pred = self.predict(X)
        return r2_score(y, y_pred)
Overfitting and Regularization
To prevent overfitting, regularization terms are added to the cost function:
L2 Regularization (Ridge Regression)
J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ θⱼ²
L1 Regularization (LASSO Regression)
J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ Σ |θⱼ|
Comparison:
- L2 (Ridge) shrinks coefficients but does not set them to zero, retaining all features.
- L1 (LASSO) can drive some coefficients to zero, performing feature selection.
- Elastic Net combines both L1 and L2 penalties.
J(θ) = (1/2) Σ (h(xⁱ) - yⁱ)² + λ (p Σ θⱼ² + (1-p) Σ |θⱼ|)
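Elastic Net is available in scikit-learn. Note that its parameterization uses `alpha` for the overall penalty strength and `l1_ratio` for the L1 share (1.0 is pure LASSO, 0.0 is pure Ridge), which may map differently onto the λ and p above. A sketch on synthetic data (all values illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the other three are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)   # large weights on features 0 and 1; noise features near zero
```

The L1 component tends to drive the noise-feature coefficients to exactly zero, while the L2 component shrinks the remaining coefficients smoothly.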
Model Evaluation Metrics
- MSE: Mean Squared Error, lower is better.
- RMSE: Root Mean Squared Error.
- R²: Coefficient of determination, best value is 1. Can be negative for poor models.
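These metrics can be computed by hand from their definitions and cross-checked against scikit-learn (toy numbers, for illustration only):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, rmse, r2)
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))  # matches sklearn
print(np.isclose(r2, r2_score(y_true, y_pred)))             # matches sklearn
```

R² is negative whenever the model predicts worse than simply outputting the mean of y (i.e. when SS_res > SS_tot).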
Hyperparameter Tuning
Cross-validation (e.g., k-fold) is used to find optimal regularization parameters λ and mixing ratio p.
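As an illustration (synthetic data, made-up candidate values), `GridSearchCV` can search over Ridge's regularization strength with k-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.2, size=80)

# 5-fold cross-validation over candidate lambda (alpha) values
param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)   # mean cross-validated R^2 for the best alpha
```

For Elastic Net, scikit-learn also provides `ElasticNetCV`, which tunes both the penalty strength and the L1/L2 mixing ratio along a built-in path more efficiently than a plain grid search.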
Code Example: Scikit-learn Implementations
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
data = pd.read_csv('boston_housing.data', sep=r'\s+', header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.3, random_state=10)
# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
print('Linear train R²:', lr.score(X_train, y_train))
print('Linear test R²:', lr.score(X_test, y_test))
# Ridge Regression
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print('Ridge train R²:', ridge.score(X_train, y_train))
print('Ridge test R²:', ridge.score(X_test, y_test))
# LASSO Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
print('LASSO train R²:', lasso.score(X_train, y_train))
print('LASSO test R²:', lasso.score(X_test, y_test))
# Polynomial Features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)
lr_poly = LinearRegression()
lr_poly.fit(X_poly_train, y_train)
print('Poly train R²:', lr_poly.score(X_poly_train, y_train))
print('Poly test R²:', lr_poly.score(X_poly_test, y_test))
Summary
Linear regression finds a linear mapping from features to target. Regularization (Ridge, LASSO, Elastic Net) helps control overfitting. Model performance is assessed via MSE, RMSE, and R². Hyperparameters like λ are tuned using cross-validation.