Feature Selection and Dimensionality Reduction in Machine Learning

Data and features define the upper bound of machine learning performance; models and algorithms merely approach this limit.

  1. Feature Selection

Feature selection aims to identify the most relevant subset of input variables to improve model interpretability, reduce overfitting, and enhance computational efficiency—especially critical for high-dimensional datasets.

1.1 Traditional Feature Selection Methods

Wrapper Methods

These methods evaluate feature subsets based on the predictive performance of a specific learning algorithm. While effective, they are computationally expensive due to the exponential search space (O(2d) for d features) and prone to overfitting.

  • Sequential Forward Selection (SFS): Starts with an empty set and iteratively adds the feature that maximizes a scoring function J.
  • Sequential Backward Selection (SBS): Begins with all features and removes the least useful one at each step.
  • Sequential Floating Forward Selection (SFFS): Extends SFS with conditional backward steps to correct greeedy choices.
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
sfs = SequentialFeatureSelector(
    knn,
    k_features=3,
    forward=True,
    floating=False,
    scoring='accuracy',
    cv=4,
    n_jobs=-1
)
sfs.fit(X, y)
print("Selected feature indices:", sfs.k_feature_idx_)
print("Cross-validated score:", sfs.k_score_)

Filter Methods

These rank features using statistical measures independant of any learning algorithm, making them fast and scalable.

  • Variance Threshold: Removes low-variance features.
  • Correlation Coefficients (Pearson/Spearman): Measure linear/monotonic relationships with the target.
  • Mutual Information: Quantifies dependency via information gain: I(X;Y) = H(Y) - H(Y|X).
  • Chi-Square Test: Assesses independence between categorical features and labels.
  • ANOVA F-test: Evaluates differences in means across classes.
from sklearn.feature_selection import VarianceThreshold

selector = VarianceThreshold(threshold=0.16)  # Keeps features with var > 0.16
X_reduced = selector.fit_transform(X)

Embedded Methods

Integrate feature selection into the model training process (e.g., Lasso regression with L1 regularization, decision trees with impurity-based importance).

1.2 Advanced Feature Selection Techniques

Laplacian Score (Unsupervised)

This method selects features that preserve local manifold structure by minimizing the Laplacian score:

  1. Construct a k-NN graph.
  2. Compute affinity matrix S using heat kernel: S(i,j) = exp(-||x_i - x_j||² / t).
  3. Derive degree matrix D and Laplacian L = D - S.
  4. Select features with smallest Lr.

Fisher Score (Supervised)

Maximizes between-class scatter while minimizing within-class scatter:

Unlike feature selection (which discards features), dimensionality reduction transforms features into a lower-dimensional space via projection.

Principle Component Analysis (PCA)

PCA identifies orthogonal directions (principal components) of maximum variance in the data.

Algorithm Steps:

  1. Center the data: X ← X - mean(X).
  2. Compute covariance matrix: C = XᵀX.
  3. Eigendecompose C and select top k eigenvectors.
  4. Project data: Y = XW, where W contains selected eigenvectors.
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

Tags: Feature-Selection dimensionality-reduction PCA scikit-learn mutual-information

Posted on Wed, 10 Jun 2026 18:26:23 +0000 by VagabondKites