Filter-Based Feature Selection Techniques in Machine Learning

Filter-based feature selection evaluates features prior to model training using statistical metrics or dependency measures between features and the target variable. It ranks features by relevance and selects a subset expected to improve generalization and reduce overfitting.

Workflow

Data Acquisition: Gather a dataset containing feature columns and a target variable. Preprocess it to handle missing values and outliers.
Feature Scoring: Apply a chosen metric (mutual information, chi-square, Pearson correlation, information gain) to quantify each feature's association with the target. Variance is the exception: it scores each feature on its own spread, without reference to the target.
Ranking: Sort features by score, most relevant first.
Thresholding (Optional): Define a cutoff, such as keeping only the top N features or those whose score exceeds a threshold.
Subset Construction: Retain the qualifying features for downstream modeling.
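The scoring, ranking, thresholding, and subset steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names filter_select and abs_covariance and the toy columns are hypothetical, and absolute covariance is used only as a stand-in for whichever metric is chosen.

```python
from typing import Callable, Dict, List, Optional, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def filter_select(columns: Dict[str, Sequence[float]],
                  target: Sequence[float],
                  score_fn: Scorer,
                  top_n: Optional[int] = None,
                  min_score: Optional[float] = None) -> List[str]:
    """Score every feature, rank by score, then apply the optional cutoffs."""
    # Feature scoring: one score per column.
    scores = {name: score_fn(values, target) for name, values in columns.items()}
    # Ranking: highest score first.
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    # Thresholding: both cutoffs are optional.
    if min_score is not None:
        ranked = [name for name in ranked if scores[name] >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    # Subset construction: the caller keeps only these columns.
    return ranked


def abs_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical stand-in metric: absolute population covariance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs))


# Toy data (assumed for illustration): "b" is constant, "a" tracks the target.
columns = {"a": [1, 2, 3, 4], "b": [5, 5, 5, 5], "c": [4, 1, 3, 2]}
target = [2, 4, 6, 8]

top_two_names = filter_select(columns, target, abs_covariance, top_n=2)
above_cutoff = filter_select(columns, target, abs_covariance, min_score=2.0)
print(top_two_names)
print(above_cutoff)
```

Passing the metric in as a function keeps the selection logic independent of the scoring choice, which mirrors how filter methods stay independent of the downstream model.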

Advantages include independence from learning algorithms, low computational cost, ease of interpretation, and suitability for high-dimensional data.

Variance Thresholding

Removes features with low variance, assuming they carry little discriminative information.

Procedure

Prepare cleaned data.
Compute variance for each feature.
Choose a variance limit.
Discard features below the limit.

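Works most naturally for continuous variables; categorical ones must be encoded numerically first. Because variance depends on a feature's units, a single limit is only meaningful when the features are on comparable scales.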
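The procedure can also be written out by hand, which makes the rule explicit. The sketch below uses only the standard library and assumes population variance (dividing by the number of samples) with a strict "greater than" comparison; the function names are hypothetical. The rows match the small matrix used in the scikit-learn example in this section.

```python
def population_variance(values):
    """Variance with denominator n (not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def low_variance_filter(rows, limit):
    """Return kept column indices, per-column variances, and the reduced rows."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    variances = [population_variance(col) for col in columns]
    kept = [i for i, var in enumerate(variances) if var > limit]
    reduced = [[row[i] for i in kept] for row in rows]
    return kept, variances, reduced


rows = [[0, 2, 0, 3],
        [0, 1, 4, 3],
        [0, 1, 1, 3]]

kept, variances, reduced = low_variance_filter(rows, limit=0.6)
print("Variances:", variances)
print("Kept column indices:", kept)
print("Reduced rows:", reduced)
```

Columns 0 and 3 are constant (variance 0) and column 1 has variance 2/9, so only column 2 passes a limit of 0.6.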

Example

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Rows are samples, columns are features.
# Columns 0 and 3 are constant; column 1 varies only slightly.
matrix = np.array([[0, 2, 0, 3],
                   [0, 1, 4, 3],
                   [0, 1, 1, 3]])

# Keep only features whose variance exceeds 0.6.
vt = VarianceThreshold(threshold=0.6)
filtered = vt.fit_transform(matrix)

print("Initial matrix:\n", matrix)
print("Filtered matrix:\n", filtered)
print("Kept column indices:", vt.get_support(indices=True))  # only column 2 survives
print("Variances:", vt.variances_)

Mutual Information

Measures shared information between a feature and target, capturing linear and nonlinear dependencies.

Procedure

Clean and prepare data.
Calculate mutual information for each feature-target pair.
Rank features by mutual information score.
Select top-ranked features.

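Works for both continuous and categorical data. Categorical values can be counted directly, while continuous features need either discretization or a nearest-neighbor style estimator.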
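For discrete data, the score in the second step can be computed directly from counts using I(X; Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))). The following is a minimal sketch under that assumption; the function name and toy columns are hypothetical, and the result is in nats because the natural logarithm is used.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Count-based mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        mi += p_xy * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


labels = [0, 0, 1, 1]
copy_of_labels = [0, 0, 1, 1]   # fully determines the label
unrelated = [0, 1, 0, 1]        # independent of the label

mi_copy = mutual_information(copy_of_labels, labels)
mi_unrelated = mutual_information(unrelated, labels)
print("Informative feature:", mi_copy)
print("Unrelated feature:", mi_unrelated)
```

A feature that fully determines a balanced binary label scores ln 2 (about 0.693 nats), and an independent feature scores 0. Estimators for continuous features work differently, so library scores on continuous data will not match this count-based version.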

Example

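from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

data = load_iris()
features, labels = data.data, data.target

# The estimator adds small random noise to continuous features,
# so fix random_state to make the scores reproducible.
score_func = partial(mutual_info_classif, random_state=0)

# Keep the two features that share the most information with the class label.
mi_selector = SelectKBest(score_func, k=2)
reduced_features = mi_selector.fit_transform(features, labels)

print("Original shape:", features.shape)
print("Reduced shape:", reduced_features.shape)
print("Chosen indices:", mi_selector.get_support(indices=True))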

Chi-Square Test

Statistical test for independence between categorical variables, commonly used in classification tasks.

Procedure

Obtain categorical feature and target data.
Build contingency tables for each feature-target pair.
Compute expected frequencies and chi-square statistic.
Determine degrees of freedom and compare with critical value at chosen significance level.
Retain features with significant association.

Best suited for purely categorical settings.
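The contingency-table procedure above can be traced by hand on a small categorical dataset. The sketch below is illustrative only: the function name, the color and size features, and the data are hypothetical, and the critical values are the standard chi-square table entries for a 0.05 significance level and 1 to 4 degrees of freedom.

```python
from collections import Counter

# Upper critical values of the chi-square distribution at alpha = 0.05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square(feature, target):
    """Return the chi-square statistic and degrees of freedom for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))   # contingency table cells
    feature_totals = Counter(feature)          # row totals
    target_totals = Counter(target)            # column totals
    statistic = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            expected = f_count * t_count / n
            statistic += (observed[(f_value, t_value)] - expected) ** 2 / expected
    dof = (len(feature_totals) - 1) * (len(target_totals) - 1)
    return statistic, dof


# Toy data: color is associated with the target, size is not.
target = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
color = ["red"] * 10 + ["blue"] * 10
size = ["s", "l"] * 10

retained = []
for name, values in {"color": color, "size": size}.items():
    statistic, dof = chi_square(values, target)
    significant = statistic > CRITICAL_05[dof]
    print(name, round(statistic, 3), dof, significant)
    if significant:
        retained.append(name)
```

Here every expected cell count is 5, so color scores 4 * (3^2 / 5) = 7.2 with one degree of freedom and exceeds the critical value 3.841, while size scores 0 and is dropped. Note that scikit-learn's chi2 scorer computes its statistic from non-negative feature values per class rather than from a per-feature contingency table, so its scores are not directly comparable with this sketch.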

Example

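from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

data = load_iris()
features, labels = data.data, data.target

# chi2 requires non-negative feature values and treats them like counts or frequencies.
# The iris measurements are continuous, so this run only illustrates the API;
# on real data, apply chi2 to counts or one-hot encoded categories.
chi_selector = SelectKBest(chi2, k=2)
reduced_features = chi_selector.fit_transform(features, labels)

print("Original shape:", features.shape)
print("Reduced shape:", reduced_features.shape)
print("Chosen indices:", chi_selector.get_support(indices=True))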

Pearson Correlation

Quantifies linear relationship between continuous features and target.

Procedure

Supply cleaned continuous data.
Compute Pearson correlation coefficient for each feature-target pair.
Rank by absolute coefficient magnitude.
Select features above a set threshold or top k.

Insensitive to nonlinear patterns; sensitive to outliers.
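Both the procedure and the blind spot for nonlinear patterns show up in a small hand-rolled example. This is a sketch using only the standard library; the function name, feature names, and data are hypothetical, and treating a constant column as having zero correlation is an assumption made here (the coefficient is formally undefined in that case).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # assumption: score constant columns as uncorrelated
    return cov / (spread_x * spread_y)


# The target is exactly x squared, so "x" determines it, but not linearly.
target = [4, 1, 0, 1, 4]
features = {
    "x": [-2, -1, 0, 1, 2],
    "x_squared_noisy": [4.1, 0.9, 0.2, 1.0, 3.8],
    "constant": [7, 7, 7, 7, 7],
}

scores = {name: pearson_r(values, target) for name, values in features.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
selected = [name for name in ranked if abs(scores[name]) >= 0.5]

print(scores)
print("Selected:", selected)
```

The feature x receives a coefficient of 0 even though the target is a deterministic function of it, which is exactly the kind of relationship a linear measure misses.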

Example

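import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only the first two of the five features drive the response.
np.random.seed(0)
samples = np.random.rand(100, 5)
response = samples[:, 0] + 2 * samples[:, 1] + np.random.normal(0, 0.1, 100)

# f_regression converts each feature's Pearson correlation with the target
# into an F-statistic, so ranking by F-score matches ranking by absolute correlation.
corr_selector = SelectKBest(f_regression, k=2)
reduced_samples = corr_selector.fit_transform(samples, response)

print("Original shape:", samples.shape)
print("Reduced shape:", reduced_samples.shape)
print("Chosen indices:", corr_selector.get_support(indices=True))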

Information Gain

Evaluates reduction in uncertainty of target variable when a feature is known. Popular in classification.

Procedure

Encode features numerically; discretize if necessary.
Compute initial entropy of target.
For each feature, calculate conditional entropy given feature; derive information gain as difference.
Rank features by gain; select top candidates.

Effective for categorical targets; less natural for continuous features.
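The entropy steps above can be written out directly for categorical data. The sketch below assumes discrete feature values and measures entropy in bits (log base 2); the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """H(target) minus the weighted entropy of the target within each feature value."""
    groups = defaultdict(list)
    for f_value, t_value in zip(feature, target):
        groups[f_value].append(t_value)
    n = len(target)
    conditional = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(target) - conditional


target = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]   # splits the target perfectly
uninformative = ["a", "b", "a", "b"]  # each value sees one "yes" and one "no"

gain_informative = information_gain(informative, target)
gain_uninformative = information_gain(uninformative, target)
print("Informative feature gain:", gain_informative)
print("Uninformative feature gain:", gain_uninformative)
```

The target starts with 1 bit of entropy. The informative feature removes all of it (gain 1.0), while the uninformative one leaves each group as uncertain as before (gain 0.0).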

Example

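import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Random features and a random binary target: every true gain is close to zero,
# so the two "winners" are arbitrary and only the mechanics are illustrated.
np.random.seed(0)
feat_matrix = np.random.rand(100, 5)
bin_target = np.random.randint(2, size=100)

# Information gain H(Y) - H(Y|X) equals the mutual information I(X; Y),
# so mutual_info_classif serves as the estimator. One call scores every column;
# random_state makes the estimate reproducible.
gains = mutual_info_classif(feat_matrix, bin_target, random_state=0)

# Indices of the two highest-scoring features, best first.
top_two = np.argsort(gains)[::-1][:2]
selected_data = feat_matrix[:, top_two]

print("Original shape:", feat_matrix.shape)
print("Reduced shape:", selected_data.shape)
print("Chosen indices:", top_two)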

Comparative Overview

Method	Strengths	Limitations	Typical Use Case
Variance Threshold	Fast, easy, reduces dimensionality	Ignores relation to target; does not detect redundancy among correlated features	Sparse or near-constant features
Mutual Information	Captures linear & nonlinear dependencies	Computationally heavier; unstable on small datasets	Complex relationships in modest-sized data
Chi-Square	Effective for categorical associations	Unsuitable for continuous features	Purely categorical classification problems
Pearson Correlation	Simple, fast linear measure	Misses nonlinear trends; outlier-sensitive	Linear relationships with clean continuous data
Information Gain	Strong for categorical targets; intuitive in trees	Needs discretization; costly for many features	Classification with tree-based models

Tags: Machine Learning, feature selection, filter methods, variance threshold, mutual information

Posted on Mon, 15 Jun 2026 17:41:18 +0000 by linuxdoniv

Freaks City
