Essential Guide to Scikit-learn for Machine Learning

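Scikit-learn is an open-source Python library for machine learning, built on NumPy and SciPy. It provides tools for classification, regression, clustering, preprocessing, and model evaluation behind a consistent API. This guide covers its core concepts and practical usage.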

Installation

Install Scikit-learn via pip:

pip install scikit-learn
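
To confirm the installation worked, a quick check like the one below can help. It only assumes that the package imports cleanly; the version printed will depend on your environment.

```python
import sklearn

# The package is installed as "scikit-learn" but imported as "sklearn".
print(sklearn.__version__)
```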

Core Concepts

  • Dataset: Data is structured into features (input variables) and labels (target values).
  • Model: An implementation of a machine learning algorithm that learns from data to make predictions.
  • Training and Testing: Data is split into training sets (for model learning) and test sets (for evaluation).
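
The three concepts above can be sketched in a few lines. The toy data here is made up for illustration, and DecisionTreeClassifier is just one convenient choice of model, not a recommendation.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dataset: each row is one sample with two features (values invented for this sketch).
X = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # one label per row

# Training and testing: hold out 25% of the rows for evaluation.
# stratify=y keeps both classes represented in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model: learns from the training set only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluation uses samples the model has not seen during training.
test_accuracy = model.score(X_test, y_test)
print(f"Train size: {len(X_train)}, test size: {len(X_test)}")
print(f"Test accuracy: {test_accuracy}")
```

Because the two classes in this toy data are far apart, the held-out samples are easy to classify; real datasets rarely separate this cleanly.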

Key Functionalities

Data Preprocessing

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Sample data
features = [[10, 20], [20, 30], [30, 40], [40, 50]]
targets = [0, 1, 0, 1]

# Split data (with only 4 samples, test_size=0.2 leaves a single sample for testing)
feat_train, feat_test, targ_train, targ_test = train_test_split(features, targets, test_size=0.2, random_state=42)

# Standardize features: fit on the training set only, then reuse those statistics on the test set
normalizer = StandardScaler()
feat_train = normalizer.fit_transform(feat_train)
feat_test = normalizer.transform(feat_test)
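
To make the scaling step less opaque, here is a small sketch of what StandardScaler computes. The arrays are illustrative, not output of the split above. The scaler learns a per-feature mean and standard deviation from the training rows and applies the same statistics to the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative arrays: three training rows and one test row.
train = np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 40.0]])
test = np.array([[40.0, 50.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

print("Learned means:", scaler.mean_)
print("Scaled train mean:", train_scaled.mean(axis=0))
print("Scaled train std:", train_scaled.std(axis=0))
print("Scaled test row:", test_scaled)
```

After scaling, each training feature has mean 0 and standard deviation 1. The test row is not centered on 0 because it is scaled with statistics it did not contribute to, which is the intended behavior.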

Model Selection

Common models include:

  • Classification:
    • Logistic Regression: from sklearn.linear_model import LogisticRegression
    • Support Vector Classifier: from sklearn.svm import SVC
    • Decision Tree Classifier: from sklearn.tree import DecisionTreeClassifier
    • Random Forest Classifier: from sklearn.ensemble import RandomForestClassifier
  • Regression:
    • Linear Regression: from sklearn.linear_model import LinearRegression
    • Ridge Regression: from sklearn.linear_model import Ridge
    • Random Forest Regressor: from sklearn.ensemble import RandomForestRegressor
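
All of these estimators share the same fit/predict/score interface, so comparing them is mostly a matter of swapping the class. The sketch below tries the four classifiers on the Iris dataset. The hyperparameters (max_iter, random_state) are assumptions chosen for reproducibility, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Every estimator exposes the same methods, so one loop handles them all.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.3f}")
```

The regressors follow the same pattern, except that score returns R² instead of accuracy.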

Model Training

from sklearn.linear_model import LogisticRegression

# Initialize model
classifier = LogisticRegression()

# Train model
classifier.fit(feat_train, targ_train)

Prediction

# Make predictions
predicted_labels = classifier.predict(feat_test)
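
Besides hard labels, many classifiers can also report class probabilities through predict_proba. The following self-contained sketch uses a made-up one-feature dataset to show the difference between the two calls.

```python
from sklearn.linear_model import LogisticRegression

# Made-up data: small feature values belong to class 0, large ones to class 1.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)

new_samples = [[1.5], [9.5]]
labels = clf.predict(new_samples)               # one class label per sample
probabilities = clf.predict_proba(new_samples)  # one row per sample, one column per class

print("Labels:", labels)
print("Probabilities:", probabilities)
```

Each row of probabilities sums to 1, and the columns follow the order of clf.classes_.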

Model Evaluation

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Accuracy
acc = accuracy_score(targ_test, predicted_labels)
print(f'Accuracy: {acc}')

# Confusion matrix
conf_mat = confusion_matrix(targ_test, predicted_labels)
print(f'Confusion Matrix:\n{conf_mat}')

# Classification report
class_report = classification_report(targ_test, predicted_labels)
print(f'Classification Report:\n{class_report}')
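
To see how these metrics relate, it can help to work through a small hand-made case. The labels below are invented for illustration. For binary labels, confusion_matrix puts true classes on rows and predicted classes on columns.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels: 8 samples, 2 of them misclassified.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# For a binary problem, ravel() yields the cells in this order.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Accuracy is the share of correct predictions: (TN + TP) / total.
acc = accuracy_score(y_true, y_pred)
print(f"Accuracy: {acc}")  # 6 correct out of 8 -> 0.75
```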

Classification Example

A complete example using the Iris dataset:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris_data = datasets.load_iris()
X = iris_data.data
y = iris_data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (fit on training data only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create model (random_state fixes the seed so results are reproducible)
clf = RandomForestClassifier(random_state=42)

# Train model
clf.fit(X_train, y_train)

# Predict
predictions = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
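
A single train/test split yields one accuracy figure, which can vary with the split. One common way to get a steadier estimate is k-fold cross-validation. The sketch below is an optional variation on the example, not part of it. It wraps scaling and the model in a pipeline so the scaler is refit on each training fold. The choice of 5 folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler inside each fold, so no test-fold
# statistics leak into training.
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))

# cv=5 trains and evaluates the pipeline on 5 different splits.
cv_scores = cross_val_score(pipeline, X, y, cv=5)

print("Fold accuracies:", cv_scores)
print(f"Mean accuracy: {cv_scores.mean():.3f}")
```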

Regression Example

A regression example using the diabetes dataset that ships with scikit-learn (the Boston housing loader, load_boston, was removed in scikit-learn 1.2):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load dataset
diabetes_data = datasets.load_diabetes()
X = diabetes_data.data
y = diabetes_data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create model
regressor = LinearRegression()

# Train model
regressor.fit(X_train, y_train)

# Predict
pred_vals = regressor.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, pred_vals)
print(f'Mean Squared Error: {mse}')
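
Mean squared error is expressed in squared target units, so it is often reported alongside its square root (RMSE) and R². The sketch below works through all three on a few invented values so the arithmetic can be checked by hand.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Invented targets and predictions; two predictions are off by 1.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: mean of squared errors = (1 + 0 + 1 + 0) / 4 = 0.5
mse = mean_squared_error(y_true, y_pred)

# RMSE: square root of MSE, in the same units as the target.
rmse = np.sqrt(mse)

# R^2: 1 - SS_res / SS_tot = 1 - 2 / 20 = 0.9
r2 = r2_score(y_true, y_pred)

print(f"MSE: {mse}, RMSE: {rmse:.4f}, R^2: {r2}")
```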

Common Modules

  • Model Selection: sklearn.model_selection
  • Data Preprocessing: sklearn.preprocessing
  • Model Evaluation: sklearn.metrics
  • Ensemble Methods: sklearn.ensemble

Tags: machine-learning python scikit-learn data-science Tutorial

Posted on Thu, 04 Jun 2026 18:20:25 +0000 by locell