Deep Learning Evaluation Metrics and Loss Functions

Evaluation Metrics

Accuracy, Precision, and Recall

	Predicted Positive	Predicted Negative
Actual Positive	TP	FN
Actual Negative	FP	TN

Treating the positive class as the class of interest, the metrics are calculated as:

Accuracy (ACC): (Acc= \frac{TP+TN}{TP+TN+FP+FN})
Precision: (\frac{TP}{TP+FP})
Recall: (\frac{TP}{TP+FN})
F1 Score: (\frac{2\times Precision \times Recall}{Precision+ Recall})
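The four formulas above can be sketched in plain Python. This is a minimal illustration, not a library API: the function name `classification_metrics` and the example counts are made up for this post, and returning 0.0 when a denominator is zero is an assumption (other conventions exist).

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced data set: 10 actual positives among 1000 samples.
metrics = classification_metrics(tp=5, fp=5, fn=5, tn=985)
print(metrics)
```

With these counts accuracy is 0.99 while precision, recall, and F1 are all 0.5, which is the class-imbalance effect that makes accuracy misleading on its own.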

Metric	Advantages	Disadvantages
Accuracy	- Intuitive and easy to understand	- Can be misleading in class imbalance scenarios
Precision	- Measures proportion of predicted positives that are actually positive; useful for avoiding false positives	- May ignore recall, leading to missed positive samples (false negatives)
Recall	- Measures model's ability to identify positive samples; useful for avoiding false negatives	- May result in lower precision, increasing false positives
F1 Score	- Balances precision and recall, suitable for imbalanced tasks	- Doesn't separately reflect precision or recall; may not be suitable for scenarios that require focusing on one of the two

BLEU Score

BLEU uses n-gram matching to measure how similar a machine translation is to a reference translation. The principle is simple: for each n-gram length, it calculates the proportion of n-grams in the machine translation that also appear in the reference.

Original: 今天天气不错 (The weather is nice today)
Machine Translation: It is a nice day today
Human Translation: Today is a nice day

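For 1-grams: 5 of the 6 words in the machine translation also appear in the human translation (ignoring case), giving a matching score of (5/6).

For 3-grams: the machine translation contains 4 trigrams, of which 2 ("is a nice" and "a nice day") appear in the human translation, giving a matching score of (2/4).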
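The two matching scores above can be reproduced with a short sketch. It assumes whitespace tokenization and lowercasing (so that "Today" matches "today"), and it clips each n-gram's count to its count in the reference; the helper names `ngrams` and `ngram_precision` are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Return (matched, total) n-gram counts of the candidate against one reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    # Clip each candidate n-gram count to how often it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched, sum(cand.values())


machine = "It is a nice day today"
human = "Today is a nice day"
print(ngram_precision(machine, human, 1))  # (5, 6)
print(ngram_precision(machine, human, 3))  # (2, 4)
```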

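N-gram precision alone would reward translations that are too short, so BLEU adds a brevity penalty (which stands in for recall) and combines the precisions of several n-gram lengths:

[BLEU = BP \times \exp\left(\sum_{n=1}^{N} W_n \log P_n\right) ]

Where (P_n) is the n-gram precision, (W_n) is the weight for n-grams of length n (usually (1/N) with (N=4)), and (BP) is the brevity penalty: (BP = 1) if the candidate length (c) exceeds the reference length (r), otherwise (BP = \exp(1 - r/c)).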
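As a rough illustration of the formula, the sketch below computes an unsmoothed sentence-level BLEU against a single reference with uniform weights. The assumptions are whitespace tokenization, lowercasing, and a score of 0 when any n-gram precision is 0; `sentence_bleu` and `modified_precision` are hypothetical helper names, and real toolkits such as sacrebleu use their own tokenization and smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def modified_precision(cand_tokens, ref_tokens, n):
    """Clipped n-gram precision P_n of the candidate against one reference."""
    cand = Counter(tuple(cand_tokens[i:i + n]) for i in range(len(cand_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    total = sum(cand.values())
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total if total else 0.0


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU = BP * exp(sum_n W_n * log P_n) with W_n = 1 / max_n."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    precisions = [modified_precision(cand_tokens, ref_tokens, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; without smoothing the score collapses to 0
    c, r = len(cand_tokens), len(ref_tokens)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


score = sentence_bleu("It is a nice day today", "Today is a nice day")
print(round(score, 4))
```

For this sentence pair the precisions are 5/6, 3/5, 2/4, and 1/3, and the candidate is longer than the reference, so BP is 1 and the score is the geometric mean of the four precisions.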

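Example using the sacrebleu library:

import sacrebleu

# Machine translations, one string per sentence
# ("I have a hoodie", "a big hat").
translations = ['我有一个帽衫', '大大的帽子']
# A list of reference streams; each stream holds one reference per translation
# ("Hello, I have a hoodie", "the hat is big").
references = [['你好，我有一个帽衫', '帽子大大的']]
# tokenize='zh' selects sacrebleu's Chinese tokenizer.
bleu_score = sacrebleu.corpus_bleu(translations, references, tokenize='zh')
print(float(bleu_score.score))
# 59.809989126151606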

Loss Functions

Cross-Entropy Loss

Cross-entropy loss is used for classification tasks. It measures the difference between the predicted probability distribution and the true label distribution, and is the usual choice for multi-class problems. The formula for multi-class cross-entropy loss is:

[L = -\sum_{i=1}^{N} y_i \log(p_i) ]

Where (N) is the number of classes, (y_i) is the true label for class (i) (1 for the correct class and 0 otherwise when labels are one-hot), and (p_i) is the model's predicted probability for class (i). For binary classification, the formula becomes: (L = -[y\log(p) + (1-y)\log(1-p)])
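A minimal plain-Python sketch of both formulas follows. The function names are made up for this post, the natural logarithm is assumed, and clamping probabilities with a small `eps` to avoid `log(0)` is an added safeguard rather than part of the formula.

```python
import math


def cross_entropy(true_dist, pred_probs, eps=1e-12):
    """Multi-class cross-entropy: -sum_i y_i * log(p_i)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(true_dist, pred_probs))


def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy: -[y * log(p) + (1 - y) * log(1 - p)]."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


multi = cross_entropy([0, 0, 1], [0.3, 0.3, 0.4])  # only the true class contributes: -log(0.4)
binary = binary_cross_entropy(1, 0.8)              # -log(0.8)
print(multi, binary)
```

The binary form is the two-class special case: `cross_entropy([0, 1], [0.2, 0.8])` gives the same value as `binary_cross_entropy(1, 0.8)`.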

In PyTorch (torch.nn.CrossEntropyLoss), the key parameters for cross-entropy loss are:

label_smoothing (float, optional): Smooths the target labels to discourage overconfident predictions, which can improve generalization. For C classes with smoothing factor ε, PyTorch mixes the one-hot target with a uniform distribution: the true class becomes 1-ε+ε/C and every other class becomes ε/C
ignore_index (int, optional): Target value to ignore (default -100); ignored positions contribute neither to the loss nor to the gradient. Typically used for padding or invalid labels
reduction (str, optional): 'none', 'mean', or 'sum' for no aggregation, averaging, or summation
weight (Tensor, optional): A 1-D tensor of size C giving a rescaling weight for each class, for example larger weights for rare classes
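The parameters listed above can be checked against hand-computed values. The sketch below uses `torch.nn.functional.cross_entropy`, which accepts the same arguments as `nn.CrossEntropyLoss`; the tensors and class weights are made-up toy values, and `label_smoothing` assumes PyTorch 1.10 or newer.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)               # (N=4, C=3) raw scores, no softmax applied
targets = torch.tensor([0, 2, 1, -100])  # the last position is padding

# reduction='none' keeps one loss per sample; the ignored sample contributes 0.
per_sample = F.cross_entropy(logits, targets, ignore_index=-100, reduction='none')
# reduction='mean' averages over the non-ignored samples only.
mean_loss = F.cross_entropy(logits, targets, ignore_index=-100, reduction='mean')

valid_logits, valid_targets = logits[:3], targets[:3]

# label_smoothing: target becomes (1 - eps) * one_hot + eps / C.
eps, num_classes = 0.1, 3
smoothed = F.one_hot(valid_targets, num_classes).float() * (1 - eps) + eps / num_classes
manual_smooth = -(smoothed * F.log_softmax(valid_logits, dim=1)).sum(dim=1).mean()
builtin_smooth = F.cross_entropy(valid_logits, valid_targets, label_smoothing=eps)

# weight: with reduction='mean' the result is a weighted average over samples.
class_weights = torch.tensor([1.0, 2.0, 3.0])  # hypothetical per-class weights
sample_weights = class_weights[valid_targets]
manual_weighted = (sample_weights * per_sample[:3]).sum() / sample_weights.sum()
builtin_weighted = F.cross_entropy(valid_logits, valid_targets, weight=class_weights)

print(per_sample, mean_loss, builtin_smooth, builtin_weighted)
```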

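When using cross-entropy loss, pay attention to the data format. Input shape: ((N,C)) or ((N,C,d_1,...,d_K)). Target shape: ((N)) or ((N,d_1,...,d_K)) when the target holds class indices, or the same shape as the input when it holds class probabilities. Here C is the number of classes and N is the batch size. The key point is that the class dimension C must be dimension 1 of the input, directly after the batch dimension, so an output shaped like ((N,T,C)) has to be permuted to ((N,C,T)) first.

⭐Important note: PyTorch's nn.CrossEntropyLoss applies log-softmax internally (and nn.BCEWithLogitsLoss applies sigmoid internally), so pass raw logits and do not add an additional activation before these loss functions.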
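The shape rules and the note about raw logits can be illustrated as follows. This is a sketch with random toy tensors; the variable names and sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# (N, C) logits with (N,) class-index targets.
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
loss = criterion(logits, targets)
# Same value as applying log-softmax and NLL loss by hand, so no extra
# softmax should be placed in front of CrossEntropyLoss.
reference = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Segmentation-style input: (N, C, H, W) logits with (N, H, W) targets.
seg_logits = torch.randn(2, 5, 4, 4)
seg_targets = torch.randint(0, 5, (2, 4, 4))
seg_loss = criterion(seg_logits, seg_targets)

# Sequence-style output (N, T, C): move the class dimension to position 1.
seq_logits = torch.randn(2, 7, 5)
seq_targets = torch.randint(0, 5, (2, 7))
seq_loss = criterion(seq_logits.permute(0, 2, 1), seq_targets)
# Equivalent alternative: flatten batch and time into one dimension.
flat_loss = criterion(seq_logits.reshape(-1, 5), seq_targets.reshape(-1))

print(loss, seg_loss, seq_loss)
```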

Mean Squared Error (MSE)

Mean squared error is used for regression tasks. It measures the difference between predicted and actual values as the average of their squared differences. The formula is:

[L = \frac{1}{N}\sum_{i=1}^{N}(y_i- p_i)^2 ]

Where (N) is the number of samples, (y_i) is the true value, and (p_i) is the predicted value.

Example: for a 3-class prediction on three samples (A, B, C), with probabilities after softmax:

Prediction	True
0.3 0.3 0.4	0 0 1 (A)
0.3 0.4 0.3	0 1 0 (B)
0.1 0.2 0.7	1 0 0 (C)

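MSE Calculation (squared errors summed over all classes, then averaged over the 3 samples): (\frac{(0.3-0)^2+(0.3-0)^2+(0.4-1)^2+\dots}{3} \approx 0.81)

Cross-Entropy Calculation (natural logarithm, averaged over the 3 samples): (\frac{-(0\times \log 0.3 + 0\times \log 0.3 + 1\times \log 0.4 + \dots)}{3} \approx 1.38)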
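The arithmetic for this example can be checked with a few lines of plain Python. The sketch follows the convention used in the example, summing over the three classes and dividing by the number of samples, and uses the natural logarithm, with which the cross-entropy comes to about 1.38. Averaging the squared errors over all nine entries instead, which is what a mean reduction over every element would do, gives a value three times smaller.

```python
import math

predictions = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
targets = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
num_samples = len(predictions)

# Sum of squared errors over every class of every sample, divided by the sample count.
squared_error = sum(
    (p - y) ** 2
    for pred_row, true_row in zip(predictions, targets)
    for p, y in zip(pred_row, true_row)
)
mse = squared_error / num_samples

# Cross-entropy: only the true class of each sample contributes.
cross_entropy = -sum(
    y * math.log(p)
    for pred_row, true_row in zip(predictions, targets)
    for p, y in zip(pred_row, true_row)
) / num_samples

print(round(mse, 2), round(cross_entropy, 2))  # 0.81 1.38
```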

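Focal Loss

Focal Loss addresses class imbalance by adding a weighting factor to the standard cross-entropy loss, so that easy, well-classified samples contribute less to the total loss.

[FL(p_t)=-\alpha_t(1-p_t)^{\gamma}\log(p_t) ]

Here (p_t) is the predicted probability of the true class: (p) for positive samples and (1-p) for negative samples.

(\gamma): Focusing parameter that controls how strongly easy samples are down-weighted through the modulating factor ((1-p_t)^{\gamma}). When (\gamma > 0), as (p_t) increases, ((1-p_t)^{\gamma}) decreases, reducing the loss for easy samples. This makes the model focus more on difficult samples. With (\gamma = 0) the loss reduces to weighted cross-entropy.
(\alpha): Balancing factor adjusting weights between positive and negative classes. Typically (\alpha_t) is set to (\alpha) for the positive class and (1-\alpha) for the negative class.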
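To see the effect of the focusing parameter on single samples, here is a small scalar sketch of the formula. The helper name `focal_loss_term` and the two probabilities are illustrative choices, and `alpha_t` is left at 1 so that only the modulating factor changes the result.

```python
import math


def focal_loss_term(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy, hard = 0.9, 0.1  # predicted probability of the true class

ce_easy = focal_loss_term(easy, gamma=0.0)  # gamma = 0 is plain cross-entropy
ce_hard = focal_loss_term(hard, gamma=0.0)
fl_easy = focal_loss_term(easy, gamma=2.0)
fl_hard = focal_loss_term(hard, gamma=2.0)

print(f"easy sample: CE={ce_easy:.4f}, focal={fl_easy:.4f}")
print(f"hard sample: CE={ce_hard:.4f}, focal={fl_hard:.4f}")
```

With gamma = 2 the easy sample's loss is scaled by (1 - 0.9)^2 = 0.01, while the hard sample keeps (1 - 0.1)^2 = 0.81 of its cross-entropy loss.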

import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomFocalLoss(nn.Module):
    """Implementation of Focal Loss."""
    
    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

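    def forward(self, predictions, targets, mask=None):
        """Calculates focal loss from raw logits, with optional masking."""
        # Per-element binary cross-entropy from logits; this equals -log(p_t).
        bce_loss = F.binary_cross_entropy_with_logits(predictions, targets, reduction='none')
        prob = predictions.sigmoid()
        # p_t is the predicted probability of the true class.
        p_t = targets * prob + (1 - targets) * (1 - prob)
        # Modulating factor: shrinks the loss of well-classified elements.
        focal_weight = (1.0 - p_t) ** self.gamma
        
        loss = bce_loss * focal_weight
        
        if self.alpha > 0:
            # alpha for positive targets, (1 - alpha) for negative targets.
            alpha_factor = targets * self.alpha + (1 - targets) * (1 - self.alpha)
            loss = loss * alpha_factor
        
        if mask is not None:
            # Average over unmasked elements only; clamp avoids division by zero.
            loss = loss * mask.float()
            return loss.sum() / mask.sum().clamp(min=1)
        
        return loss.mean()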

if __name__ == '__main__':
    height, width = 500, 500
    true_labels = torch.randint(0, 2, (height, width), dtype=torch.float32)
    extended_labels = torch.zeros(1000, 1000)
    extended_labels[:height, :width] = true_labels
    label_mask = torch.zeros(1000, 1000)
    label_mask[:height, :width] = 1 
    model_output = torch.randn(1, 1000, 1000)

    focal_loss_calculator = CustomFocalLoss()
    calculated_loss = focal_loss_calculator(model_output, extended_labels.unsqueeze(0), label_mask)
    print(calculated_loss)

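A further refinement for imbalanced sample distributions is Class-Balanced (CB) Loss, which re-weights the loss of each class by the inverse of its effective number of samples. Combined with Focal Loss it becomes:

[\mathcal{L} = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum (1 - p_y)^\gamma \log(p_y) ]

Where (n_y) is the number of training samples in the ground-truth class (y), (\beta \in [0, 1)) is a hyperparameter controlling how strongly rare classes are up-weighted, and (\gamma) is the focusing parameter from Focal Loss.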
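The class-dependent factor in front of the sum can be computed on its own. The sketch below is only an illustration: the class counts are hypothetical, `class_balanced_weights` is a made-up helper name, and beta = 0.999 is just one plausible setting.

```python
def class_balanced_weights(samples_per_class, beta=0.999):
    """Return (1 - beta) / (1 - beta^n) for each class count n (n >= 1)."""
    return [(1 - beta) / (1 - beta ** n) for n in samples_per_class]


counts = [5000, 500, 50]  # hypothetical long-tailed class counts
weights = class_balanced_weights(counts)
for n, w in zip(counts, weights):
    print(f"n={n:5d}  weight={w:.6f}")
```

Rarer classes receive larger weights. With beta = 0 every weight is 1 (no re-weighting), and as beta approaches 1 the weights approach inverse class frequency.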

L1 Loss

L1 loss, also known as mean absolute error (MAE), measures prediction error as the average absolute difference between predicted and true values:

[L = \frac{1}{N}\sum_{i=1}^{N}|y_i- \hat{y}_i| ]
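A short plain-Python sketch of this formula, next to MSE for comparison. The data are made up, with the last prediction acting as an outlier, and the function names are illustrative.

```python
def l1_loss(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse_loss(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 14.0]  # the last prediction is off by 10

print(l1_loss(y_true, y_pred))   # 2.75
print(mse_loss(y_true, y_pred))  # 25.125
```

The single outlier accounts for 10 of the 11 units of total absolute error but 100 of the 100.5 units of total squared error, which is why L1 loss is less sensitive to outliers than MSE.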

Huber Loss

Huber Loss combines the advantages of MSE and MAE, reducing sensitivity to outliers while maintaining good gradient properties:

[\mathrm{Huber~Loss}= \begin{cases} \frac{1}{2}(y-\hat{y})^2 & \mathrm{if}\ |y-\hat{y}|\leq\delta \\ \delta\left(|y-\hat{y}|-\frac{1}{2}\delta\right) & \mathrm{otherwise} \end{cases} ]
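The piecewise definition translates directly into code. In this sketch the per-sample values are averaged over the batch, which is an assumption since the formula above is stated for a single sample; `huber_loss` is an illustrative name and the data are made up.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Average Huber loss: quadratic for small errors, linear beyond delta."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        error = abs(y - p)
        if error <= delta:
            total += 0.5 * error ** 2               # MSE-like region
        else:
            total += delta * (error - 0.5 * delta)  # MAE-like region
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 14.0]  # the last prediction is off by 10

print(huber_loss(y_true, y_pred, delta=1.0))  # 2.4375
```

The two branches meet at an error of exactly delta, where both evaluate to delta squared over two, so the loss is continuous and its gradient does not jump at the switch point.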

Tags: Deep Learning evaluation metrics loss functions cross-entropy focal loss

Posted on Wed, 20 May 2026 03:57:49 +0000 by noobstar
