Evaluation Metrics

Accuracy, Precision, and Recall
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
Treating the positive class as the class of interest, the calculations are:
- Accuracy (ACC): \(Acc = \frac{TP+TN}{TP+TN+FP+FN}\)
- Precision: \(\frac{TP}{TP+FP}\)
- Recall: \(\frac{TP}{TP+FN}\)
- F1 Score: \(\frac{2 \times Precision \times Recall}{Precision + Recall}\)
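The four formulas above can be sketched in a few lines of plain Python. This is a minimal sketch: the function name `classification_metrics` and the counts passed to it are illustrative assumptions, not taken from any library or dataset.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts, purely for illustration.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

Returning 0.0 when a denominator is empty is one possible convention; some libraries raise a warning instead.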
| Metric | Advantages | Disadvantages |
|---|---|---|
| Accuracy | Intuitive and easy to understand | Can be misleading under class imbalance |
| Precision | Measures the proportion of predicted positives that are actually positive; useful when false positives are costly | Ignores recall, so a high-precision model can still miss many positive samples (false negatives) |
| Recall | Measures the model's ability to find the actual positive samples; useful when false negatives are costly | Ignores precision, so a high-recall model can still produce many false positives |
| F1 Score | Balances precision and recall; suitable for imbalanced tasks | Does not report precision or recall separately, so it is a poor fit when one of the two matters more |
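The warning about accuracy under class imbalance can be checked with a small experiment. The sketch below uses made-up data (10 positives, 990 negatives) and a degenerate "model" that always predicts the negative class; both are assumptions chosen only to make the effect visible.

```python
# Illustrative, made-up labels: 10 positives and 990 negatives.
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that always predicts the negative class.
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(accuracy, recall)  # 0.99 0.0
```

Accuracy looks excellent even though not a single positive sample is found, which is why recall or F1 is usually reported alongside it on imbalanced data.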
BLEU Score

BLEU uses n-gram matching to measure how similar a machine translation is to a reference translation. The principle is simple: for each n, it calculates the proportion of n-grams in the machine translation that also appear in the reference.

- Original: 今天天气不错
- Machine Translation: It is a nice day today
- Human Translation: Today is a nice day

For 1-grams: 5 of the 6 words in the machine translation appear in the reference (ignoring case), giving a matching score of \(5/6\).

For 3-grams: 2 of the 4 trigrams match ("is a nice" and "a nice day"), giving a matching score of \(2/4\).
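The two matching scores can be reproduced with a short script. This is a sketch under stated assumptions: tokens are split on whitespace and lowercased (so that "Today" matches "today"), and the helper names `ngrams` and `ngram_precision` are illustrative rather than part of any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Return (matched, total) candidate n-grams, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched, total


candidate = "It is a nice day today".lower().split()
reference = "Today is a nice day".lower().split()

p1 = ngram_precision(candidate, reference, 1)
p3 = ngram_precision(candidate, reference, 3)
print(p1, p3)  # (5, 6) (2, 4)
```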
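BLEU combines the n-gram precisions \(P_n\), weighted by \(W_n\), with a brevity penalty \(BP\) that penalizes translations shorter than the reference (compensating for the absence of a recall term):
\[ BLEU = BP \times \exp\left(\sum_{n=1}^{N} W_n \log P_n\right) \]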
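To make the formula concrete, here is a minimal single-sentence sketch. It assumes uniform weights \(W_n = 1/N\) and the standard brevity penalty (1 if the candidate is longer than the reference, otherwise \(\exp(1 - r/c)\)); the function names are illustrative. The 2-gram and 4-gram precisions (3/5 and 1/3) were worked out by hand for the same sentence pair and do not appear in the text above.

```python
import math


def brevity_penalty(cand_len, ref_len):
    """1.0 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if cand_len == 0:
        return 0.0
    if cand_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / cand_len)


def bleu(precisions, cand_len, ref_len, weights=None):
    """BP * exp(sum_n W_n * log P_n); uniform weights if none are given."""
    if weights is None:
        weights = [1 / len(precisions)] * len(precisions)
    # log(0) is undefined, so any zero precision gives a score of 0.
    if min(precisions) <= 0:
        return 0.0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(cand_len, ref_len) * math.exp(log_sum)


# P_1..P_4 for "It is a nice day today" vs "Today is a nice day".
precisions = [5 / 6, 3 / 5, 2 / 4, 1 / 3]
score = bleu(precisions, cand_len=6, ref_len=5)
print(round(score, 4))
```

Because the candidate (6 tokens) is longer than the reference (5 tokens), the brevity penalty is 1 here and the score is simply the geometric mean of the four precisions. Real toolkits add smoothing and corpus-level aggregation, so their numbers will differ.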
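Example implementation using the sacrebleu library:
```python
import sacrebleu

# Hypotheses: "I have a hoodie", "a big hat"
translations = ['我有一个帽衫', '大大的帽子']
# One reference stream with one reference per hypothesis:
# "Hello, I have a hoodie", "the hat is big"
references = [['你好,我有一个帽衫', '帽子大大的']]
# tokenize='zh' selects sacrebleu's Chinese tokenizer.
bleu_score = sacrebleu.corpus_bleu(translations, references, tokenize='zh')
print(float(bleu_score.score))
# 59.809989126151606
```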
Loss Functions

Cross-Entropy Loss

Cross-entropy loss is used for classification tasks. It measures the difference between the predicted probability distribution and the true label distribution, and is the usual choice for multi-class problems. The formula for multi-class cross-entropy loss is:
\[ L = -\sum_{i=1}^{N} y_i \log(p_i) \]
Where \(N\) is the number of classes, \(y_i\) is the true label for class \(i\) (1 for the correct class, 0 otherwise), and \(p_i\) is the model's predicted probability for class \(i\). For binary classification, the formula becomes: \(L = -[y \log(p) + (1-y) \log(1-p)]\)
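Both formulas can be written directly in plain Python. This is a minimal sketch for a single sample, using the natural logarithm; the function names and the small `eps` clamp (added to avoid `log(0)`) are assumptions of this example.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """L = -sum_i y_i * log(p_i) for one sample with a one-hot label."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


def binary_cross_entropy(y, p, eps=1e-12):
    """L = -[y * log(p) + (1 - y) * log(1 - p)] for one sample."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


multi = cross_entropy([0, 0, 1], [0.3, 0.3, 0.4])  # equals -log(0.4)
binary = binary_cross_entropy(1, 0.9)              # equals -log(0.9)
print(multi, binary)
```

With a one-hot label only the true class contributes, so the multi-class loss reduces to the negative log of the probability assigned to the correct class. The binary formula is the two-class special case.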
In PyTorch, the key parameters of nn.CrossEntropyLoss are:
- label_smoothing (float, optional): Smooths the target distribution to discourage overconfident predictions, which can improve generalization. For C classes and smoothing factor ε, PyTorch mixes the one-hot target with a uniform distribution: the true class becomes 1-ε+ε/C and every other class becomes ε/C
- ignore_index (int, optional): Target value that is skipped when computing the loss (default -100), typically used for padding or invalid labels
- reduction (str, optional): 'none', 'mean', or 'sum' for no aggregation, averaging, or summation
- weight (Tensor, optional): Per-class rescaling weight, a 1-D tensor of size C, often used to up-weight rare classes
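The parameters listed above can be exercised with `torch.nn.functional.cross_entropy`, the functional form of the same loss. The sketch below uses random logits and made-up targets. It assumes a PyTorch version that supports `label_smoothing` (1.10 or later), where the smoothed target is the one-hot vector scaled by 1-ε plus ε/C on every class.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # N=4 samples, C=3 classes (raw scores)
target = torch.tensor([0, 2, 1, 2])   # class indices

# label_smoothing: compare the built-in option with a manual smoothed target.
eps = 0.1
builtin = F.cross_entropy(logits, target, label_smoothing=eps)
one_hot = F.one_hot(target, num_classes=3).float()
smoothed = one_hot * (1 - eps) + eps / 3
manual = -(smoothed * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# ignore_index: positions labelled -100 do not contribute to the mean.
padded_target = torch.tensor([0, 2, -100, -100])
ignored = F.cross_entropy(logits, padded_target, ignore_index=-100)
kept_only = F.cross_entropy(logits[:2], target[:2])

# reduction='none' returns one loss per sample; weight rescales each class.
per_sample = F.cross_entropy(logits, target, reduction="none")
weighted = F.cross_entropy(logits, target,
                           weight=torch.tensor([1.0, 2.0, 0.5]))

print(builtin.item(), manual.item())
print(ignored.item(), kept_only.item())
print(per_sample.shape, weighted.item())
```

The first two printed pairs should agree up to floating-point error, which is a quick way to confirm what each option does.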
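For cross-entropy loss, pay attention to the input format. Input shape: \((N, C)\) or \((N, C, d_1, ..., d_K)\). Target shape: \((N)\) or \((N, d_1, ..., d_K)\) when the target holds class indices, or the same shape as the input when it holds class probabilities. Here C is the number of classes and N is the batch size. The key is that the class dimension C must be dimension 1, directly after the batch dimension N; a model output shaped \((N, L, C)\) over \(L\) positions, for example, has to be permuted to \((N, C, L)\) first.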
⭐Important note: PyTorch's cross-entropy loss applies log-softmax internally (and the binary counterpart BCEWithLogitsLoss applies sigmoid), so pass raw logits and do not add an activation before the loss.
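A short sketch of both points, the shape requirement and the use of raw logits. It assumes a hypothetical sequence model whose output has shape (N, L, C), with L positions per sample; the tensors are random and only illustrate shapes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, L, C = 2, 5, 4                       # batch, sequence length, classes
logits = torch.randn(N, L, C)           # hypothetical model output
target = torch.randint(0, C, (N, L))    # one class index per position

# Move the class dimension to dim 1: (N, L, C) -> (N, C, L).
loss_permuted = F.cross_entropy(logits.permute(0, 2, 1), target)

# Equivalent alternative: flatten batch and sequence into one dimension.
flat_logits = logits.reshape(-1, C)
flat_target = target.reshape(-1)
loss_flat = F.cross_entropy(flat_logits, flat_target)

# cross_entropy expects raw logits: it matches log_softmax + nll_loss.
loss_manual = F.nll_loss(F.log_softmax(flat_logits, dim=1), flat_target)

print(loss_permuted.item(), loss_flat.item(), loss_manual.item())
```

All three values should agree. Applying softmax before `F.cross_entropy` would normalize twice and silently produce a different loss.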
Mean Squared Error (MSE)

Mean squared error is used for regression tasks. It is the average of the squared differences between predicted and actual values:
\[ L = \frac{1}{N}\sum_{i=1}^{N}(y_i - p_i)^2 \]
Where \(N\) is the number of samples, \(y_i\) is the true value, and \(p_i\) is the predicted value.
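Example: For a 3-class prediction on three samples, after softmax:

| Sample | Prediction | True |
|---|---|---|
| A | 0.3 0.3 0.4 | 0 0 1 |
| B | 0.3 0.4 0.3 | 0 1 0 |
| C | 0.1 0.2 0.7 | 1 0 0 |

MSE calculation (squared errors summed over classes, averaged over the 3 samples): \(\frac{(0.3-0)^2+(0.3-0)^2+(0.4-1)^2+...}{3} = \frac{0.54+0.54+1.34}{3} \approx 0.81\)

Cross-entropy calculation (natural log): \(\frac{-(0\times \log 0.3 + 0\times \log 0.3 + 1\times \log 0.4 + ...)}{3} = \frac{-(\log 0.4 + \log 0.4 + \log 0.1)}{3} \approx 1.38\)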
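The two calculations can be checked with a few lines of plain Python. This sketch assumes the natural logarithm and, as in the worked example, sums the squared errors over the three classes before averaging over the three samples.

```python
import math

predictions = [[0.3, 0.3, 0.4], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
targets = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
n = len(predictions)

# Squared error summed over classes, averaged over the 3 samples.
mse = sum((p - y) ** 2
          for ps, ys in zip(predictions, targets)
          for p, y in zip(ps, ys)) / n

# Cross-entropy: only the true class (y = 1) contributes.
ce = -sum(y * math.log(p)
          for ps, ys in zip(predictions, targets)
          for p, y in zip(ps, ys)) / n

print(round(mse, 4), round(ce, 4))
```

This gives roughly 0.807 for MSE and 1.378 for cross-entropy. Dividing the squared errors by all 9 elements instead of by the 3 samples would give a value three times smaller, so it is worth checking which convention a library uses.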
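Focal Loss

Focal Loss addresses class imbalance by adding a modulating factor \((1-p_t)^{\gamma}\) and a class weight \(\alpha_t\) to the standard cross-entropy loss, where \(p_t\) is the predicted probability of the true class:
\[ FL(p_t) = -\alpha_t (1-p_t)^{\gamma} \log(p_t) \]
- \(\gamma\): Focusing parameter controlling how strongly easy samples are down-weighted. When \(\gamma > 0\), as \(p_t\) increases, \((1-p_t)^{\gamma}\) decreases, reducing the loss for easy (well-classified) samples. This makes the model focus more on difficult samples. With \(\gamma = 0\) the loss reduces to \(\alpha\)-weighted cross-entropy.
- \(\alpha\): Balancing factor adjusting weights between positive and negative classes. \(\alpha_t\) is typically set to \(\alpha\) for the positive class and \(1-\alpha\) for the negative class.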
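The effect of \(\gamma\) described above can be seen numerically. The scalar sketch below evaluates the formula for a few values of \(p_t\), with \(\alpha_t\) fixed to 1 so that only the modulating factor changes; the function name `focal_loss` and the chosen probabilities are illustrative.

```python
import math


def focal_loss(p_t, gamma=2.0, alpha_t=1.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for one sample."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


for p_t in (0.9, 0.5, 0.1):
    ce = -math.log(p_t)          # plain cross-entropy for comparison
    fl = focal_loss(p_t)
    print(f"p_t={p_t}: CE={ce:.4f}  FL={fl:.4f}  ratio={fl / ce:.2f}")
```

With \(\gamma = 2\), an easy sample (\(p_t = 0.9\)) keeps only about 1% of its cross-entropy loss, while a hard sample (\(p_t = 0.1\)) keeps about 81%.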
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CustomFocalLoss(nn.Module):
    """Binary focal loss computed from raw logits."""

    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, predictions, targets, mask=None):
        """Calculates focal loss with optional masking."""
        # Per-element cross-entropy, -log(p_t), computed stably from logits.
        bce_loss = F.binary_cross_entropy_with_logits(predictions, targets, reduction='none')
        prob = predictions.sigmoid()
        # p_t: predicted probability of the true class.
        p_t = targets * prob + (1 - targets) * (1 - prob)
        focal_weight = (1.0 - p_t) ** self.gamma
        loss = bce_loss * focal_weight
        if self.alpha > 0:
            # alpha for positives, 1 - alpha for negatives.
            alpha_factor = targets * self.alpha + (1 - targets) * (1 - self.alpha)
            loss = loss * alpha_factor
        if mask is not None:
            # Average over valid (mask == 1) positions only.
            mask = mask.float()
            return (loss * mask).sum() / mask.sum().clamp(min=1.0)
        return loss.mean()


if __name__ == '__main__':
    height, width = 500, 500
    true_labels = torch.randint(0, 2, (height, width), dtype=torch.float32)
    # Pad labels to 1000x1000; the mask marks the valid 500x500 region.
    extended_labels = torch.zeros(1000, 1000)
    extended_labels[:height, :width] = true_labels
    label_mask = torch.zeros(1000, 1000)
    label_mask[:height, :width] = 1
    model_output = torch.randn(1, 1000, 1000)  # random logits stand in for a model
    focal_loss_calculator = CustomFocalLoss()
    calculated_loss = focal_loss_calculator(model_output, extended_labels.unsqueeze(0), label_mask)
    print(calculated_loss)
```
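A further refinement for imbalanced sample distributions is Class-Balanced (CB) Loss, which multiplies the focal loss by a per-class weight based on the number of training samples \(n_y\) in the true class \(y\), with hyperparameter \(\beta \in [0, 1)\):
\[ \mathcal{L} = -\frac{1-\beta}{1-\beta^{n_y}} \sum (1-p_y)^{\gamma} \log(p_y) \]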
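The leading factor \(\frac{1-\beta}{1-\beta^{n_y}}\) is what separates CB Loss from plain focal loss. The sketch below computes only that factor for two made-up class sizes; the function name, the class counts, and the value \(\beta = 0.999\) are illustrative assumptions.

```python
def class_balanced_weights(samples_per_class, beta=0.999):
    """Weight (1 - beta) / (1 - beta ** n_y) for each class count n_y."""
    return [(1 - beta) / (1 - beta ** n) for n in samples_per_class]


counts = [5000, 50]   # illustrative: one frequent class, one rare class
weights = class_balanced_weights(counts)
print(weights)

# With beta = 0 every class gets weight 1, i.e. no re-weighting.
print(class_balanced_weights(counts, beta=0.0))
```

The rare class receives a much larger weight than the frequent one. As \(\beta\) approaches 1 the weights approach inverse class frequency.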
L1 Loss

L1 loss, also called mean absolute error (MAE), measures prediction error as the absolute difference between predicted and true values:
\[ L = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i| \]
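A small comparison shows how L1 loss differs from a squared-error loss when one prediction is far off. The data below is made up for illustration, and the helper names are not from any library.

```python
def l1_loss(y_true, y_pred):
    """Mean absolute error over all samples."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse_loss(y_true, y_pred):
    """Mean squared error over all samples, for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 14.0]   # last prediction is an outlier (error 10)

print(l1_loss(y_true, y_pred), mse_loss(y_true, y_pred))  # 2.75 25.125
```

The single outlier contributes linearly to the L1 loss but quadratically to the squared-error loss, so it dominates the latter far more.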
Huber Loss

Huber Loss combines the advantages of MSE and MAE: it is quadratic for small errors, which keeps gradients smooth near zero, and linear for errors larger than the threshold \(\delta\), which reduces sensitivity to outliers:
\[ \mathrm{Huber~Loss} = \begin{cases} \frac{1}{2}(y-\hat{y})^2 & \mathrm{if}\ |y-\hat{y}| \leq \delta \\ \delta\left(|y-\hat{y}| - \frac{1}{2}\delta\right) & \mathrm{otherwise} \end{cases} \]
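The piecewise definition translates directly into code. This is a per-sample sketch with \(\delta = 1\) chosen as an illustrative default; the function name is an assumption of this example.

```python
def huber_loss(y, y_hat, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    error = abs(y - y_hat)
    if error <= delta:
        return 0.5 * error ** 2
    return delta * (error - 0.5 * delta)


small = huber_loss(3.0, 2.5)    # |error| = 0.5 <= delta: quadratic branch
large = huber_loss(4.0, 14.0)   # |error| = 10 > delta: linear branch
print(small, large)  # 0.125 9.5
```

The two branches meet at \(|y-\hat{y}| = \delta\), where both give \(\frac{1}{2}\delta^2\), so the loss is continuous across the threshold.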