Loss Function

Overview

Table of Contents

Classification

  • Cross-Entropy
  • NLL
  • Hinge
  • Kullback-Leibler

Regression

  • MAE (L1)
  • MSE (L2)
  • Huber

Metric Learning

  • Dice
  • Contrastive
  • N-pair
  • Triplet (Margin-based Loss)

Brief Description

A loss function measures the "distance" between the answer the model predicts and the true answer.

General

Cross Entropy Loss

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

Cross-entropy loss increases as the predicted probability diverges from the actual label. A perfect model would have a log loss of 0.

Formula ($K$ classes, $\mathbf{y}$ is a one-hot vector, $\log$ is the natural log):

$$ \operatorname{CE}(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{i=1}^K y_i \log(\hat{y}_i) $$
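
For example, with $K = 3$, $\mathbf{y} = (0, 1, 0)$, and $\hat{\mathbf{y}} = (0.2, 0.7, 0.1)$, only the true-class term survives:

$$ \operatorname{CE} = -\log(0.7) \approx 0.357 $$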

Binary Classification Problem

The final layer of the network should use a sigmoid activation.

(The binary special case of cross-entropy, with true label $y \in \{0, 1\}$ and predicted probability $\hat{y}$)

$$ \operatorname{CE}(y, \hat{y}) = -\sum_{i=1}^2 y_i \log(\hat{y}_i) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right] $$
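
A minimal PyTorch sketch of the binary case (variable names are illustrative); `F.binary_cross_entropy_with_logits` fuses the sigmoid and the loss for numerical stability:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4)                   # raw model outputs for 4 examples
targets = torch.tensor([1., 0., 1., 1.])  # binary labels as floats

# explicit two-step version: sigmoid, then binary cross-entropy
probs = torch.sigmoid(logits)
loss_two_step = F.binary_cross_entropy(probs, targets)

# fused version, numerically more stable
loss_fused = F.binary_cross_entropy_with_logits(logits, targets)

print(loss_two_step, 'should be equal to', loss_fused)
```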

Multi-class Classification Problem

The final layer of the network should use a softmax activation.

Mean Square Error Loss
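
For $n$ predictions, the standard definition is:

$$ \operatorname{MSE}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$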

Negative Log Likelihood Loss
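
NLL loss takes log-probabilities as input and returns the negative log-probability assigned to the true class: $\operatorname{NLL}(\log \hat{\mathbf{y}}, y) = -\log \hat{y}_y$. Applied after a softmax output, it is exactly cross-entropy loss, as the following comparison shows.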

NLL Loss vs. Cross Entropy Loss

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 3)        # raw scores: 3 samples, 3 classes
target = torch.tensor([0, 2, 1])  # true class index for each sample

# NLL Loss: softmax -> log -> negative log likelihood
probs = F.softmax(logits, dim=1)
log_probs = torch.log(probs)
nll_loss = F.nll_loss(log_probs, target)

# CE Loss: performs softmax + log + NLL in a single call
ce_loss = F.cross_entropy(logits, target)

print(nll_loss, 'should be equal to', ce_loss)
```
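
Both values are identical. In practice, `F.log_softmax(x, dim=1)` is preferred over `torch.log(F.softmax(x, dim=1))` because it is more numerically stable.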

Metric Learning

Dice Loss

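A common soft formulation (often used for segmentation), over predicted probabilities $p_i$ and binary ground-truth values $g_i$:

$$ L_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i + \sum_i g_i} $$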

Contrastive Loss
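
One common form, with pair label $y = 1$ for a similar pair, distance $d$ between the two embeddings, and margin $m$:

$$ L = y \, d^2 + (1 - y) \max(m - d, 0)^2 $$

Similar pairs are pulled together; dissimilar pairs are pushed apart until they are at least $m$ away.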

Multi-class N-pair loss
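
For an anchor embedding $f$, one positive $f^+$, and $N - 1$ negatives $f_i$, the N-pair loss generalizes the triplet loss to many negatives at once:

$$ L = \log\left(1 + \sum_{i=1}^{N-1} \exp(f^\top f_i - f^\top f^+)\right) $$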

Triplet Loss

$$ L = \max(d(a, b) - d(a, c) + \mathit{margin}, 0) $$

where $a$ is the anchor, $b$ a positive sample, and $c$ a negative sample.

  • Common choices for the distance $d(\cdot, \cdot)$:
    • $\operatorname{sum}(|A-B|)$ (L1, Manhattan)
    • $(\operatorname{sum}((A-B)^2))^{1/2}$ (L2, Euclidean)
    • $(\operatorname{sum}(|A-B|^3))^{1/3}$ (L3)
    • $\operatorname{Jaccard}(A, B)$
    • $\operatorname{Cosine}(A, B)$
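
A minimal PyTorch sketch using the built-in `nn.TripletMarginLoss` (batch size, embedding dimension, and margin here are illustrative; `p=2` picks the Euclidean distance from the list above):

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

anchor = torch.randn(8, 128, requires_grad=True)    # a: anchor embeddings
positive = torch.randn(8, 128, requires_grad=True)  # b: positive samples
negative = torch.randn(8, 128, requires_grad=True)  # c: negative samples

# max(d(a, b) - d(a, c) + margin, 0), averaged over the batch
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```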

Margin-based Loss

  • $A$: document
  • $B$: positive sample document
  • $C$: negative sample document
  • the model predicts which of $B$ or $C$ matches $A$, scored on their representations $r_A$, $r_B$, $r_C$

$$ L = \max(0, M - \cos(r_A, r_B) + \cos(r_A, r_C)) $$

This requires the similarity between $A$ and the positive sample to be at least $M$ larger than the similarity to the negative sample.
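
A minimal PyTorch sketch of this loss (the function name, margin value, and shapes are illustrative assumptions, not from the source):

```python
import torch
import torch.nn.functional as F

def margin_based_loss(r_a, r_b, r_c, margin=0.5):
    """max(0, M - cos(r_A, r_B) + cos(r_A, r_C)), averaged over the batch."""
    sim_pos = F.cosine_similarity(r_a, r_b, dim=-1)  # similarity to positive
    sim_neg = F.cosine_similarity(r_a, r_c, dim=-1)  # similarity to negative
    return torch.clamp(margin - sim_pos + sim_neg, min=0).mean()

r_a, r_b, r_c = (torch.randn(8, 128) for _ in range(3))
print(margin_based_loss(r_a, r_b, r_c))
```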
