Criterions are helpful to train a neural network. Given an input and a target, they compute a gradient according to a given loss function.
- Classification criterions:
  - BCECriterion: binary cross-entropy (two-class version of ClassNLLCriterion);
  - ClassNLLCriterion: negative log-likelihood for LogSoftMax (multi-class);
  - CrossEntropyCriterion: combines LogSoftMax and ClassNLLCriterion;
  - MarginCriterion: two-class margin-based loss;
  - MultiMarginCriterion: multi-class margin-based loss;
  - MultiLabelMarginCriterion: multi-class multi-classification margin-based loss;
- Regression criterions:
  - AbsCriterion: measures the mean absolute value of the element-wise difference between input and target;
  - MSECriterion: mean square error (a classic);
  - DistKLDivCriterion: Kullback–Leibler divergence (for fitting continuous probability distributions);
- Embedding criterions (measuring whether two inputs are similar or dissimilar):
  - HingeEmbeddingCriterion: takes a distance as input;
  - L1HingeEmbeddingCriterion: L1 distance between two inputs;
  - CosineEmbeddingCriterion: cosine distance between two inputs;
- Miscellaneous criterions:
  - MultiCriterion: a weighted sum of other criterions;
  - MarginRankingCriterion: ranks two inputs.

## Criterion ##

This is an abstract class which declares methods defined in all criterions. This class is serializable.

### [output] forward(input, target) ###

Given an input and a target, compute the loss function associated to the criterion and return the result. In general input and target are tensors, but some specific criterions might require some other type of object.

The output returned should be a scalar in general.

The state variable self.output should be updated after a call to forward().

### [gradInput] backward(input, target) ###

Given an input and a target, compute the gradients of the loss function associated to the criterion and return the result. In general input, target and gradInput are tensors, but some specific criterions might require some other type of object.

The state variable self.gradInput should be updated after a call to backward().

### State variable: output ###

State variable which contains the result of the last forward(input, target) call.

### State variable: gradInput ###

State variable which contains the result of the last backward(input, target) call.
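
To make the contract above concrete, here is a minimal sketch of a user-defined criterion. It assumes the usual nn convention that forward() and backward() of the base class delegate to updateOutput() and updateGradInput(); the class name nn.HalfSquaredError and its loss are hypothetical and chosen only for illustration.

```lua
require 'nn'

-- Hypothetical criterion: loss(x, y) = 0.5 * sum_i (x_i - y_i)^2
local HalfSquaredError, parent = torch.class('nn.HalfSquaredError', 'nn.Criterion')

function HalfSquaredError:__init()
   parent.__init(self)   -- the base class sets up self.output and self.gradInput
end

-- Called by forward(): computes the loss and stores it in self.output
function HalfSquaredError:updateOutput(input, target)
   local diff = input - target
   self.output = 0.5 * diff:dot(diff)
   return self.output
end

-- Called by backward(): computes d(loss)/d(input) and stores it in self.gradInput
function HalfSquaredError:updateGradInput(input, target)
   self.gradInput:resizeAs(input):copy(input):add(-1, target)
   return self.gradInput
end

local criterion = nn.HalfSquaredError()
local input  = torch.Tensor{1, 2, 3}
local target = torch.Tensor{1, 0, 5}

local loss = criterion:forward(input, target)    -- 0.5 * (0 + 4 + 4)
local grad = criterion:backward(input, target)   -- input - target

print(loss, criterion.output)   -- both hold the result of the last forward()
print(grad)
```

After the two calls, criterion.output and criterion.gradInput hold the same values that forward() and backward() returned.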

## AbsCriterion ##

    criterion = nn.AbsCriterion()

Creates a criterion that measures the mean absolute value of the element-wise difference between input x and target y:

    loss(x,y) = 1/n \sum |x_i-y_i|

If x and y are d-dimensional Tensors with a total of n elements, the sum operation still operates over all the elements, and divides by n.

The division by n can be avoided if one sets the internal variable sizeAverage to false:

    criterion = nn.AbsCriterion()
    criterion.sizeAverage = false
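
As a quick illustration of the averaged and summed variants, the sketch below evaluates AbsCriterion on two small tensors. The input and target values are arbitrary and chosen so the result is easy to check by hand.

```lua
require 'nn'

local input  = torch.Tensor{1, 2, 3}
local target = torch.Tensor{2, 2, 5}

local criterion = nn.AbsCriterion()
local mean = criterion:forward(input, target)    -- (1 + 0 + 2) / 3

criterion.sizeAverage = false
local total = criterion:forward(input, target)   -- 1 + 0 + 2

print(mean, total)
```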

## ClassNLLCriterion ##

    criterion = nn.ClassNLLCriterion(weights)

The negative log likelihood criterion. It is useful to train a classification problem with n classes. If provided, the optional argument weights should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward() is expected to contain log-probabilities of each class: input has to be a 1D tensor of size n. Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftMax layer as the last layer of your neural network. You may use CrossEntropyCriterion instead, if you prefer not to add an extra layer to your network.

This criterion expects a class index (1 to the number of classes) as target when calling forward(input, target) and backward(input, target).

The loss can be described as:

    loss(x, class) = forward(x, class) = -x[class]

or, in the case of the weights argument being specified:

    loss(x, class) = forward(x, class) = -weights[class]*x[class]

The following is a code fragment showing how to make a gradient step given an input x, a desired output y (an integer 1 to n, in this case n = 2 classes), a network mlp and a learning rate learningRate:

    function gradUpdate(mlp, x, y, learningRate)
       local criterion = nn.ClassNLLCriterion()
       local pred = mlp:forward(x)              -- log-probabilities from the network
       local err = criterion:forward(pred, y)   -- loss value
       mlp:zeroGradParameters()
       local t = criterion:backward(pred, y)    -- gradient of the loss w.r.t. pred
       mlp:backward(x, t)
       mlp:updateParameters(learningRate)
    end
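
The unweighted formula can be checked directly. The sketch below is only an illustration with arbitrary scores: it turns them into log-probabilities with LogSoftMax and compares the criterion output with -x[class].

```lua
require 'nn'

local scores   = torch.Tensor{1, 2, 3}               -- arbitrary example scores
local logProbs = nn.LogSoftMax():forward(scores)     -- one log-probability per class
local criterion = nn.ClassNLLCriterion()

local target = 3
local loss      = criterion:forward(logProbs, target)
local gradInput = criterion:backward(logProbs, target)

print(loss, -logProbs[target])   -- the two values should agree
print(gradInput)                 -- non-zero only at the target index
```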

## CrossEntropyCriterion ##

    criterion = nn.CrossEntropyCriterion(weights)

This criterion combines LogSoftMax and ClassNLLCriterion in one single class.

It is useful to train a classification problem with n classes. If provided, the optional argument weights should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward() is expected to contain scores for each class: input has to be a 1D tensor of size n. This criterion expects a class index (1 to the number of classes) as target when calling forward(input, target) and backward(input, target).

The loss can be described as:

    loss(x, class) = forward(x, class) = -log( e^x[class] / (\sum_j e^x[j]) )
                                       = -x[class] + log( \sum_j e^x[j] )

or, in the case of the weights argument being specified:

    loss(x, class) = forward(x, class) = weights[class]*( -x[class] + log( \sum_j e^x[j] ) )
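
The equivalence stated above can be sketched as follows, for the unweighted case only. The scores are arbitrary, and the closed form is evaluated in plain Lua for comparison.

```lua
require 'nn'

local scores = torch.Tensor{1, 2, 3}   -- raw scores, not log-probabilities
local target = 3

-- One-step version
local ce = nn.CrossEntropyCriterion():forward(scores, target)

-- Two-step version: LogSoftMax followed by ClassNLLCriterion
local logProbs = nn.LogSoftMax():forward(scores)
local nll = nn.ClassNLLCriterion():forward(logProbs, target)

-- Closed form: -x[class] + log(sum_j e^x[j])
local sumExp = 0
for j = 1, scores:size(1) do
   sumExp = sumExp + math.exp(scores[j])
end
local manual = -scores[target] + math.log(sumExp)

print(ce, nll, manual)   -- the three values should agree up to rounding
```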

## DistKLDivCriterion ##

    criterion = nn.DistKLDivCriterion()

The Kullback–Leibler divergence criterion. KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions. As with ClassNLLCriterion, the input given through a forward() is expected to contain log-probabilities; however, unlike ClassNLLCriterion, input is not restricted to a 1D or 2D vector (as the criterion is applied element-wise).

This criterion expects a target tensor of the same size as the input tensor when calling forward(input, target) and backward(input, target).

The loss can be described as:

    loss(x, target) = sum_{all i}(target_i * (log(target_i) - x_i))
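
A small sketch of the formula with arbitrary two-element distributions follows. One assumption to flag: some versions of the library appear to average the result over the number of elements by default, so sizeAverage is set to false here to obtain the plain sum shown above (on a version without that field, the assignment has no effect).

```lua
require 'nn'

local target = torch.Tensor{0.5, 0.5}                -- target probabilities
local input  = torch.log(torch.Tensor{0.25, 0.75})   -- predicted log-probabilities

local criterion = nn.DistKLDivCriterion()
criterion.sizeAverage = false   -- assumption: request the un-averaged sum

local loss = criterion:forward(input, target)

-- Same quantity computed element by element
local manual = 0
for i = 1, target:size(1) do
   manual = manual + target[i] * (math.log(target[i]) - input[i])
end

print(loss, manual)
```

The loss is zero when input equals the logarithm of target, and positive otherwise.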

## BCECriterion ##

    criterion = nn.BCECriterion()

Creates a criterion that measures the Binary Cross Entropy between the target and the output:

    loss(t,o) = -(t * log(o) + (1 - t) * log(1 - o))

This is used for measuring the error of a reconstruction in, for example, an auto-encoder.
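
As a hedged illustration of the formula, the sketch below uses single-element tensors, so that any averaging over elements makes no difference. The values 0.8 and 1 are arbitrary.

```lua
require 'nn'

local o = torch.Tensor{0.8}   -- network output, a probability in (0, 1)
local t = torch.Tensor{1}     -- target, 0 or 1

local loss = nn.BCECriterion():forward(o, t)

-- -(t * log(o) + (1 - t) * log(1 - o)) evaluated by hand
local manual = -(t[1] * math.log(o[1]) + (1 - t[1]) * math.log(1 - o[1]))

print(loss, manual)   -- should agree up to a small numerical tolerance
```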

## MarginCriterion ##

    criterion = nn.MarginCriterion()

Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input x (a Tensor of dimension 1) and output y (which is a scalar, either 1 or -1):

    loss(x,y) = forward(x,y) = max(0, m - y*x).

m is the margin, which is by default 1.

    criterion = nn.MarginCriterion(marginValue)

sets a different value of m.

Example:

    require "nn"

    function gradUpdate(mlp, x, y, criterion, learningRate)
       local pred = mlp:forward(x)
       local err = criterion:forward(pred, y)
       local gradCriterion = criterion:backward(pred, y)
       mlp:zeroGradParameters()
       mlp:backward(x, gradCriterion)
       mlp:updateParameters(learningRate)
    end

    mlp = nn.Sequential()
    mlp:add(nn.Linear(5, 1))

    x1 = torch.rand(5)
    x2 = torch.rand(5)
    criterion = nn.MarginCriterion(1)

    for i = 1, 1000 do
       gradUpdate(mlp, x1, 1, criterion, 0.01)
       gradUpdate(mlp, x2, -1, criterion, 0.01)
    end

    print(mlp:forward(x1))
    print(mlp:forward(x2))

    print(criterion:forward(mlp:forward(x1), 1))
    print(criterion:forward(mlp:forward(x2), -1))

gives the output:

    1.0043
    [torch.Tensor of dimension 1]

    -1.0061
    [torch.Tensor of dimension 1]

    0
    0

i.e. the mlp successfully separates the two data points such that they both have a margin of 1, and hence a loss of 0.

## MultiMarginCriterion ##

    criterion = nn.MultiMarginCriterion(p)

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input x (a Tensor of dimension 1) and output y (which is a target class index, 1 <= y <= x:size(1)):

    loss(x,y) = forward(x,y) = sum_i(max(0, 1 - (x[y] - x[i]))^p) / x:size(1)

where i = 1 to x:size(1) and i ~= y.

Note that this criterion also works with 2D inputs and 1D targets.

This criterion is especially useful for classification when used in conjunction with a module ending in the following output layer:

    mlp = nn.Sequential()
    mlp:add(nn.Euclidean(n,m)) -- outputs a vector of distances
    mlp:add(nn.MulConstant(-1)) -- distance to similarity
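
The loss formula above can be checked on a small input. This is only a sketch with arbitrary values and p = 1; the loop recomputes the sum over i ~= y in plain Lua.

```lua
require 'nn'

local x = torch.Tensor{0.1, 0.2, 0.4, 0.8}   -- one score per class
local y = 3                                  -- target class index
local p = 1

local loss = nn.MultiMarginCriterion(p):forward(x, y)

-- sum_i(max(0, 1 - (x[y] - x[i]))^p) / x:size(1), for i ~= y
local manual = 0
for i = 1, x:size(1) do
   if i ~= y then
      manual = manual + math.max(0, 1 - (x[y] - x[i]))^p
   end
end
manual = manual / x:size(1)

print(loss, manual)
```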

## MultiLabelMarginCriterion ##

    criterion = nn.MultiLabelMarginCriterion()

Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 1D Tensor) and output y (which is a 1D Tensor of target class indices):

    loss(x,y) = forward(x,y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x:size(1)

where i = 1 to x:size(1), j = 1 to y:size(1), y[j] ~= 0, and i ~= y[j] for all i and j.

Note that this criterion also works with 2D inputs and targets.

y and x must have the same size. The criterion only considers the first non-zero y[j] targets. This allows for different samples to have variable amounts of target classes:

    criterion = nn.MultiLabelMarginCriterion()
    input = torch.randn(2,4)
    target = torch.Tensor{{1,3,0,0},{4,0,0,0}} -- zero-values are ignored
    criterion:forward(input, target)

## MSECriterion ##

    criterion = nn.MSECriterion()

Creates a criterion that measures the mean squared error between n elements in the input x and output y:

    loss(x,y) = forward(x,y) = 1/n \sum |x_i-y_i|^2 .

If x and y are d-dimensional Tensors with a total of n elements, the sum operation still operates over all the elements, and divides by n. The two tensors must have the same number of elements (but their sizes might be different...)

The division by n can be avoided if one sets the internal variable sizeAverage to false:

    criterion = nn.MSECriterion()
    criterion.sizeAverage = false
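
The following sketch evaluates the mean and the plain sum of squared errors on two arbitrary tensors, to show the effect of sizeAverage.

```lua
require 'nn'

local input  = torch.Tensor{1, 2, 3}
local target = torch.Tensor{2, 2, 5}

local criterion = nn.MSECriterion()
local mean = criterion:forward(input, target)    -- (1 + 0 + 4) / 3

criterion.sizeAverage = false
local total = criterion:forward(input, target)   -- 1 + 0 + 4

print(mean, total)
```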

## MultiCriterion ##

    criterion = nn.MultiCriterion()

This returns a Criterion which is a weighted sum of other Criterions. Criterions are added using the method:

    criterion:add(singleCriterion, weight)

where weight is a scalar.
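
As an illustration, the sketch below combines an MSECriterion and an AbsCriterion with arbitrary weights 0.5 and 2, and compares the result with the weighted sum computed from the individual criterions.

```lua
require 'nn'

local input  = torch.Tensor{1, 2, 3}
local target = torch.Tensor{2, 2, 5}

local criterion = nn.MultiCriterion()
criterion:add(nn.MSECriterion(), 0.5)
criterion:add(nn.AbsCriterion(), 2)

local loss = criterion:forward(input, target)

-- The same weighted sum, computed from the individual criterions
local mse = nn.MSECriterion():forward(input, target)
local abs = nn.AbsCriterion():forward(input, target)
local manual = 0.5 * mse + 2 * abs

print(loss, manual)
```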

## HingeEmbeddingCriterion ##

    criterion = nn.HingeEmbeddingCriterion()

Creates a criterion that measures the loss given an input x which is a 1-dimensional vector and a label y (1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.

    loss(x,y) = forward(x,y) = x,                   if y =  1
                             = max(0, margin - x),  if y = -1

The margin has a default value of 1, or can be set in the constructor:

    criterion = nn.HingeEmbeddingCriterion(marginValue)

Example use:

    -- imagine we have one network we are interested in, it is called "p1_mlp"
    p1_mlp = nn.Sequential(); p1_mlp:add(nn.Linear(5,2))

    -- But we want to push examples towards or away from each other
    -- so we make another copy of it called p2_mlp
    -- this *shares* the same weights via the set command, but has its own set of temporary gradient storage
    -- that's why we create it again (so that the gradients of the pair don't wipe each other)
    p2_mlp = nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
    p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
    p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)

    -- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp
    prl = nn.ParallelTable()
    prl:add(p1_mlp)
    prl:add(p2_mlp)

    -- now we define our top level network that takes this parallel table and computes the pairwise distance between
    -- the pair of outputs
    mlp = nn.Sequential()
    mlp:add(prl)
    mlp:add(nn.PairwiseDistance(1))

    -- and a criterion for pushing together or pulling apart pairs
    crit = nn.HingeEmbeddingCriterion(1)

    -- let's make two example vectors
    x = torch.rand(5)
    y = torch.rand(5)

    -- Use a typical generic gradient update function
    function gradUpdate(mlp, x, y, criterion, learningRate)
       local pred = mlp:forward(x)
       local err = criterion:forward(pred, y)
       local gradCriterion = criterion:backward(pred, y)
       mlp:zeroGradParameters()
       mlp:backward(x, gradCriterion)
       mlp:updateParameters(learningRate)
    end

    -- push the pair x and y together, notice how then the distance between them given
    -- by print(mlp:forward({x,y})[1]) gets smaller
    for i = 1, 10 do
       gradUpdate(mlp, {x,y}, 1, crit, 0.01)
       print(mlp:forward({x,y})[1])
    end

    -- pull apart the pair x and y, notice how then the distance between them given
    -- by print(mlp:forward({x,y})[1]) gets larger
    for i = 1, 10 do
       gradUpdate(mlp, {x,y}, -1, crit, 0.01)
       print(mlp:forward({x,y})[1])
    end

## L1HingeEmbeddingCriterion ##

    criterion = nn.L1HingeEmbeddingCriterion(margin)

Creates a criterion that measures the loss given an input x = {x1,x2}, a table of two tensors, and a label y (1 or -1). This is used for measuring whether two inputs are similar or dissimilar, using the L1 distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.

    loss(x,y) = forward(x,y) = ||x1-x2||_1,                   if y =  1
                             = max(0, margin - ||x1-x2||_1),  if y = -1

The margin has a default value of 1, or can be set in the constructor:

    criterion = nn.L1HingeEmbeddingCriterion(marginValue)
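
Both branches of the loss can be illustrated with a pair of arbitrary vectors whose L1 distance is 5, and a margin of 7 chosen so that the dissimilar case is non-zero. This is a sketch, not a training recipe.

```lua
require 'nn'

local x1 = torch.Tensor{1, 2}
local x2 = torch.Tensor{3, 5}   -- ||x1 - x2||_1 = 2 + 3

local criterion = nn.L1HingeEmbeddingCriterion(7)

local similar    = criterion:forward({x1, x2}, 1)    -- the L1 distance itself
local dissimilar = criterion:forward({x1, x2}, -1)   -- max(0, 7 - distance)

print(similar, dissimilar)
```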

## CosineEmbeddingCriterion ##

    criterion = nn.CosineEmbeddingCriterion(margin)

Creates a criterion that measures the loss given an input x = {x1,x2}, a table of two tensors, and a label y (1 or -1). This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.

margin should be a number from -1 to 1; 0 to 0.5 is suggested. forward() and backward() have to be used alternately. If margin is missing, the default value is 0.

The loss function is:

    loss(x,y) = forward(x,y) = 1 - cos(x1, x2),               if y =  1
                             = max(0, cos(x1, x2) - margin),  if y = -1

## MarginRankingCriterion ##

    criterion = nn.MarginRankingCriterion(margin)

Creates a criterion that measures the loss given an input x = {x1,x2}, a table of two Tensors of size 1 (they contain only scalars), and a label y (1 or -1).

If y = 1 then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice-versa for y = -1.

The loss function is:

    loss(x,y) = forward(x,y) = max(0, -y*(x[1]-x[2]) + margin)
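
Before the full training example, here is a minimal sketch of the loss on fixed values. The scores 0.5 and 0.3 and the margin 0.1 are arbitrary.

```lua
require 'nn'

local criterion = nn.MarginRankingCriterion(0.1)
local x = {torch.Tensor{0.5}, torch.Tensor{0.3}}   -- two scalar scores

-- y = 1: the first input should rank higher, which it already does by more than the margin
local lossFirstHigher = criterion:forward(x, 1)     -- max(0, -(0.5 - 0.3) + 0.1)

-- y = -1: the second input should rank higher, so the pair is penalised
local lossSecondHigher = criterion:forward(x, -1)   -- max(0, (0.5 - 0.3) + 0.1)

print(lossFirstHigher, lossSecondHigher)
```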

Example:

    p1_mlp = nn.Linear(5,2)
    p2_mlp = p1_mlp:clone('weight','bias')

    prl = nn.ParallelTable()
    prl:add(p1_mlp)
    prl:add(p2_mlp)

    mlp1 = nn.Sequential()
    mlp1:add(prl)
    mlp1:add(nn.DotProduct())

    mlp2 = mlp1:clone('weight','bias')

    mlpa = nn.Sequential()
    prla = nn.ParallelTable()
    prla:add(mlp1)
    prla:add(mlp2)
    mlpa:add(prla)

    crit = nn.MarginRankingCriterion(0.1)

    x = torch.randn(5)
    y = torch.randn(5)
    z = torch.randn(5)

    -- Use a typical generic gradient update function
    function gradUpdate(mlp, x, y, criterion, learningRate)
       local pred = mlp:forward(x)
       local err = criterion:forward(pred, y)
       local gradCriterion = criterion:backward(pred, y)
       mlp:zeroGradParameters()
       mlp:backward(x, gradCriterion)
       mlp:updateParameters(learningRate)
    end

    -- rank the pair {x,y} above the pair {x,z}
    for i = 1, 100 do
       gradUpdate(mlpa, {{x,y},{x,z}}, 1, crit, 0.01)
       local o1 = mlp1:forward{x,y}[1]
       local o2 = mlp2:forward{x,z}[1]
       local o = crit:forward(mlpa:forward{{x,y},{x,z}}, 1)
       print(o1, o2, o)
    end

    print "--"

    -- rank the pair {x,z} above the pair {x,y}
    for i = 1, 100 do
       gradUpdate(mlpa, {{x,y},{x,z}}, -1, crit, 0.01)
       local o1 = mlp1:forward{x,y}[1]
       local o2 = mlp2:forward{x,z}[1]
       local o = crit:forward(mlpa:forward{{x,y},{x,z}}, -1)
       print(o1, o2, o)
    end