
Add Dice Loss (and Intersection Over Union) #10890

Closed
wkeithvan opened this issue Aug 10, 2018 · 7 comments
Labels
type:feature The user is asking for a new feature.

Comments

@wkeithvan
Copy link

I propose that the Dice score/loss (also known as the F1-score or Sørensen–Dice coefficient) be added as a metric and loss function, as these are very commonly used in image segmentation and bounding box problems. Within the medical community this is an incredibly important function, and I have seen it used in other areas such as astronomy as well.

Additionally, Intersection over Union (IoU), also known as the Jaccard index, is another important metric/loss for these same classes of problems. While Dice and IoU are very similar functions, Dice weights true positives (the intersection) more heavily relative to false positives and false negatives than IoU does (which weights TP, FP, and FN evenly). In cases where false positives and false negatives can be very detrimental, IoU will produce a better result than Dice. Andrew Ng (Stanford professor, Google Brain co-founder, Coursera founder) even devotes an entire video to IoU in his convolutional neural networks (CNNs) course as the metric to use to determine whether your bounding box predictions are working.
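
For concreteness, here is a minimal sketch (plain Python, not a proposed Keras API) of the two scores computed from confusion-matrix counts; Dice counts the intersection twice in both numerator and denominator, which is why it weights TP more heavily:

```python
def dice(tp, fp, fn):
    # Dice/F1: 2*TP / (2*TP + FP + FN) -- the intersection is counted twice.
    return 2 * tp / (2 * tp + fp + fn)

def iou(tp, fp, fn):
    # IoU/Jaccard: TP / (TP + FP + FN) -- TP, FP, FN each counted once.
    return tp / (tp + fp + fn)

# With 50 true positives and 50 total errors, Dice is higher than IoU:
print(dice(50, 25, 25))  # ~0.667
print(iou(50, 25, 25))   # 0.5
```

The two are monotonically related (IoU = Dice / (2 − Dice)), so as metrics they rank predictions identically; as losses, their gradients penalize errors differently.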

According to PR #7032, which sought to add Dice, the request was closed with the note that it would only be added if the community continued to bring it up and express interest (see the comment by @fchollet).

As time has passed and interest in CNNs has skyrocketed, I suggest that we reconsider adding Dice and IoU, as they are becoming more and more commonplace. Dice has been proposed or mentioned in issues and pull requests multiple times (#292, #369, #2115, #2994, #3442, #3457, #3611, #3653, #3977, #5916, #6933, #7032, #8961, #9154, #9275, #9395, #9444, #9671, #10783). Additionally, IoU has also been mentioned a number of times in this repository (#2016, #2185, #6467, #6538, #8225, #8643, #8669, #9367, #10104, #10602, #10783). Keep in mind that those references are only within this repository; there are plenty of other GitHub repositories that use Dice and/or IoU as a loss function.

Furthermore, the research community has used Dice or IoU in numerous papers that make use of CNN's. Here are a few that each have over 100 citations according to Google Scholar, though many more exist... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (sorry ahead of time if they are behind a paywall).

I personally have been working in medical research with a U-Net for image segmentation and have found that training the model with binary cross-entropy at first and then switching to Dice loss for additional training has significantly improved my performance over using binary cross-entropy alone. I have been using a custom Dice loss; however, it would be great to see an official version of this supported by Keras.
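
For reference, a NumPy sketch of the kind of custom soft-Dice loss described above (the `smooth` constant here is a conventional choice of mine, not an official value; in an actual Keras custom loss you would swap the NumPy calls for backend ops such as `K.flatten` and `K.sum`):

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    # Soft Dice on flattened masks; `smooth` avoids 0/0 on empty masks.
    y_true_f = np.ravel(y_true)
    y_pred_f = np.ravel(y_pred)
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    # Minimizing 1 - Dice maximizes the overlap.
    return 1.0 - dice_coef(y_true, y_pred)
```

The two-phase training described above then amounts to compiling with binary cross-entropy first and later recompiling the same model with the custom Dice loss.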

Given that over a year has passed since PR #7032, would the Keras team reconsider implementing official versions of the Dice and IoU loss functions?

@sarim-zafar

Any updates on this?

@Kautenja

Kautenja commented Sep 3, 2018

For anyone interested in this, I've implemented an IoU metric for evaluating semantic segmentation results here. I'd be happy to merge it into the Keras codebase.

@kylepob

kylepob commented Sep 23, 2018

I can also confirm that these metrics are important to segmentation tasks in the biometric domain. It would be really great to have an official implementation available.

@ASTalleruphuus

I can also confirm the need for these metrics and loss functions in the field of medical image segmentation, which often deals with data with great class imbalance. An official implementation would be deeply appreciated!

@Kautenja

Kautenja commented Oct 11, 2018

After working with the I/U metrics in Keras for some time, I noticed unexpected values compared to reported baselines. After reviewing how metrics are calculated in Keras, I discovered that they are computed batch-wise and then averaged together. This is contrary to how I/U is typically calculated for semantic segmentation tasks, i.e., over an entire subset of data. Batch-wise I/U may be useful in certain contexts and applications, so a simple built-in I/U metric may still be worth considering. However, correct semantic segmentation evaluation necessitates a different solution.
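
A toy illustration of that discrepancy (the pixel counts are made up): averaging per-batch I/U values does not equal the I/U computed over the pooled data.

```python
import numpy as np

def iou(intersection, union):
    # I/U from pooled pixel counts.
    return intersection / union

# Hypothetical (intersection, union) pixel counts for two batches:
batches = [(90, 100), (1, 10)]

# Batch-wise: compute I/U per batch, then average (how Keras metrics behave).
batchwise = np.mean([iou(i, u) for i, u in batches])  # (0.9 + 0.1) / 2 = 0.5

# Dataset-wise: pool the counts first, then compute I/U once.
datasetwise = iou(sum(i for i, _ in batches),
                  sum(u for _, u in batches))  # 91 / 110, roughly 0.83
```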

Potential solutions I've considered to correctly calculate I/U dataset-wise:

  1. set batch size equal to size of dataset
    • terrible space complexity, infeasible for large datasets
    • restricts training configurations
  2. aggregating a global confusion matrix to calculate I/U from
    • memory efficient, just needs a single CxC matrix to add to
    • hard to implement (requires a complementary callback for advanced functionality)
      • how to allocate the CxC matrix on training start?
      • how to reset to zero between epochs?
      • how to reset to zero for a validation/evaluation set?
      • how to access effectively to calculate I/U?
  3. write external methods to evaluate models using above technique, but with NumPy
    • memory efficient, just needs a single CxC matrix to add to
    • easy to implement
    • does not integrate with Keras model fitting (training / validation)
    • no GPU acceleration for calculating confusion matrices batch-wise
  4. write a callback to calculate I/U for a validation dataset
    • easy to implement
    • integrates with Keras well (i.e., free GPU acceleration)
    • contradicts some standard Keras API (e.g., validation data args in fit, fit_generator)
      • doesn't integrate with other metrics baked into the model
    • no output for training or evaluation metrics
    • have to reimplement accuracy metrics in NumPy to evaluate (easy with confusion matrix)

I ended up using technique 3 to implement SegNet and Tiramisu in Keras, as it's the simplest solution and I only really needed test-time I/U metrics anyway, since loss and accuracy are enough for the training process. Anyone interested can find the evaluation code here and the individual metrics that module uses here.
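
For anyone following along, a minimal NumPy sketch of technique 3 (function names here are illustrative, not from any library): accumulate one global C×C confusion matrix across batches, then derive per-class I/U from it.

```python
import numpy as np

def update_confusion(conf, y_true, y_pred, num_classes):
    # Flatten label maps and bin-count (true, pred) pairs into the matrix.
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def iou_from_confusion(conf):
    # Per-class I/U: TP / (TP + FP + FN), read straight off the matrix.
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    union = tp + fp + fn
    with np.errstate(divide='ignore', invalid='ignore'):
        # Classes that never appear (union == 0) become NaN instead of erroring.
        return np.where(union > 0, tp / union, np.nan)

# Toy "batch" of 6 labeled pixels with 3 classes:
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
conf = update_confusion(np.zeros((3, 3), dtype=np.int64), y_true, y_pred, 3)
print(iou_from_confusion(conf))  # per-class I/U: [1/3, 2/3, 1/2]
```

In practice `update_confusion` would be called once per batch inside the evaluation loop, and `iou_from_confusion` once at the end.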

EDIT: clarify meaning and some grammar

@gabrieldemarmiesse gabrieldemarmiesse added the type:feature The user is asking for a new feature. label Oct 11, 2018
@EmielBoss

I would also really like this; I'm going crazy implementing it. I'll just post my question here as well:

I am trying to perform semantic segmentation in TensorFlow 1.10 with eager execution with the generalized dice loss function:

import tensorflow as tf

def generalized_dice_loss(onehots_true, logits):
    # Not all of my pixels contain ground truth; mask() filters those pixels out,
    # which results in [num_gt_pixels, num_classes]-shaped labels and logits.
    onehots_true, logits = mask(onehots_true, logits)
    probabilities = tf.nn.softmax(logits)
    weights = 1.0 / (tf.reduce_sum(onehots_true, axis=0) ** 2)
    # Is this the correct way of dealing with inf values (the results of zero divisions)?
    weights = tf.clip_by_value(weights, 1e-17, 1.0 - 1e-7)
    numerator = tf.reduce_sum(onehots_true * probabilities, axis=0)
    numerator = tf.reduce_sum(weights * numerator)

    denominator = tf.reduce_sum(onehots_true + probabilities, axis=0)
    denominator = tf.reduce_sum(weights * denominator)

    loss = 1.0 - 2.0 * (numerator + 1e-17) / (denominator + 1e-17)
    return loss

However, I am struggling to get any meaningful loss which isn't always 1. What am I doing wrong here?

After the initial weights (one for each class) are calculated, they contain many infs from zero divisions, as typically only a small subset of all classes is present in a sample image. Therefore, I clip the weights to the range [1e-17, 1 - 1e-7] (is this a good idea?), after which they look like this:

tf.Tensor(
[4.89021e-05 2.21410e-10 5.43187e-11 1.00000e+00 1.00000e+00 4.23855e-07
 5.87461e-09 3.13044e-09 2.95369e-07 1.00000e+00 1.00000e+00 2.22499e-05
 1.00000e+00 1.73611e-03 9.47212e-10 1.12608e-05 2.77563e-09 1.00926e-08
 7.74787e-10 1.00000e+00 1.34570e-07], shape=(21,), dtype=float32)

which seems fine to me, though they are pretty small. The numerators (tf.reduce_sum(onehots_true * probabilities, axis=0), prior to their weighting) look like this:

tf.Tensor(
[3.42069e+01 0.00000e+00 9.43506e+03 7.88478e+01 1.50554e-02 0.00000e+00
 1.22765e+01 4.36149e-01 1.75026e+02 0.00000e+00 2.33183e+02 1.81064e-01
 0.00000e+00 1.60128e+02 1.48867e+04 0.00000e+00 3.87697e+00 4.49753e+02
 5.87062e+01 0.00000e+00 0.00000e+00], shape=(21,), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)

which also looks reasonable, since they're basically the labels' respective sizes times the network's certainty about them (which is likely low in the beginning of training). The denominators (tf.reduce_sum(onehots_true + probabilities, axis=0), prior to weighting) also look fine:

tf.Tensor(
[ 14053.483   25004.557  250343.36    66548.234    6653.863    3470.502
   5318.3926 164206.19    19914.338    1951.0701   3559.3235   7248.4717
   5984.786    7902.9004 133984.66    41497.473   25010.273   22232.062
  26451.926   66250.39     6497.735 ], shape=(21,), dtype=float32)

These are large, but that is to be expected since the class probabilities of a pixel sum to 1, and therefore the sum of these denominators should more or less equal the number of pixels with ground truth.

However, the numerators sum to a very small value (~0.001, though occasionally in the single-digit range), while the denominators sum to very large values. This results in my final loss being exclusively 1, or something really close to it. Does anyone know how I can mitigate this effect and obtain stable gradients?

@dynamicwebpaige

Would it be possible to request this as a loss in tensorflow/addons, rather than Keras?
