+/- sign in backward phase of the dice layer #32

Open
ntajbakhsh opened this issue Apr 21, 2017 · 9 comments

@ntajbakhsh

Hi Fausto,
I wonder why you use a "+=" operator for feature map 0 (the background) but a "-=" operator for feature map 1 (the foreground):
bottom[btm].diff[i, 0, :] += 2.0 * ( (self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ((self.union[i]) ** 2))

bottom[btm].diff[i, 1, :] -= 2.0 * ( (self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ((self.union[i]) ** 2))
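
For context, this is how I read those two lines (a small sketch with my own variable names, not your layer's code): the expression looks like the derivative of the soft Dice from the paper, D = 2*I/U with squared terms in U.

    import numpy as np

    def soft_dice_and_grad(p, g):
        # p: predicted foreground probabilities, g: binary ground truth (both flattened).
        # Soft Dice as in the V-Net paper, with squared terms in the denominator.
        I = np.sum(p * g)                      # intersection
        U = np.sum(p ** 2) + np.sum(g ** 2)    # "union"
        D = 2.0 * I / U
        # dD/dp_j = 2 * (g_j * U - 2 * p_j * I) / U^2, the same shape as the lines above
        return D, 2.0 * (g * U - 2.0 * p * I) / (U ** 2)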

Also, the Dice score is supposed to be maximized during training, but it appears to be treated as a loss in your code (the code seems to be minimizing it).

Could you please help me with the above 2 questions?

Nima

@faustomilletari (Owner) commented Apr 21, 2017 via email

@ntajbakhsh (Author)

Thanks for the prompt reply!

Theoretically, both operators should be +=, right? The Dice metric is class agnostic, so I would assume the operator should be the same for the foreground and the background, or am I missing something?

Could you point out where in the code you tell it to minimize (-Dice)? In the forward pass you compute +Dice, and I don't see the negation in the backward pass either.
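
To make the question concrete, here is a toy sketch of the two sign conventions I can imagine for the diff (my own code, not the layer's, pretending for simplicity that the voxel probability itself were the parameter a minimizing solver updates with p_new = p_old - lr * diff):

    lr = 0.1
    p, dD_dp = 0.5, 0.96            # a voxel probability and a positive dD/dp at that voxel
    print(p - lr * (+dD_dp))        # diff = +dD/dp -> 0.404: p moves down, Dice decreases
    print(p - lr * (-dD_dp))        # diff = -dD/dp -> 0.596: p moves up, Dice increases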

@faustomilletari (Owner) commented Apr 22, 2017 via email

@ntajbakhsh (Author)

Thanks. So the -= operator acts as multiplying the gradient by -1, which in principle should have been done for both bottom[btm].diff[i, 0, :] and bottom[btm].diff[i, 1, :], but you intentionally did not do it for the background as a trick to speed up training. Is my understanding correct?

@faustomilletari (Owner) commented Apr 22, 2017 via email

@ntajbakhsh (Author)

All your explanations started to make sense after I re-read the paper and did the math by myself. Thanks!

@rishabhsshah

Hi Fausto,

Firstly, thanks for open-sourcing the entire implementation of the Dice loss. It was very helpful. However, I am having some trouble understanding it.

I am unable to understand why you use the "-=" operator for feature map 1 (the foreground) in the backward pass of the Dice loss layer in Python:

bottom[btm].diff[i, 1, :] -= 2.0 * ( (self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ( (self.union[i]) ** 2))

Ideally, we are supposed to maximize the Dice score, or equivalently minimize -(Dice score). The update in the line above appears to do the opposite: it minimizes the Dice score.

Whenever we compute the derivative of a function with respect to an input, it points in the direction of increase of the function's value. Therefore, multiplying the derivative by -1 points us in the direction that decreases it.

The gradient of the Dice score can be calculated with the following formula:
2.0 * ((self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ((self.union[i]) ** 2))

That is the direction of increasing Dice score. When we use "-=" in the code line above, we are forcing the network to minimize the Dice score.
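
To make the direction argument concrete, here is a tiny one-voxel toy check (my own example, not the repo's code), using the soft Dice with squared terms in the denominator:

    g = 1.0                                        # a single foreground voxel
    def dice(p):
        return 2.0 * (p * g) / (p ** 2 + g ** 2)
    p = 0.5
    grad = 2.0 * (g * (p ** 2 + g ** 2) - 2.0 * p * (p * g)) / (p ** 2 + g ** 2) ** 2
    print(grad)                                    # 0.96 > 0: increasing p increases the Dice
    print((dice(p + 1e-6) - dice(p)) / 1e-6)       # finite difference agrees (~0.96)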

Please help me understand where I am going wrong.

Thanks,
Rishabh

@sara-eb commented Jan 13, 2019

@faustomilletari @ntajbakhsh
Hi everyone,
I went over all the comments here. I am implementing a generalized Dice loss Python layer.
I have extracted the labels so that the problem can be treated as binary segmentation, but now I have some confusion about the background label, which has value 0 and occupies channel [0] of a subvolume of shape NxCxDxWxH (C stands for the number of classes, 5 in my case, and D stands for the depth of the subvolume).

According to this line, it is clear how to do this for binary segmentation with a blob that has 2 feature maps, but what about the generalized Dice loss? I have 4 feature maps for the foreground classes. How can I calculate the diff for the background in that case?
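
Concretely, this is the direction I am considering for the multi-class case (a rough sketch with my own variable names, not code from this repository; I have left out the class weights w_c of the generalized Dice and simply averaged the per-class Dice). I treat the background as one more channel and write the negated per-class gradient into its diff:

    import numpy as np

    def multiclass_dice_diff(prob, gt_onehot):
        # prob, gt_onehot: float arrays of shape (N, C, D, H, W). Background is channel 0.
        N, C = prob.shape[0], prob.shape[1]
        diff = np.zeros_like(prob)
        for i in range(N):
            for c in range(C):
                p = prob[i, c]
                g = gt_onehot[i, c]
                I = np.sum(p * g)                              # per-class intersection
                U = np.sum(p ** 2) + np.sum(g ** 2) + 1e-8     # per-class union (squared terms)
                grad = 2.0 * (g * U - 2.0 * p * I) / (U ** 2)  # d(Dice_c)/d(p_c)
                diff[i, c] = -grad / C   # negate so that minimizing the loss maximizes mean Dice
        return diff

Is treating the background channel exactly like the foreground channels reasonable here?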

Your expert opinion is really appreciated
Thanks

@sara-eb commented Feb 19, 2019

Hi everyone,

I really need help with this. I have gone through different implementations, including the one mentioned here, but I am not sure whether my implementation is correct. The loss values are between 0 and 1, and at some points the loss becomes 1.

I implemented the loss explained in this reference: "Tversky loss, a generalized form of Dice loss, which is identical to the Dice loss when alpha = beta = 0.5".
However, I have a question and am hoping to get a response here.
I implemented Equation 3 and calculated the backward pass according to Equation 4, which is the gradient of the loss in Equation 3 with respect to p0_i (the output of the softmax layer, i.e., the soft output), the probability of voxel i being the object of interest (lesion).
The authors also provide Equation 5, which is the gradient of the loss in Equation 3 with respect to p1_i, the probability of voxel i being non-lesion.

I created the one-hot encoding of the ground truth and saved the gradients in two tensors:
dTdp0 (the gradient with respect to p0) and dTdp1 (the gradient with respect to p1).
In the backward pass I currently only send dTdp0, since I was not sure how to combine the two gradients in the gradient calculation:

    def backward(self, top, propagate_down, bottom):
        # currently only dTdp0 is sent, copied channel by channel
        for i in range(0, bottom[0].diff.shape[1]):
            bottom[0].diff[:, i, :, :, :] -= self.dTdp0[:, i, :, :, :]

My questions are:

  1. How should I handle the gradient of the background (label 0) differently from the gradients of the other object classes, as discussed earlier in this thread?

  2. What about dTdp1? How can I include it in the gradient calculation? (A rough sketch of what I am considering is shown below.)
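
For reference, this is the two-channel variant I am considering but have not verified (a sketch under my own assumptions: the softmax output has channel 0 = lesion, matching p0, and channel 1 = non-lesion, matching p1, and dTdp0 / dTdp1 each have shape (N, D, H, W)):

    def backward(self, top, propagate_down, bottom):
        # Sketch, not verified: write each gradient into the softmax channel it was
        # computed with respect to, keeping the same sign convention as my code above.
        bottom[0].diff[...] = 0.0
        bottom[0].diff[:, 0, :, :, :] -= self.dTdp0   # gradient w.r.t. p0 (lesion channel)
        bottom[0].diff[:, 1, :, :, :] -= self.dTdp1   # gradient w.r.t. p1 (non-lesion channel)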

Your help is really appreciated.
