Implement mixed precision for training and inference #20
Conversation
Needed to switch to `binary_cross_entropy_with_logits` because the normal `binary_cross_entropy` was not compatible with mixed precision.
LGTM @LorenzLamm! I have a question below, but it's more out of curiosity.
```diff
 data = sigmoid(data)
 mask = target != self.ignore_label

 # Compute the cross entropy loss while ignoring the ignore_label
 target_comp = target.clone()
 target_comp[target == self.ignore_label] = 0
 target_tensor = torch.tensor(target_comp, dtype=data.dtype, device=data.device)
-bce_loss = binary_cross_entropy(data, target_tensor, reduction="none")
+bce_loss = binary_cross_entropy_with_logits(
```
If I understand correctly, this function applies a sigmoid to the prediction before computing the loss. Have you checked how this impacts training performance?
https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html#torch.nn.BCEWithLogitsLoss
I haven't checked in detail how it influences training performance, except that I re-ran training to see whether the loss curves behave the same: in that sense, there is no difference.
Including the sigmoid in the loss function is supposed to be more numerically stable, as the implementation uses the log-sum-exp trick to compute the loss: https://gregorygundersen.com/blog/2020/02/09/log-sum-exp/
I guess this helps especially in case of exploding or vanishing gradients, which I didn't observe during training.
Other than that, the loss function is the same as the one I used previously. Note that `binary_cross_entropy_with_logits` takes the `orig_data` variable as input instead of `data`, to which the sigmoid is applied above.
So both the previous version and the new version apply a sigmoid and then the cross entropy. The only difference is that the new version fuses the two steps into a single, numerically stable computation.
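The stability point can be illustrated with plain Python, independent of PyTorch. Below is a minimal sketch (the helper names `bce_naive` and `bce_with_logits` are made up for the example): applying the sigmoid first and taking the log afterwards fails once the sigmoid saturates, while the fused log-sum-exp form stays well defined.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_naive(z, y):
    # Sigmoid first, then log: breaks when sigmoid rounds to exactly 0 or 1.
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    # Fused, numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)).
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# Moderate logits: both forms agree.
print(bce_naive(2.0, 1.0))        # ~0.12693
print(bce_with_logits(2.0, 1.0))  # ~0.12693

# Extreme logit: sigmoid(40) rounds to 1.0 in float64, so the naive
# form hits log(0) and raises, while the fused form returns ~40.
try:
    bce_naive(40.0, 0.0)
except ValueError:
    print("naive BCE failed at z=40")
print(bce_with_logits(40.0, 0.0))  # ~40.0
```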
Mixed precision for both training and inference.
Using mixed precision during training allows for larger batch sizes and slightly faster training (the time-wise bottleneck is still data augmentation, which is not affected here).
For segmentation, there is also a slight speed-up, but the main advantage is that inference requires less GPU memory. It should now be easily possible to run prediction on an 8GB GPU -- previously, 8GB was right at the margin and caused problems for some users.
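As a rough illustration of the pattern this PR enables, here is a minimal sketch of a PyTorch mixed-precision step. The toy two-layer model is a stand-in, not the repository's actual network or training loop; the sketch falls back to bfloat16 on CPU so it runs without a GPU.

```python
import torch
from torch import nn
from torch.nn.functional import binary_cross_entropy_with_logits

# Hypothetical stand-in for the real segmentation network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
amp_dtype = torch.float16 if use_cuda else torch.bfloat16
model.to(device)

# GradScaler rescales the loss so float16 gradients don't underflow;
# it is a no-op when disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 8, device=device)
target = torch.randint(0, 2, (4, 1), device=device).float()

# Training step: forward pass under autocast, scaled backward pass.
with torch.autocast(device_type=device, dtype=amp_dtype):
    logits = model(x)
    # Plain binary_cross_entropy raises under CUDA autocast; the
    # *_with_logits variant is the autocast-safe choice.
    loss = binary_cross_entropy_with_logits(logits, target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# Inference: autocast alone is enough, no scaler needed.
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    pred = torch.sigmoid(model(x))
```

The memory savings come from storing activations in half precision during the forward pass; the optimizer state and master weights remain float32.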