Description
Hi MONAI team,
I'm training a segmentation model from scratch using Auto3Dseg. Training remains efficient (less than 1 minute per image), but the per-image validation time increases dramatically in later epochs, and the issue worsens with each epoch. For example, during training epoch 294 the validation images were processed in roughly 80–250 seconds each, but by epoch 298 some validation images took over 3300 s (~55 minutes) each!
Sample Output:
Epoch 294 vs 298 Validation Timing:
Final training 294/299 loss: 0.7182 acc_avg: 0.7918 acc [ 0.667 0.917] time 82.54s lr: 2.0133e-07
Val 294/300 0/12 ... time 80.45s
Val 294/300 3/12 ... time 233.68s
Val 294/300 5/12 ... time 253.24s
Final training 298/299 loss: 0.6506 acc_avg: 0.7909 acc [ 0.631 0.951] time 84.18s lr: 2.2377e-08
Val 298/300 5/12 ... time 3393.77s
Val 298/300 6/12 ... time 1763.08s
Val 298/300 8/12 ... time 2419.77s
Troubleshooting done:
- Confirmed it's not a data I/O bottleneck (the validation data is static)
- Tried emptying the CUDA cache before/after validation steps (see the sketch after this list)
- Checked the Segmenter class's validate function and the postprocessing for memory leaks or side effects
- Observed no large RAM/VRAM spikes
- Dataloader settings are unchanged across training
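For reference, this is roughly how I emptied the CUDA cache and checked for VRAM spikes around validation. It is a minimal sketch, not the actual Auto3Dseg code: `model`, `val_loader`, and the `batch["image"]` key are placeholders standing in for the Segmenter's model and validation dataloader.

```python
import time
import torch

def timed_validation(model, val_loader, device="cuda"):
    """Hypothetical helper mirroring the checks above: empties the CUDA cache
    around validation and records per-image wall time and peak VRAM."""
    model.eval()
    torch.cuda.empty_cache()                    # clear cached blocks before validation
    torch.cuda.reset_peak_memory_stats(device)  # start peak-VRAM tracking from zero
    with torch.no_grad():
        for i, batch in enumerate(val_loader):
            image = batch["image"].to(device)
            torch.cuda.synchronize(device)      # so timings reflect finished GPU work
            start = time.perf_counter()
            _ = model(image)
            torch.cuda.synchronize(device)
            elapsed = time.perf_counter() - start
            peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
            print(f"val image {i}: {elapsed:.2f}s, peak VRAM {peak_gb:.2f} GB")
    torch.cuda.empty_cache()                    # release cache again after validation

```

With this instrumentation, the peak VRAM stayed flat across epochs while the per-image wall time kept growing, which is why I don't think it is a memory issue.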
I am training on Windows 11 with an NVIDIA A40 (48 GB) GPU, PyTorch 2.5.1, and CUDA 12.4. After training, inference takes ~0.37 seconds per image.
Could you please help with this processing time issue? Thank you for your time and for the great framework!