Compatibility of Progressive Layer Dropping & Pipeline Parallelism #7583
benearnthof asked this question in Q&A · Unanswered · 0 replies
Hello, I've had a look through the repo and was able to use both Progressive Layer Dropping (PLD) and Pipeline Parallelism successfully, but only independently of one another, on a custom model I train with DeepSpeed. When I enable PLD on a pipeline-parallel model, however, I get an error (a simplified sketch of the setup is below).
My question is: are these two techniques compatible in DeepSpeed at all? It seems logical that there is a bit more overhead involved, since the individual layers reside on different devices; could anyone point me in the right direction? I've had a look at how PLD is implemented in the BERT example here: https://github.com/deepspeedai/DeepSpeedExamples/blob/01f520e91d6b3235a4cabb1e7e634d9940319047/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L537 but the corresponding config does not use Pipeline Parallelism. Thanks in advance for any helpful info!
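For reference, here is a minimal sketch of the kind of setup I mean, with both features enabled together. The layer class, sizes, and config values are placeholders (the PLD settings just mirror the tutorial defaults), not my actual model:

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Placeholder transformer-style block; my real layers look similar.
class ToyBlock(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x):
        return self.norm(x + self.ff(x))

# Pipeline parallelism: DeepSpeed partitions this list of layers across stages.
layers = [ToyBlock() for _ in range(8)] + [nn.Linear(256, 10)]
model = PipelineModule(layers=layers, num_stages=2, loss_fn=nn.CrossEntropyLoss())

# PLD is enabled purely through the config; values follow the PLD tutorial defaults.
ds_config = {
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "progressive_layer_drop": {"enabled": True, "theta": 0.5, "gamma": 0.001},
}

# Launched with the usual `deepspeed --num_gpus=2 train.py` style command.
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Pipeline engines are driven with train_batch(data_iter=...) rather than a
# manual forward/backward/step loop; this combination is what fails for me
# once "progressive_layer_drop" is enabled.
# engine.train_batch(data_iter=train_iter)
```

Each feature works fine on its own: dropping `num_stages`/`PipelineModule` and training the plain module with PLD enabled runs, as does the pipeline setup with the `progressive_layer_drop` block removed from the config.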