Compatibility of Progressive Layer Dropping & Pipeline Parallelism #7583
benearnthof asked this question in Q&A · Unanswered · 0 replies
Hello, I've had a look through the repo and was able to use both Progressive Layer Dropping (PLD) and Pipeline Parallelism successfully, but only independently of one another, on a custom model I train with DeepSpeed. When I enable PLD on a pipeline-parallel model, however, I get an error (a simplified sketch of the setup is below).
My question is: are these two techniques compatible in DeepSpeed at all? It seems logical that there is a bit more overhead involved, since the individual layers reside on different devices; could anyone point me in the right direction? I've had a look at how PLD is implemented in the BERT example here: https://github.com/deepspeedai/DeepSpeedExamples/blob/01f520e91d6b3235a4cabb1e7e634d9940319047/training/bing_bert/nvidia/modelingpreln_layerdrop.py#L537 but the corresponding config does not use Pipeline Parallelism. Thanks in advance for any helpful info!
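For reference, here is a minimal sketch of the kind of setup I mean, with both features enabled together. The layer class, sizes, and config values are placeholders (the PLD settings just mirror the tutorial defaults), not my actual model:

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Placeholder transformer-style block; my real layers look similar.
class ToyBlock(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x):
        return self.norm(x + self.ff(x))

# Pipeline parallelism: DeepSpeed partitions this list of layers across stages.
layers = [ToyBlock() for _ in range(8)] + [nn.Linear(256, 10)]
model = PipelineModule(layers=layers, num_stages=2, loss_fn=nn.CrossEntropyLoss())

# PLD is enabled purely through the config; values follow the PLD tutorial defaults.
ds_config = {
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "progressive_layer_drop": {"enabled": True, "theta": 0.5, "gamma": 0.001},
}

# Launched with the usual `deepspeed --num_gpus=2 train.py` style command.
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Pipeline engines are driven with train_batch(data_iter=...) rather than a
# manual forward/backward/step loop; this combination is what fails for me
# once "progressive_layer_drop" is enabled.
# engine.train_batch(data_iter=train_iter)
```

Each feature works fine on its own: dropping `num_stages`/`PipelineModule` and training the plain module with PLD enabled runs, as does the pipeline setup with the `progressive_layer_drop` block removed from the config.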