Hi there. I'm working on using pipeline parallelism in DeepSpeed with models from transformers (such as Llama or Mistral). I found that the pipeline engine in DeepSpeed only supports sending and receiving plain tensors between stages, while the output of a transformers model layer is a tuple of tensors. How can I use pipeline parallelism with Hugging Face models? Is there an elegant way to apply patches?

Thanks for any suggestions.
Replies: 1 comment

You just need to wrap the transformer layer in a custom nn.Module and set the engine's has_attention_mask.
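For reference, here is a minimal sketch of that approach (untested): each Hugging Face layer is wrapped so a pipeline stage only consumes and produces a flat tuple of tensors, with the attention mask kept as the last element of that tuple. The attribute names (hf.model.embed_tokens, hf.model.layers, hf.model.norm, hf.lm_head), the decoder layer's keyword arguments, the checkpoint name, and the ds_config.json path are assumptions that will vary with your transformers and DeepSpeed versions.

```python
# Rough sketch, untested: wrap each Hugging Face layer so a pipeline stage only
# ever sends/receives a flat tuple of tensors.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule
from transformers import AutoModelForCausalLM


class EmbeddingPipe(nn.Module):
    def __init__(self, embed_tokens):
        super().__init__()
        self.embed_tokens = embed_tokens

    def forward(self, inputs):
        # The first stage receives whatever the data loader yields,
        # e.g. (input_ids, attention_mask).
        input_ids, attention_mask = inputs
        hidden_states = self.embed_tokens(input_ids)
        # Only tensors go downstream; the mask stays last in the tuple.
        return hidden_states, attention_mask


class DecoderLayerPipe(nn.Module):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, inputs):
        hidden_states, attention_mask = inputs
        # Recompute position_ids locally instead of shipping them between stages.
        position_ids = torch.arange(
            hidden_states.size(1), device=hidden_states.device
        ).unsqueeze(0)
        # HF decoder layers return a tuple; keep only the hidden-states tensor.
        # Depending on your transformers version, the layer may expect an expanded
        # 4D additive mask and/or precomputed rotary position_embeddings instead.
        hidden_states = self.layer(
            hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
        )[0]
        return hidden_states, attention_mask


class LMHeadPipe(nn.Module):
    def __init__(self, norm, lm_head):
        super().__init__()
        self.norm = norm
        self.lm_head = lm_head

    def forward(self, inputs):
        hidden_states, _attention_mask = inputs
        return self.lm_head(self.norm(hidden_states))


def loss_fn(logits, labels):
    # Standard causal-LM shift.
    return nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)), labels[:, 1:].reshape(-1)
    )


deepspeed.init_distributed()

hf = AutoModelForCausalLM.from_pretrained("your-llama-or-mistral-checkpoint")
layers = (
    [EmbeddingPipe(hf.model.embed_tokens)]
    + [DecoderLayerPipe(layer) for layer in hf.model.layers]
    + [LMHeadPipe(hf.model.norm, hf.lm_head)]
)
pipe_model = PipelineModule(layers=layers, num_stages=2, loss_fn=loss_fn)

engine, _, _, _ = deepspeed.initialize(
    model=pipe_model,
    model_parameters=[p for p in pipe_model.parameters() if p.requires_grad],
    config="ds_config.json",  # placeholder config path
)
# Tell the engine that the trailing tensor of each stage's output is the attention
# mask so it gets special handling when communicated between stages (the method
# name follows the reply above; confirm it exists in your DeepSpeed version).
engine.set_has_attention_mask(True)
```

Training then goes through engine.train_batch(data_iter), where the iterator yields ((input_ids, attention_mask), labels) so the first stage gets the input tuple and the loss_fn gets the labels.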