Hi there. I'm working on using pipeline parallelism in DeepSpeed with models from transformers (such as Llama or Mistral). I found that the pipeline engine in DeepSpeed only supports sending and receiving plain tensors between stages, while the output of a transformers model layer is a tuple of tensors. How can I use pipeline parallelism with Hugging Face models? Is there an elegant way to apply patches?

Thanks for any suggestions.
Replies: 1 comment

You just need to wrap the transformer layer in a custom nn.Module and set the engine's has_attention_mask.
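For reference, here is a minimal sketch of that approach (untested): each Hugging Face layer is wrapped so a pipeline stage only consumes and produces a flat tuple of tensors, with the attention mask kept as the last element of that tuple. The attribute names (hf.model.embed_tokens, hf.model.layers, hf.model.norm, hf.lm_head), the decoder layer's keyword arguments, the checkpoint name, and the ds_config.json path are assumptions that will vary with your transformers and DeepSpeed versions.

```python
# Rough sketch, untested: wrap each Hugging Face layer so a pipeline stage only
# ever sends/receives a flat tuple of tensors.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule
from transformers import AutoModelForCausalLM


class EmbeddingPipe(nn.Module):
    def __init__(self, embed_tokens):
        super().__init__()
        self.embed_tokens = embed_tokens

    def forward(self, inputs):
        # The first stage receives whatever the data loader yields,
        # e.g. (input_ids, attention_mask).
        input_ids, attention_mask = inputs
        hidden_states = self.embed_tokens(input_ids)
        # Only tensors go downstream; the mask stays last in the tuple.
        return hidden_states, attention_mask


class DecoderLayerPipe(nn.Module):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, inputs):
        hidden_states, attention_mask = inputs
        # Recompute position_ids locally instead of shipping them between stages.
        position_ids = torch.arange(
            hidden_states.size(1), device=hidden_states.device
        ).unsqueeze(0)
        # HF decoder layers return a tuple; keep only the hidden-states tensor.
        # Depending on your transformers version, the layer may expect an expanded
        # 4D additive mask and/or precomputed rotary position_embeddings instead.
        hidden_states = self.layer(
            hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
        )[0]
        return hidden_states, attention_mask


class LMHeadPipe(nn.Module):
    def __init__(self, norm, lm_head):
        super().__init__()
        self.norm = norm
        self.lm_head = lm_head

    def forward(self, inputs):
        hidden_states, _attention_mask = inputs
        return self.lm_head(self.norm(hidden_states))


def loss_fn(logits, labels):
    # Standard causal-LM shift.
    return nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)), labels[:, 1:].reshape(-1)
    )


deepspeed.init_distributed()

hf = AutoModelForCausalLM.from_pretrained("your-llama-or-mistral-checkpoint")
layers = (
    [EmbeddingPipe(hf.model.embed_tokens)]
    + [DecoderLayerPipe(layer) for layer in hf.model.layers]
    + [LMHeadPipe(hf.model.norm, hf.lm_head)]
)
pipe_model = PipelineModule(layers=layers, num_stages=2, loss_fn=loss_fn)

engine, _, _, _ = deepspeed.initialize(
    model=pipe_model,
    model_parameters=[p for p in pipe_model.parameters() if p.requires_grad],
    config="ds_config.json",  # placeholder config path
)
# Tell the engine that the trailing tensor of each stage's output is the attention
# mask so it gets special handling when communicated between stages (the method
# name follows the reply above; confirm it exists in your DeepSpeed version).
engine.set_has_attention_mask(True)
```

Training then goes through engine.train_batch(data_iter), where the iterator yields ((input_ids, attention_mask), labels) so the first stage gets the input tuple and the loss_fn gets the labels.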