Hybrid Offloading for ZeRO3 #5625

Draft: wants to merge 3 commits into base: master

@tohtana (Contributor) commented Jun 7, 2024

NOTE: This feature works only for the forward pass.

This feature lets users gather ZeRO3-partitioned parameters and offload only part of them to host memory. The offloaded parameters are loaded into device memory in a pre-forward hook and offloaded back to host memory in a post-forward hook.
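
A minimal pure-Python sketch of the hook pattern described above. It uses neither DeepSpeed nor PyTorch: `Param`, `HybridOffloadSketch`, `device_budget`, and the greedy in-order placement policy are all hypothetical, and the sketch assumes the parameters have already been gathered. It only models where each parameter lives: offloaded ones move to the device in `pre_forward` and back to the host in `post_forward`.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    numel: int
    location: str = "device"  # "device" or "host"


@dataclass
class HybridOffloadSketch:
    """Toy model of hybrid offloading (hypothetical, not DeepSpeed's API).

    Params are assumed to be fully gathered already. They are placed on the
    device in order while they fit in the element budget; any param that does
    not fit is offloaded to host memory.
    """

    params: list
    device_budget: int
    offloaded: list = field(default_factory=list)

    def __post_init__(self):
        used = 0
        for p in self.params:
            if used + p.numel <= self.device_budget:
                used += p.numel  # stays resident on the device
            else:
                p.location = "host"
                self.offloaded.append(p)

    def pre_forward(self):
        # Bring offloaded params onto the device just before forward runs.
        for p in self.offloaded:
            p.location = "device"

    def post_forward(self):
        # Move them back to host memory once forward is done.
        for p in self.offloaded:
            p.location = "host"

    def forward(self, fn, *args):
        # Wrap a forward call with the pre/post hooks.
        self.pre_forward()
        try:
            return fn(*args)
        finally:
            self.post_forward()
```

For example, with `device_budget=700` and parameters of 400, 300, and 300 elements, the first two stay on the device and the third is offloaded; inside `forward` all three are on the device, and afterwards the third is back on the host. The `try`/`finally` returns offloaded parameters to the host even if the forward call raises.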

This can reduce the number of all-gathers in a loop:

      import deepspeed

      with deepspeed.zero.ZeRO3HybridOffload(model, param_threshold=1e9):
          for x in dataset:
              output = model(x)
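
To see why wrapping the loop can cut communication, here is a toy count of parameter all-gathers under simplifying assumptions: `ParamGatherCounter`, `zero3_loop`, and `hybrid_loop` are hypothetical, plain ZeRO3 is modeled as gathering every partitioned parameter on each forward call (ignoring prefetching and persistent parameters), and the hybrid mode is assumed to gather once when the context is entered.

```python
class ParamGatherCounter:
    """Hypothetical stand-in for a communicator; it only counts gathered params."""

    def __init__(self):
        self.param_gathers = 0

    def all_gather(self, num_params):
        self.param_gathers += num_params


def zero3_loop(dataset, num_params, comm):
    # Simplified plain ZeRO3: each forward gathers every partitioned param
    # and frees it afterwards, so the gathers repeat on every iteration.
    for _ in dataset:
        comm.all_gather(num_params)


def hybrid_loop(dataset, num_params, comm):
    # Assumed hybrid behavior: gather once when the context is entered,
    # then run every iteration with the already-gathered params.
    comm.all_gather(num_params)
    for _ in dataset:
        pass  # the forward pass would run here without new gathers
```

With 10 iterations over a model of 4 partitioned parameters, the ZeRO3 model counts 40 parameter gathers and the hybrid model counts 4.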

Generation with autoregressive models is a good example of where this feature is useful. Plain ZeRO3 often fails here because different ranks may generate sequences of different lengths: once one rank finishes generating, it stops running forward passes, and the all-gather on the remaining ranks gets stuck waiting for it.

      import deepspeed

      with deepspeed.zero.ZeRO3HybridOffload(model, param_threshold=1e9):
          output = model.generate(input_ids)
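
The generation hang can be reproduced with a toy model in which a `threading.Barrier` stands in for the all-gather each rank performs per decoding step. The function names and the `per_step_collective` flag are hypothetical; the timeout exists only so the demo terminates, whereas a real all-gather would simply block. Setting `per_step_collective=False` models the assumption, implied by the description above, that parameters gathered up front need no per-step collective.

```python
import threading


def generate(rank, num_tokens, barrier, results, per_step_collective):
    """Toy decoding loop for one rank."""
    try:
        for _ in range(num_tokens):
            if per_step_collective:
                # Stand-in for the all-gather plain ZeRO3 issues on each forward
                # step: every rank must reach it before any rank can continue.
                barrier.wait(timeout=1.0)
        results[rank] = "finished"
    except threading.BrokenBarrierError:
        # A real all-gather would hang here instead of timing out.
        results[rank] = "stuck"


def run(lengths, per_step_collective=True):
    """Run one thread per rank; lengths[r] is how many tokens rank r generates."""
    barrier = threading.Barrier(len(lengths))
    results = {}
    threads = [
        threading.Thread(
            target=generate,
            args=(r, n, barrier, results, per_step_collective),
        )
        for r, n in enumerate(lengths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

`run([2, 5])` reports rank 0 as finished and rank 1 as stuck, because rank 1's third step waits for a rank that has already stopped generating. `run([2, 5], per_step_collective=False)` lets both ranks finish.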
