Bugfix: fix memory allocation issues during multi-GPU training#67

Open
LittleNyima wants to merge 1 commit into Lightricks:main from LittleNyima:main

Conversation

@LittleNyima

During multi-GPU training, the current implementation loads the text encoders of all ranks onto the same GPU, which causes CUDA OOM errors. The correct approach is to initialize the accelerator first and only then move the model to CUDA, as sketched below.
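A minimal sketch of the ordering fix, assuming a Hugging Face Accelerate training script. The `T5EncoderModel` / `"t5-base"` names are illustrative placeholders, not the actual encoder used in this repository:

```python
from accelerate import Accelerator
from transformers import T5EncoderModel  # hypothetical stand-in for the text encoder

# Wrong order: moving the model to "cuda" before the accelerator exists
# resolves to cuda:0 on every rank, so all text encoders pile onto one GPU.
# text_encoder = T5EncoderModel.from_pretrained("t5-base").to("cuda")
# accelerator = Accelerator()

# Correct order: initialize the accelerator first so each process knows
# its own device, then move the model to that per-rank device.
accelerator = Accelerator()
text_encoder = T5EncoderModel.from_pretrained("t5-base")
text_encoder = text_encoder.to(accelerator.device)  # e.g. cuda:{local_rank}
```

With the accelerator created first, `accelerator.device` points at the GPU assigned to the current process, so each rank's text encoder lands on its own device instead of all of them sharing cuda:0.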

