Fix distributed loading when using paddle #19
Conversation
Hello, I need to confirm: does parallel loading mean we use multiple processes to load all tensors to one GPU, or use multiple processes to load tensors to different GPUs and then broadcast them?
@zeroRains
It depends on what you want to do. The test cases use a single GPU because of their limited environment, but in realistic workloads, processes should load files to each GPU and broadcast/scatter tensors.
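For illustration, here is a minimal sketch of that pattern using paddle's collective API; the shard is a stand-in for tensors loaded from a file, and the shape is arbitrary:

```python
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank = dist.get_rank()
paddle.device.set_device(f"gpu:{rank}")

# Stand-in for a shard this rank loaded from its assigned file
shard = paddle.rand([1024])

# Broadcast rank 0's shard so every rank holds a copy; in a real
# workload each rank would broadcast/scatter the tensors it owns
dist.broadcast(shard, src=0)
```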
fix the distributed load for paddle
remove useless file
make sure the device id does not exceed the device count
Signed-off-by: zeroRains <[email protected]>
```diff
-d_id = device.split(":")  # "gpu:0" or "gpu"
-d_id = int(d_id[1]) if len(d_id) == 2 else 0
+if isinstance(self.pg, SingleGroup):
+    # For single (gpu:x, gpu)
+    # gpu:x, like gpu:0, gpu:1, ...
+    d_id = device.split(":")
+    d_id = int(d_id[1]) if len(d_id) == 2 else 0
+else:
+    # For distributed:
+    # the GPU is determined by the current rank,
+    # rank 0 uses gpu:0, rank 1 uses gpu:1, ...
+    d_id = self.pg.rank() % paddle.device.cuda.device_count()
+self.device = f"gpu:{d_id}"
```
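For example, on a machine where paddle.device.cuda.device_count() returns 2, rank 3 maps to gpu:1 (3 % 2 == 1), so the computed device id never exceeds the device count.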
In this part, maybe we do not need to consider the distributed case in fastsafetensors.
We just need to load the tensors to the correct device provided by the user.
On a machine with multiple GPUs, the user should set the device like device=f"gpu:{pg.rank()}" in their distributed code and then pass that device to the SafeTensorsFileLoader, so that different processes load tensors to different GPUs.
What do you think?
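A minimal sketch of this suggestion, assuming the SafeTensorsFileLoader usage shown in the project README and that the loader accepts a paddle process group as the diff above implies; the file and tensor names are placeholders:

```python
import paddle.distributed as dist
from fastsafetensors import SafeTensorsFileLoader

dist.init_parallel_env()
pg = dist.new_group()              # process group covering all ranks
device = f"gpu:{dist.get_rank()}"  # the caller picks the device per rank

loader = SafeTensorsFileLoader(pg, device)
loader.add_filenames({0: ["model-00001.safetensors"]})  # rank -> files; placeholder name
fb = loader.copy_files_to_device()
tensor = fb.get_tensor("weight")   # placeholder tensor name
loader.close()
```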
I don't think so, because safetensors files distributed online are not composed that way.
Merged 18391ca into foundation-model-stack:main
Thank you!
I modified the distributed loading command and wrote two .sh files, run_paddle_parallel_cpu.sh and run_paddle_parallel_gpu.sh, which use the standard distributed launching command in paddle. I also added a unit test for distributed tensor loading with paddle.
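For reference, a standard paddle launch looks like the following; this is a sketch, not the exact contents of the PR's scripts, and the test path is a placeholder:

```sh
python -m paddle.distributed.launch --gpus "0,1" tests/test_paddle_parallel.py
```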