Add pipeline loader #33
Conversation
takeshi-yoshimura left a comment
I really like your very smart idea, thank you! Overall it looks fine, and I may refactor tqdm into the original SafetensorsFileLoader later in separate commits.
My only request is to avoid the GIL handling when this optimization is not enabled. I just want to keep the old behavior, since the change may affect the critical path.
fastsafetensors/cpp/ext.cpp
Outdated
.def(pybind11::init<const bool, const uint64_t, const int, bool>())
.def("submit_read", &nogds_file_reader::submit_read)
.def("wait_read", &nogds_file_reader::wait_read);
.def("submit_read", &nogds_file_reader::submit_read, pybind11::call_guard<pybind11::gil_scoped_release>())
We must be very conservative about changing any of the old behaviors, and this change potentially does so. Do you have any particular reason to add this GIL handling? Is this a problem with newer Python threading, or a general problem we have overlooked so far?
If this is specific to this optimization, I do not want to add any overhead when it is turned off. In that case, please add new wrapper functions that just call these functions with the GIL handling, and switch calls to them only when the optimization is turned on.
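For illustration, a minimal sketch of the requested wrapper approach could look like the following. The `*_nogil` names, the stand-in class body, and the method signatures are hypothetical, not the actual fastsafetensors code; the Python side would call the `*_nogil` variants only when the pipeline optimization is enabled, leaving the original bindings untouched.

```cpp
// Sketch only: keep the original GIL-holding bindings unchanged and expose
// separate *_nogil variants that release the GIL while the C++ call runs.
#include <cstdint>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Stand-in for the real nogds_file_reader; signatures are illustrative.
struct nogds_file_reader {
    nogds_file_reader(bool use_gds, uint64_t bufsize, int threads, bool debug) {}
    int submit_read(int fd, uint64_t offset, uint64_t length) { return 0; }
    int wait_read(int handle) { return 0; }
};

PYBIND11_MODULE(ext_sketch, m) {
    py::class_<nogds_file_reader>(m, "nogds_file_reader")
        .def(py::init<const bool, const uint64_t, const int, bool>())
        // Original bindings: behavior unchanged, GIL held as before.
        .def("submit_read", &nogds_file_reader::submit_read)
        .def("wait_read", &nogds_file_reader::wait_read)
        // New variants: same underlying functions, but the GIL is released
        // for the duration of the call. Only the optimized pipeline path
        // would call these.
        .def("submit_read_nogil", &nogds_file_reader::submit_read,
             py::call_guard<py::gil_scoped_release>())
        .def("wait_read_nogil", &nogds_file_reader::wait_read,
             py::call_guard<py::gil_scoped_release>());
}
```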
I will make a subsequent update to this PR once the previous one is approved. The update will include an environment variable toggle to control the GIL.
@ABNER-1 |
Signed-off-by: yuanyuxing.yyx <[email protected]>
Hi, @takeshi-yoshimura.
Merged d9d23b6 into foundation-model-stack:main
Cool! Thank you for your contribution.
The 4th PR in #29.
Based on PR #32.
fastsafetensors loading can be abstracted into two stages: (1) reading tensor files into GPU memory, and (2) redistributing the loaded tensors across GPUs.
The first stage primarily depends on file loading bandwidth (disk/network throughput + PCIe bandwidth from memory to GPU), while the second stage relies on collective communication between GPUs, typically leveraging high-bandwidth resources like NVLink.
These two stages depend on different resources and do not conflict.
By pipelining them, so that the second batch proceeds with Stage 1 while the first batch is in Stage 2, we can fully utilize both sets of resources and maximize performance.
