Conversation

@ABNER-1 (Contributor) commented Oct 20, 2025

The 4th in #29

Based on PR #32

fastsafetensors could be abstracted into two stages:

  • Stage 1: Copy file to device and construct tensor.
  • Stage 2: Tensor broadcasting.

The first stage primarily depends on file loading bandwidth (disk/network throughput + PCIe bandwidth from memory to GPU), while the second stage relies on collective communication between GPUs, typically leveraging high-bandwidth resources like NVLink.

These two stages depend on different resources and do not conflict.

By pipelining them, so that the second batch runs Stage 1 while the first batch is still in Stage 2, we can fully utilize these different resources and maximize performance.
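The overlap described above can be sketched as follows. This is a minimal Python sketch, not the fastsafetensors API: `stage_copy` and `stage_broadcast` are hypothetical stand-ins for Stage 1 (disk/PCIe-bound copy) and Stage 2 (NVLink-bound broadcast), and a single worker thread keeps the next batch's Stage 1 running while the current batch broadcasts.

```python
# Hedged sketch of the two-stage pipeline (hypothetical stand-in functions,
# not the actual fastsafetensors API).
from concurrent.futures import ThreadPoolExecutor


def stage_copy(batch):
    # Stage 1: copy file to device and construct the tensor (disk/PCIe bound).
    return f"tensor[{batch}]"


def stage_broadcast(tensor):
    # Stage 2: broadcast the tensor across GPUs (NVLink bound).
    return f"broadcast({tensor})"


def pipelined_load(batches):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off Stage 1 for the first batch.
        pending = pool.submit(stage_copy, batches[0])
        for nxt in batches[1:]:
            tensor = pending.result()               # wait for Stage 1 of batch n
            pending = pool.submit(stage_copy, nxt)  # start Stage 1 of batch n+1
            results.append(stage_broadcast(tensor))  # Stage 2 of batch n overlaps it
        results.append(stage_broadcast(pending.result()))
    return results
```

Because Stage 1 of batch n+1 is submitted before Stage 2 of batch n begins, the two stages run concurrently on their respective resources, which is the overlap the figure illustrates.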

@ABNER-1 force-pushed the add_pipeline_loader branch 7 times, most recently from 40cd8e4 to 2c7d3fc (October 20, 2025 12:47)
@ABNER-1 changed the title from "Add pipeline loader" to "[WIP] Add pipeline loader" (Oct 20, 2025)
@ABNER-1 force-pushed the add_pipeline_loader branch 4 times, most recently from 79cf4fc to da6aebe (October 21, 2025 08:08)
@takeshi-yoshimura self-requested a review (October 23, 2025 06:30)
@takeshi-yoshimura (Collaborator) left a comment

I really like your very smart idea, thank you! Overall it looks fine; I may refactor tqdm into the original SafetensorsFileLoader later in separate commits.

My only request is to avoid the GIL handling when this optimization is not enabled. I want to keep the old behavior, since the change may affect the critical path.

```diff
 .def(pybind11::init<const bool, const uint64_t, const int, bool>())
-.def("submit_read", &nogds_file_reader::submit_read)
+.def("submit_read", &nogds_file_reader::submit_read, pybind11::call_guard<pybind11::gil_scoped_release>())
 .def("wait_read", &nogds_file_reader::wait_read);
```
@takeshi-yoshimura (Collaborator) commented:

We must be very conservative about changing old behaviors, and this change potentially does so. Do you have any particular reason to add this GIL handling? Is it a problem with the new Python threading, or a general problem we have overlooked so far?

If this is specific to this optimization, I do not want to add any overhead when it is turned off. In that case, please add new wrapper functions that perform the GIL handling around these calls, and switch to them only when the optimization is turned on.
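The switching requested here could look roughly like the following. This is a hedged Python sketch: `submit_read`, `submit_read_nogil`, and the `FASTSAFETENSORS_RELEASE_GIL` environment variable are all hypothetical names used for illustration, not the library's actual API.

```python
# Hedged sketch of conditional call switching: the default path keeps the
# old GIL-holding binding, and a separate GIL-releasing wrapper is chosen
# only when the optimization is explicitly enabled. All names hypothetical.
import os


def submit_read(*args):
    # Stand-in for the original binding (holds the GIL, old behavior).
    return ("gil", args)


def submit_read_nogil(*args):
    # Stand-in for a new wrapper binding registered with
    # pybind11::call_guard<pybind11::gil_scoped_release>().
    return ("nogil", args)


def pick_submit_read():
    # Only the opt-in path pays for the GIL handling; the default path
    # is byte-for-byte the old call.
    if os.environ.get("FASTSAFETENSORS_RELEASE_GIL") == "1":
        return submit_read_nogil
    return submit_read
```

Resolving the function once up front (rather than branching on every call) keeps the overhead off the critical path when the optimization is disabled.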

@ABNER-1 (Contributor, Author) commented Nov 7, 2025

I will make a follow-up update to this PR once the previous one is approved. The update will include an environment-variable toggle to control the GIL handling.

@takeshi-yoshimura (Collaborator) commented:

@ABNER-1
I have just merged your previous PR. Can you resolve conflicts?

@ABNER-1 force-pushed the add_pipeline_loader branch 4 times, most recently from a3ba76e to f2aecc0 (November 17, 2025 08:39)
Signed-off-by: yuanyuxing.yyx <[email protected]>
@ABNER-1 force-pushed the add_pipeline_loader branch from f2aecc0 to 98640b1 (November 17, 2025 08:41)
@ABNER-1 changed the title from "[WIP] Add pipeline loader" to "Add pipeline loader" (Nov 17, 2025)
@ABNER-1 (Contributor, Author) commented Nov 17, 2025

Hi @takeshi-yoshimura,
I've reworked the GIL part as suggested.
Please review the PR.

@takeshi-yoshimura merged commit d9d23b6 into foundation-model-stack:main (Nov 19, 2025)
13 checks passed
@takeshi-yoshimura (Collaborator) commented:

Cool! Thank you for your contribution.
