@copybara-service copybara-service bot commented Dec 4, 2025

Improve IfrtMergeReshardsPass to allow for more efficient merging

The current implementation can only merge copy operations whose arguments come from the same source operation. However, there can be many parallel copies that share the same source and destination devices but whose arguments are produced by different ops. See the added test for an example.

This CL improves the algorithm (and, as a result, simplifies the implementation) to allow merging such parallel copies. The intuition is that the existing grouping based on the first reshard user op is already sufficient to prevent circular dependencies after merging: a reshard X that (transitively) depends on a reshard Y can never have the same first reshard user op as Y, by definition. Thus, we can simply iterate over all reshard ops in the function and merge them using the same keys that we use today.

The algorithm now runs iteratively until a fixpoint is reached, because merging some reshard ops changes the "first reshard user op", which may create more merging opportunities. The added test fails without this loop.
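
The grouping and fixpoint loop described above can be sketched on a toy IR. This is an illustration only, not the pass's actual C++ implementation: the `Op` class, `first_user`, `merge_once`, `merge_until_fixpoint`, and `build_example` are hypothetical names, "first user" is assumed to mean the earliest op in program order that consumes any output of the reshard, and the sketch merges one group per iteration for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical toy op; the real pass works on IFRT IR ops."""
    name: str
    kind: str      # "reshard" or "compute"
    inputs: list   # names of consumed values
    outputs: list  # names of produced values
    src: str = ""  # source devices (reshards only)
    dst: str = ""  # destination devices (reshards only)


def first_user(ops, op):
    """Name of the earliest op in program order that uses an output of `op`."""
    outs = set(op.outputs)
    for candidate in ops:
        if outs & set(candidate.inputs):
            return candidate.name
    return None


def merge_once(ops):
    """Merge the first group of reshards sharing (first user, src, dst).

    Returns True if a merge happened. Keys are recomputed on every call
    because a merge can change the first user of other reshards.
    """
    groups = {}
    for op in ops:
        if op.kind != "reshard":
            continue
        user = first_user(ops, op)
        if user is not None:
            groups.setdefault((user, op.src, op.dst), []).append(op)

    for (user, src, dst), members in groups.items():
        if len(members) < 2:
            continue
        merged = Op(
            name="+".join(m.name for m in members),
            kind="reshard",
            inputs=[v for m in members for v in m.inputs],
            outputs=[v for m in members for v in m.outputs],
            src=src,
            dst=dst,
        )
        member_ids = {id(m) for m in members}
        ops[:] = [o for o in ops if id(o) not in member_ids]
        # Every member preceded the shared first user, so all merged inputs
        # are already defined at this insertion point.
        idx = next(i for i, o in enumerate(ops) if o.name == user)
        ops.insert(idx, merged)
        return True
    return False


def merge_until_fixpoint(ops):
    """Repeat merge_once until nothing changes; returns the merge count."""
    count = 0
    while merge_once(ops):
        count += 1
    return count


def build_example():
    """Two parallel two-hop copies (D0 -> D1 -> D2) feeding one compute op."""
    return [
        Op("ra", "reshard", ["a0"], ["a1"], "D0", "D1"),
        Op("rb", "reshard", ["b0"], ["b1"], "D0", "D1"),
        Op("ra2", "reshard", ["a1"], ["a2"], "D1", "D2"),
        Op("rb2", "reshard", ["b1"], ["b2"], "D1", "D2"),
        Op("use", "compute", ["a2", "b2"], ["y"]),
    ]
```

In this example, `ra2` and `rb2` share the first user `use` even though their arguments come from different ops, so they merge first. Only after that merge do `ra` and `rb` share a first user (the merged op), which is why a single pass leaves three reshards while the loop reduces them to two.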

This is particularly useful when arguments are progressively broadcast over multiple pipeline-stage submeshes: argument broadcasts and across-stage transfers of intermediates become completely batchable as long as their broadcast order is the same.

@copybara-service copybara-service bot force-pushed the test_840059957 branch 5 times, most recently from 7258296 to 1776e5c Compare December 4, 2025 19:49
PiperOrigin-RevId: 840342080
@copybara-service copybara-service bot merged commit bd5b187 into main Dec 4, 2025
@copybara-service copybara-service bot deleted the test_840059957 branch December 4, 2025 20:08