
[Operator] Enhancements to Reduce #366

Merged
merged 25 commits into from
Dec 20, 2023
Conversation

hjjq
Member

@hjjq hjjq commented Oct 17, 2023

For some input shapes, the current reduce schedule underutilizes the GPU.
E.g., reducing a [1, 128, 128, 3] tensor with dims=[1, 2] will spawn 1 threadblock with 3 threads, each iterating over 128*128 elements.
This PR makes two changes to optimize these cases:

  1. Add resolve_decompose to the resolve logic of Reduce. This forces a separate kernel launch for each reduce dimension, increasing concurrency.
  2. In the default reduce schedule template, spawn multiple warps within the reduce dimensions, which then communicate via shared memory or use atomics to perform the reduction.
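The decomposition in step 1 can be sketched in plain Python: a reduce over several axes is lowered into one single-axis reduce per axis, each pass standing in for a separate kernel launch. The helper names here (`reduce_dim`, `elementwise_add`, `decomposed_reduce`) are hypothetical illustrations, not Hidet's actual resolve API.

```python
def elementwise_add(a, b):
    """Elementwise sum of two nested-list tensors of the same shape."""
    if isinstance(a, list):
        return [elementwise_add(x, y) for x, y in zip(a, b)]
    return a + b

def reduce_dim(tensor, dim):
    """Sum-reduce a nested-list tensor along one axis (one 'kernel' pass)."""
    if dim == 0:
        acc = tensor[0]
        for part in tensor[1:]:
            acc = elementwise_add(acc, part)
        return acc
    return [reduce_dim(sub, dim - 1) for sub in tensor]

def decomposed_reduce(tensor, dims):
    """Lower a multi-axis reduce into one single-axis pass per dimension.

    Reduce the highest axis first so the remaining axis indices stay valid.
    """
    for d in sorted(dims, reverse=True):
        tensor = reduce_dim(tensor, d)
    return tensor

# A [2, 2, 2] tensor reduced over dims [1, 2] runs as two passes.
t = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(decomposed_reduce(t, [1, 2]))  # [10, 26]
```

Each pass here corresponds to its own kernel launch, so a case like reducing [1, 128, 128, 3] over dims=[1, 2] gets two launches with better occupancy instead of one tiny one.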

Also added a resolve rule for AdaptivePoolChannelLast.
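The cross-warp communication in step 2 amounts to a tree reduction through shared memory. A minimal host-side sketch of that pattern, with a Python list standing in for the shared-memory buffer and each loop round standing in for one barrier-separated step (an assumption-laden sketch, not Hidet's actual schedule template):

```python
def block_reduce(partials):
    """Tree-reduce per-warp partial sums, as a shared-memory path would.

    `partials` models one partial result per warp written to shared memory;
    each while-iteration models one round separated by a block-wide barrier.
    Assumes a power-of-two number of warps (hypothetical sketch only).
    """
    buf = list(partials)          # stand-in for the shared-memory buffer
    stride = len(buf) // 2
    while stride > 0:
        for i in range(stride):   # on a GPU these adds run in parallel
            buf[i] += buf[i + stride]
        stride //= 2              # a real schedule syncs between rounds
    return buf[0]

print(block_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

The atomics variant mentioned in the PR would instead have each warp accumulate its partial into a single location with an atomic add, trading the barrier rounds for serialized updates.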

@yaoyaoding
Member

Hi @hjjq,

Feel free to merge this PR if it is ready.

@hjjq
Member Author

hjjq commented Nov 15, 2023

I will merge soon, after I ensure it passes the performance regression tests. There is probably also some rebasing that needs to be done.

@yaoyaoding
Member

Sounds good!

@hjjq
Member Author

hjjq commented Dec 20, 2023

$hidet-ci launch

@hjjq hjjq merged commit 2040a7c into hidet-org:main Dec 20, 2023
2 checks passed
@hjjq hjjq deleted the conv-reg branch December 20, 2023 20:12