Fix Computation for large batch sizes #73

Closed
wants to merge 27 commits into from

AD2605
Collaborator

@AD2605 AD2605 commented May 24, 2024

Fixes computation for batch sizes greater than 16.

The changes from #64 and #63 are already merged into this PR.

mehdi-goli and others added 4 commits April 4, 2024 18:05
…tware#16)

* Updating README-sycl.md to capture the 3.5 modifications

* Update README-sycl.md

Co-authored-by: aacostadiaz <[email protected]>

* Remove the sgemm_nt_1_sycl PoC (codeplaysoftware#15)

* Remove sgemm_nt_1 PoC

* Fix build issues

* Fix code style format

* Remove ENABLE_NVPTX flag

* Update include/cute/util/debug.hpp

Co-authored-by: Mehdi Goli <[email protected]>

* Cosmetic

---------

Co-authored-by: Mehdi Goli <[email protected]>

* Applying the comments

---------

Co-authored-by: aacostadiaz <[email protected]>
@AD2605 AD2605 requested review from aacostadiaz and muhammad-tanvir-1211 and removed request for aacostadiaz May 24, 2024 15:40
AD2605 and others added 22 commits May 24, 2024 17:24
* Add cmake configuration

* Update examples/cute/tutorial/CMakeLists.txt

Co-authored-by: Mehdi Goli <[email protected]>

---------

Co-authored-by: Mehdi Goli <[email protected]>
* Update README-sycl.md

Fixing CUDA version
Fix typo in Macro
Co-authored-by: Mehdi Goli <[email protected]>

* Cosmetic

---------

Co-authored-by: Mehdi Goli <[email protected]>

* Applying the comments

---------

Co-authored-by: aacostadiaz <[email protected]>

* Revert "Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)" (codeplaysoftware#17)

This reverts commit a726bd3.

* fix typo in macro

---------

Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: aacostadiaz <[email protected]>
Fix the calls to BlockDim* and GridDim* through SYCL. The current changes give incorrect output if you run a CUDA kernel through SYCL on NVIDIA A100.

Co-authored-by: Mehdi Goli <[email protected]>

---------

Co-authored-by: Mehdi Goli <[email protected]>
- Add block 2D load encapsulation
  Combinations: block_y x block_x x (8 x 16), where block_y in {1, 2, 4} and block_x in {1, 2, 4}.
  Structure names follow the rule data_type x {8x16} x {block_y} x {block_x}_{V, N}, where N means not transposed and V means VNNI-packed.

- Include some SYCL types
- Include OpenCL extension intrinsics
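
As a rough illustration of the naming rule above, the sketch below composes a structure name from its parts and enumerates the block_y and block_x combinations. The helper names `block_2d_load_name` and `all_block_2d_load_names`, the `U16` data-type prefix, and the assumption that every combination exists in both N and V form are hypothetical and not taken from the PR; the real structure names in the code base may be spelled differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): builds a name following the rule
// data_type x {8x16} x {block_y} x {block_x}_{V, N} from the commit message.
std::string block_2d_load_name(const std::string& data_type, int block_y,
                               int block_x, bool vnni_packed) {
  // The commit message restricts both block counts to {1, 2, 4}.
  assert(block_y == 1 || block_y == 2 || block_y == 4);
  assert(block_x == 1 || block_x == 2 || block_x == 4);
  return data_type + "x8x16x" + std::to_string(block_y) + "x" +
         std::to_string(block_x) + "_" + (vnni_packed ? "V" : "N");
}

// Enumerates all names for one data type. Assumption: each of the 3 x 3
// block combinations is available in both N (not transposed) and V
// (VNNI-packed) form; the PR does not state this explicitly.
std::vector<std::string> all_block_2d_load_names(const std::string& data_type) {
  std::vector<std::string> names;
  for (int block_y : {1, 2, 4}) {
    for (int block_x : {1, 2, 4}) {
      for (bool vnni_packed : {false, true}) {
        names.push_back(
            block_2d_load_name(data_type, block_y, block_x, vnni_packed));
      }
    }
  }
  return names;
}
```

Under these assumptions, calling `block_2d_load_name("U16", 2, 4, false)` composes a name for a not-transposed load of 2 x 4 blocks of 8 x 16 elements, and `all_block_2d_load_names("U16")` lists every combination for that data type.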
---------

Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Muhammad Tanvir <[email protected]>
Removes the CUDA toolkit dependency when SYCL is enabled.

---------

Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Fix some issues when trying to build and run the 14_ampere example for SYCL.

---------

Co-authored-by: Muhammad Tanvir <[email protected]>
Enables the CuTe tests via the SYCL path.

---------

Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: aacostadiaz <[email protected]>
Add an Intel PVC pipeline to compute GEMM.
@AD2605 AD2605 closed this May 24, 2024
@AD2605 AD2605 deleted the atharva/pvc_bug branch May 24, 2024 16:26
7 participants