Fix Computation for large batch sizes #73

AD2605 · 2024-05-24T15:38:35Z

Fixes computation for batch sizes greater than 16.

The changes from #64 and #63 are already merged into this PR.

…tware#16)
* Updating README-sycl.md to capture the 3.5 modifications
* Update README-sycl.md
Co-authored-by: aacostadiaz <[email protected]>
* Remove the sgemm_nt_1_sycl PoC (codeplaysoftware#15)
* Remove sgemm_nt_1 PoC
* Fix build issues
* Fix code style format
* Remove ENABLE_NVPTX flag
* Update include/cute/util/debug.hpp
Co-authored-by: Mehdi Goli <[email protected]>
* Cosmetic
---------
Co-authored-by: Mehdi Goli <[email protected]>
* Applying the comments
---------
Co-authored-by: aacostadiaz <[email protected]>


Fix typo in Macro
Co-authored-by: Mehdi Goli <[email protected]>
* Cosmetic
---------
Co-authored-by: Mehdi Goli <[email protected]>
* Applying the comments
---------
Co-authored-by: aacostadiaz <[email protected]>
* Revert "Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)" (codeplaysoftware#17) This reverts commit a726bd3.
* fix typo in macro
---------
Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: aacostadiaz <[email protected]>


Fix the calls to BlockDim* and GridDim* through SYCL. The current changes give incorrect output if you run a CUDA kernel through SYCL on NVIDIA A100.
Co-authored-by: Mehdi Goli <[email protected]>
---------
Co-authored-by: Mehdi Goli <[email protected]>

- Add block 2D load encapsulation combinations: block_y x block_x x (8 x 16), where block_y in {1, 2, 4} and block_x in {1, 2, 4}. The structure name follows the rule data_type x {8x16} x {block_y} x {block_x}_{V, N}, where N means not transposed and V means VNNI-packed.
- Include some SYCL types.
- Include the OpenCL extension intrinsics.
---------
Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: Atharva Dubey <[email protected]>
Co-authored-by: Muhammad Tanvir <[email protected]>
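
The naming rule in the commit message above can be read as a small enumeration. The following is only an illustrative sketch: `block_load_name`, `enumerate_block_loads`, the `U16` data-type token, and the exact separators are assumptions made for this example, not identifiers taken from the repository, and the commit does not say that every combination is actually implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: spells out one name following the rule
// data_type x {8x16} x {block_y} x {block_x}_{V, N}.
// The separators used here are an assumption, not the repository's spelling.
std::string block_load_name(const std::string& data_type,
                            int block_y, int block_x, char layout) {
    return data_type + "x8x16x" + std::to_string(block_y) + "x" +
           std::to_string(block_x) + "_" + layout;
}

// Hypothetical helper: lists every combination the rule allows, with
// block_y and block_x in {1, 2, 4} and layout N (not transposed) or
// V (VNNI-packed).
std::vector<std::string> enumerate_block_loads(const std::string& data_type) {
    std::vector<std::string> names;
    for (int block_y : {1, 2, 4}) {
        for (int block_x : {1, 2, 4}) {
            for (char layout : {'N', 'V'}) {
                names.push_back(
                    block_load_name(data_type, block_y, block_x, layout));
            }
        }
    }
    return names;
}
```

Calling `enumerate_block_loads("U16")` under these assumptions yields 3 x 3 x 2 = 18 candidate names, which gives a sense of how many structures the rule can describe for one data type.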


mehdi-goli and others added 4 commits April 4, 2024 18:05

Revert "Updating README-sycl.md to capture the 3.5 modifications (cod…

84e730f

…eplaysoftware#16)" (codeplaysoftware#17) This reverts commit a726bd3.

Merge remote-tracking branch 'upstream/sycl-develop' into sycl-develop

4e901a6

Merge branch 'codeplaysoftware:sycl-develop' into sycl-develop

9fd5087

AD2605 requested review from aacostadiaz and muhammad-tanvir-1211 and removed request for aacostadiaz May 24, 2024 15:40

AD2605 and others added 22 commits May 24, 2024 17:24

Migrate cute components to SYCL (codeplaysoftware#19)

450258c

* Migrate Cute components to SYCL

Add CMake configuration (codeplaysoftware#20)

cfd626e

* Add cmake configuration * Update examples/cute/tutorial/CMakeLists.txt Co-authored-by: Mehdi Goli <[email protected]> --------- Co-authored-by: Mehdi Goli <[email protected]>

Update README-sycl.md (codeplaysoftware#22)

095ff18

* Update README-sycl.md Fixing CUDA version

fixing device only code that get called in the host side (codeplaysof…

6c8490e

…tware#25)

Fix GPU clock (codeplaysoftware#21)

ea03f7b

Add XE MMA/copy atom

63912b1

Update to 3.5 API

2773705

Apply suggestions from code review

80b306c

Co-authored-by: Mehdi Goli <[email protected]>

Update README-sycl.md (codeplaysoftware#31)

3cb6e4e

use cute::bfloat16_t (codeplaysoftware#32)

92ae962

Make atom type a make_2d_copy argument (codeplaysoftware#33)

d0ccbb9

Fix compilation errors with PVC pipeline (codeplaysoftware#58)

e056dc0

Remove CUDA toolkit dependency for SYCL (codeplaysoftware#49)

ca691ce

Removes the CUDA toolkit dependency when SYCL is enabled. --------- Co-authored-by: Mehdi Goli <[email protected]> Co-authored-by: Atharva Dubey <[email protected]>

Define CUdeviceptr type (codeplaysoftware#60)

93d86c3

[CP-Sec] Add SECURITY.md file (codeplaysoftware#62)

5fe763c

Fix issues with the ampere example (codeplaysoftware#61)

6be2566

Fix some issues when trying to build and run the 14_ampere example for SYCL. --------- Co-authored-by: Muhammad Tanvir <[email protected]>

Enable Cute tests (codeplaysoftware#57)

342a371

Enables Cute Tests via SYCL Path --------- Co-authored-by: Mehdi Goli <[email protected]> Co-authored-by: aacostadiaz <[email protected]>

Include changes for Intel PVC pipeline (codeplaysoftware#51)

0755375

Add an Intel PVC pipeline to compute GEMM.

Example changes for Intel PVC pipeline (codeplaysoftware#52)

7dcf967

Add example for intel PVC GEMM

fix formatting

806006b

AD2605 force-pushed the atharva/pvc_bug branch from e7ca5eb to 806006b Compare May 24, 2024 16:25

AD2605 closed this May 24, 2024

AD2605 deleted the atharva/pvc_bug branch May 24, 2024 16:26
