add persistent variant (#77)
Summary:
Hongtao identified a performance issue in the initial implementation and updated the assignment of tiles to each SM.
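As a rough illustration of what "assignment of tiles to each SM" means in a persistent kernel, the sketch below simulates a strided (round-robin) schedule in plain Python. It is not the scheme used in this commit, which is not shown here. The function name `assign_tiles_persistent`, the tile size `BLOCK_M = 128`, and the SM count `NUM_SMS = 132` are assumptions made for the example; only the problem shape comes from the benchmark in this commit message.

```python
def assign_tiles_persistent(num_tiles: int, num_sms: int) -> list[list[int]]:
    """Strided schedule: SM `s` processes tiles s, s + num_sms, s + 2 * num_sms, ...

    Hypothetical helper for illustration; the real kernel's mapping may differ.
    """
    return [list(range(sm, num_tiles, num_sms)) for sm in range(num_sms)]


# Problem shape from the benchmark: (Batch, Heads, SeqLen, Dhead) = (8, 16, 8192, 128).
batch, heads, seq_len = 8, 16, 8192
BLOCK_M = 128   # assumed number of query rows per tile
NUM_SMS = 132   # assumed SM count (one persistent program per SM)

# One tile per (batch, head, query block).
num_tiles = batch * heads * (seq_len // BLOCK_M)
schedule = assign_tiles_persistent(num_tiles, NUM_SMS)

loads = [len(tiles) for tiles in schedule]
print(f"{num_tiles} tiles over {NUM_SMS} SMs: min {min(loads)}, max {max(loads)} tiles per SM")
```

With a schedule like this, each SM launches one long-lived program that loops over its tiles, instead of launching one program per tile. How evenly the tiles are spread across SMs is the kind of detail that can affect throughput.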

Performance with warp specialization (TFLOPS):
  (Batch, Heads, SeqLen, Dhead)    triton_tutorial_flash_v2_tma_ws_persistent-tflops    triton_tutorial_flash_v2_tma_ws-tflops    triton_tutorial_flash_v2-tflops
-------------------------------  ---------------------------------------------------  ----------------------------------------  ---------------------------------
             (8, 16, 8192, 128)                                              516.164                                   490.451                            423.905
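
To put the numbers above in relative terms, the short sketch below computes speedup ratios from the three reported TFLOPS values. The dictionary and variable names are only for illustration; the values are copied from the table.

```python
# TFLOPS values copied from the table for shape (8, 16, 8192, 128).
tflops = {
    "triton_tutorial_flash_v2_tma_ws_persistent": 516.164,
    "triton_tutorial_flash_v2_tma_ws": 490.451,
    "triton_tutorial_flash_v2": 423.905,
}

# Speedup of every variant relative to the plain flash_v2 kernel.
baseline = tflops["triton_tutorial_flash_v2"]
speedups = {name: value / baseline for name, value in tflops.items()}

# Gain of the persistent variant over the non-persistent warp-specialized one.
persistent_vs_ws = (
    tflops["triton_tutorial_flash_v2_tma_ws_persistent"]
    / tflops["triton_tutorial_flash_v2_tma_ws"]
)

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.3f}x vs triton_tutorial_flash_v2")
print(f"persistent vs non-persistent ws: {persistent_vs_ws:.3f}x")
```

For this shape, the persistent variant is about 5% faster than the non-persistent warp-specialized kernel and about 22% faster than the plain flash_v2 kernel.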

Pull Request resolved: #77

Reviewed By: xuzhao9, htyu

Differential Revision: D66463179

Pulled By: manman-ren

fbshipit-source-id: 14fecc1a1449828bfd82600bd161596349da3084
manman-ren authored and facebook-github-bot committed Dec 2, 2024
1 parent 7e1f269 commit 0a82d3d
Showing 4 changed files with 306 additions and 10 deletions.
1 change: 1 addition & 0 deletions test/test_gpu/skip_tests_h100_pytorch.yaml
@@ -16,6 +16,7 @@ flash_attention:
 - triton_tutorial_flash_v2_tma
 - triton_tutorial_flash_v2_ws
 - triton_tutorial_flash_v2_tma_ws
+- triton_tutorial_flash_v2_tma_ws_persistent
 fp8_attention:
 - colfax_fmha
 # triton_flash_v2 requires triton-main
1 change: 1 addition & 0 deletions test/test_gpu/skip_tests_h100_triton_main.yaml
@@ -10,6 +10,7 @@ flash_attention:
 # _ws kernels require Triton with warp specialization
 - triton_tutorial_flash_v2_ws
 - triton_tutorial_flash_v2_tma_ws
+- triton_tutorial_flash_v2_tma_ws_persistent
 fp8_attention:
 # fb-only kernel
 - colfax_fmha