Fix CI test failures #55

xuzhao9 · 2024-11-18T19:53:56Z

The unit test workflow appears to hang and needs to be fixed: https://github.com/pytorch-labs/tritonbench/actions/runs/11898546601/job/33155282740

This PR rewrites the unit test function to run each test in an individual subprocess.
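
As a rough illustration of the isolation idea (not the code in this PR), the sketch below runs each test in a fresh interpreter with a timeout, so a hung test is killed and reported as a failure instead of stalling the whole job. The helper name `run_test_in_subprocess`, the timeout values, and the three stand-in test snippets are all hypothetical.

```python
import subprocess
import sys


def run_test_in_subprocess(test_code: str, timeout_s: float = 60.0) -> tuple[bool, str]:
    """Run one test snippet in a fresh interpreter; return (passed, output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", test_code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout, so a hang becomes a failure.
        return False, f"timed out after {timeout_s}s"
    # A non-zero exit code (exception, crash, sys.exit(1)) counts as a failure.
    return proc.returncode == 0, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Hypothetical stand-ins for per-operator tests.
    tests = {
        "passes": "print('ok')",
        "fails": "raise SystemExit(1)",
        "hangs": "import time; time.sleep(30)",
    }
    for name, code in tests.items():
        passed, output = run_test_in_subprocess(code, timeout_s=2.0)
        print(name, "PASS" if passed else "FAIL", output.strip())
```

Because each test gets its own process, a crash or a leaked GPU context in one operator cannot affect the next one, at the cost of paying interpreter startup once per test.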

xuzhao9 · 2024-11-19T00:42:39Z

tritonbench/operators/fp8_attention/operator.py

@@ -110,7 +110,7 @@ def triton_flash_v2(
        triton_q, triton_k, triton_v = self.triton_preprocess(q, k, v)
        # full fp8 will be enabled if type of q,k,v is fp8
        return lambda: triton_attention(
-            triton_q, triton_k, triton_v, False, self.sm_scale
+            triton_q, triton_k, triton_v, False, self.sm_scale, "base"


cc @manman-ren attention_opt fails to compile with the PyTorch-bundled version of Triton. Does it require the latest Triton main branch?

manman-ren · 2024-11-19

Sorry for the breakage. What is the error message?

xuzhao9 · 2024-11-19

@manman-ren here is the error message: https://github.com/pytorch-labs/tritonbench/actions/runs/11903695593/job/33171153429?pr=55. By default we are using the pytorch built-in Triton in the CI.

facebook-github-bot · 2024-11-19T14:44:57Z

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

FindHao · 2024-11-19T17:39:26Z

test/test_gpu/skip_tests_h100_pytorch.yaml

@@ -29,9 +32,6 @@ jagged_layer_norm:
 jagged_mean:
 jagged_softmax:
 jagged_sum:
-layer_norm:
-low_mem_dropout:
-rms_norm:


Liger kernels require the triton package rather than pytorch-triton. I assume triton does not conflict with pytorch-triton because pytorch-triton doesn't cover import triton. I tested this in a local environment and it works well, but I'm not sure whether this is a safe approach.
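
One way to check what a given environment actually contains is sketched below, using only the standard library. It reports which of the two distributions named above are installed and where `import triton` would resolve from. The helper name `installed_versions` is hypothetical, and this is a diagnostic aid, not a statement about how the two packages interact.

```python
import importlib.util
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(installed_versions(["triton", "pytorch-triton"]))
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec("triton")
    print("import triton resolves to:", spec.origin if spec else None)
```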

xuzhao9 · 2024-11-19

We are planning to have separate tests for triton main and pytorch-triton. Our docker has two conda environments, pytorch and triton-main, so that they can be tested in the same docker.

Right now, we are only deploying tests against pytorch-triton. We will set up the triton main config as skip_tests_h100_triton_main.yaml.
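
A minimal sketch of how a test runner might pick the skip list for each conda environment, assuming the file naming simply follows the two names mentioned in this thread (`skip_tests_h100_pytorch.yaml` and `skip_tests_h100_triton_main.yaml`). The function `skip_file_for` and the naming rule are assumptions for illustration, not the repository's actual logic.

```python
from pathlib import Path

# Directory taken from the file path shown in this PR.
SKIP_DIR = Path("test/test_gpu")


def skip_file_for(env_name: str, device: str = "h100") -> Path:
    """Build the skip-list path for a conda environment name (assumed naming scheme)."""
    # "triton-main" becomes "triton_main" to match the file name in the thread.
    suffix = env_name.replace("-", "_")
    return SKIP_DIR / f"skip_tests_{device}_{suffix}.yaml"


if __name__ == "__main__":
    for env in ("pytorch", "triton-main"):
        print(env, "->", skip_file_for(env).as_posix())
```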

FindHao

LGTM.

facebook-github-bot · 2024-11-19T18:16:39Z

@xuzhao9 merged this pull request in d9633be.

Summary: As the test isolation is implemented in #55, we can now enable more operators in the CI.

Pull Request resolved: #56

Reviewed By: FindHao

Differential Revision: D66189246

Pulled By: xuzhao9

fbshipit-source-id: 22f01b2e5b64956f6e2985f87be785efc977e46b

facebook-github-bot added the cla signed label Nov 18, 2024

xuzhao9 changed the title ~~Fix testing failure~~ Fix CI test failures Nov 18, 2024

xuzhao9 and others added 5 commits November 18, 2024 19:21

Print testing operators

e24e077

Another fix

6410e9e

Add gather_gemv to liger list

2e8fa17

Check impl in subprocess

0d3140f

Fix test bugs

2a85c31

xuzhao9 and others added 11 commits November 18, 2024 19:21

Fix test modes

9cec54f

Fix mode

cefa2a8

Bypass non-implemented backward test

cf4bf44

Another bugfix

4b780fa

Add main

b0d5371

Fix gemv

cd21b2d

Fix hstu

f9cc6b4

Fix patch

332001b

Fix hstu

a6da2a3

Enable more operators.

34aeb9e

Fix fp8_attention

850ec97

xuzhao9 force-pushed the xz9/skip-gather-gemv branch from 928cfae to 850ec97 Compare November 19, 2024 00:24

Fix fp8_attention

a67aa88

xuzhao9 commented Nov 19, 2024

xuzhao9 requested review from FindHao and manman-ren November 19, 2024 00:43

FindHao reviewed Nov 19, 2024

FindHao approved these changes Nov 19, 2024

adamomainz approved these changes Nov 19, 2024

facebook-github-bot closed this in d9633be Nov 19, 2024

facebook-github-bot added the Merged label Nov 19, 2024

xuzhao9 mentioned this pull request Nov 19, 2024

Enable gemm and more operators in the CI #56

Closed

xuzhao9 deleted the xz9/skip-gather-gemv branch November 19, 2024 21:10
