-
Notifications
You must be signed in to change notification settings - Fork 75
Skip certain configs in LowerCollectiveCudaAndNcclTest #5743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| TEST_P(LowerCollectiveCudaAndNcclTest, Allgather) { | ||
| const auto& [msg_size_bytes, protocol_enum] = GetParam(); | ||
| const int64_t kMsgSize = msg_size_bytes / sizeof(float); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a real constant: https://google.github.io/styleguide/cppguide.html#Constant_Names
| at::Tensor runBenchmark( | ||
| MultiDeviceExecutor& executor, | ||
| const std::vector<c10::IValue>& inputs, | ||
| int64_t msg_size_bytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| if (message_size_bytes > 32LL * 1024 * 1024) { | ||
| GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skips here
| } | ||
|
|
||
| if (message_size_bytes > 32LL * 1024 * 1024) { | ||
| GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and here
Greptile SummaryThis PR reduces CI runtime by skipping large message size tests (128MB and 256MB) in
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Test as Test Harness
participant Allgather as Allgather Test
participant Broadcast as Broadcast Test
participant Skip as GTEST_SKIP
Test->>Allgather: Execute with params (message_size, protocol)
Allgather->>Allgather: Check message_size_bytes > 32MB?
alt message_size > 32MB
Allgather->>Skip: Skip test (128MB, 256MB configs)
Note over Skip: "Takes >30 seconds in CI"
else message_size <= 32MB
Allgather->>Allgather: Run test (2MB, 8MB, 32MB)
end
Test->>Broadcast: Execute with params (message_size, protocol)
Broadcast->>Broadcast: Check message_size_bytes > 32MB?
alt message_size > 32MB
Broadcast->>Skip: Skip test (128MB, 256MB configs)
Note over Skip: "Takes >5 seconds in CI"
else message_size <= 32MB
Broadcast->>Broadcast: Run test (2MB, 8MB, 32MB)
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 2 comments
| } | ||
|
|
||
| if (message_size_bytes > 32LL * 1024 * 1024) { | ||
| GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: malformed URL in skip message - http://nv/e.) appears incomplete
| GTEST_SKIP() << "Takes >30 seconds to run in CI: http://nv/e.)"; | |
| GTEST_SKIP() << "Takes >30 seconds to run in CI"; |
| } | ||
|
|
||
| if (message_size_bytes > 32LL * 1024 * 1024) { | ||
| GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: malformed URL in skip message - http://nv/e.) appears incomplete
| GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)"; | |
| GTEST_SKIP() << "Takes >5 seconds to run in CI"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wujingyue greptile might be right here, this link does resolve, but it's not to a document that looks related...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closing parenthesis is actually part of the URL... I hate that; I'd rather the shortened URL contains alphanum only. But I couldn't find a way to fix nv/
Description
|
| Relevant files |
|---|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Incomplete skip reason URL
|
mdavis36
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address greptile comments about the link but LGTM
| } | ||
|
|
||
| if (message_size_bytes > 32LL * 1024 * 1024) { | ||
| GTEST_SKIP() << "Takes >5 seconds to run in CI: http://nv/e.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wujingyue greptile might be right here, this link does resolve, but it's not to a document that looks related...
|
!test |
... to speed up CI and local runs
The way forward could be to reduce
warmup_itersandtiming_itersand/or make this a benchmark (e.g. #5753) that doesn't run by default.