Add a toy multi-GPU benchmark #5753
```diff
@@ -7,7 +7,14 @@
 // clang-format on
 #include <sys/types.h>
 #include <unistd.h>
+#include <mutex>
+
+#include <sstream>
+#include <string>
+#include <string_view>
+#include <vector>
+
+#include <benchmark/benchmark.h>
 #include <gtest/gtest.h>
 
 #ifdef NVFUSER_DISTRIBUTED
 #include <torch/csrc/distributed/c10d/debug.h>
@@ -33,7 +40,7 @@ void MultiDeviceTestEnvironment::TearDown() {
   Communicator::getInstance().cleanup();
 }
 
-MultiDeviceTest::MultiDeviceTest() {
+MultiDeviceFixture::MultiDeviceFixture() {
   // Enable logging in c10d so debug messages can be printed out via
   // `TORCH_DISTRIBUTED_DEBUG`.
   c10d::setDebugLevelFromEnvironment();
@@ -42,6 +49,9 @@ MultiDeviceTest::MultiDeviceTest() {
   tensor_options_ =
       at::TensorOptions().dtype(at::kFloat).device(communicator_->device());
   debug_print = getNvFuserEnv("MULTIDEVICE_DEBUG_PRINT") != nullptr;
+}
+
+MultiDeviceTest::MultiDeviceTest() {
   disable_skip = getNvFuserEnv("MULTIDEVICE_DISABLE_SKIP") != nullptr;
 }
@@ -55,16 +65,24 @@ MultiDeviceTest::~MultiDeviceTest() {
   }
 }
 
+void MultiDeviceBenchmark::TearDown(benchmark::State& state) {
+  // Unlike testing::Test, a benchmark::Fixture is destructed after `main`
+  // exits, not after each benchmark. Therefore, we have to put barrier in
+  // TearDown instead of the destructor.
+  if (communicator_->is_available()) {
+    communicator_->barrier();
+  }
+}
+
 void MultiDeviceTest::SetUp() {
   // Set the same random seed for all processes.
   NVFuserTest::SetUp();
 
   if (!disable_skip && !communicator_->is_available()) {
     GTEST_SKIP() << "This test needs an available communicator.";
   }
 }
 
-at::Tensor MultiDeviceTest::shardTensor(at::Tensor tensor, TensorView* tv) {
+at::Tensor MultiDeviceFixture::shardTensor(at::Tensor tensor, TensorView* tv) {
   if (!isSharded(tv)) {
     return tensor;
   }
@@ -75,7 +93,7 @@ at::Tensor MultiDeviceTest::shardTensor(at::Tensor tensor, TensorView* tv) {
       tv->getDeviceMesh());
 }
 
-at::Tensor MultiDeviceTest::shardTensor(
+at::Tensor MultiDeviceFixture::shardTensor(
     at::Tensor tensor,
     const int64_t axis,
     const DeviceMesh& mesh) {
@@ -162,8 +180,27 @@ void MultiDeviceTest::validate(
 } // namespace nvfuser
 
+namespace {
+bool wantsBenchmarks(int argc, char** argv) {
+  for (int i = 1; i < argc; ++i) {
+    std::string_view a(argv[i]);
+    if (a.starts_with("--benchmark"))
+      return true;
+  }
+  return false;
+}
+} // namespace
+
 int main(int argc, char** argv) {
   testing::InitGoogleTest(&argc, argv);
   testing::AddGlobalTestEnvironment(new nvfuser::MultiDeviceTestEnvironment());
 
+  if (wantsBenchmarks(argc, argv)) {
```
**Collaborator:**

Does this mean that we only run one of validation or benchmarking?

**Author:**

Yes, that has been a Google-internal convention: when the user specifies a `--benchmark*` flag, only benchmarks are run.

**Collaborator:**

[...]

**Author:**

I suspect we are talking about different things. Nothing prevents a `BENCHMARK_DEFINE_F` from using comparison macros like `EXPECT_EQ`. That'll make a `BENCHMARK_DEFINE_F` on par with the `runBenchmark` function you pointed to. I'm asking whether a benchmark binary (e.g. [...]) should also run the tests.

**Collaborator:**

Got it.

I think we should either run tests or benchmarks. Benchmarks can additionally validate the results, as you mentioned. In this case, my preference would be to link them to different binaries: test binaries only run tests, and benchmark binaries only run benchmarks. This behavior sounds the most predictable to me.
```diff
+    benchmark::Initialize(&argc, argv);
+    benchmark::RunSpecifiedBenchmarks();
+    benchmark::Shutdown();
+    return 0;
+  }
+
   return RUN_ALL_TESTS();
 }
```
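As the thread above notes, a benchmark body defined with `BENCHMARK_DEFINE_F` can still use gtest comparison macros, so a benchmark built on the new fixture can validate while it measures. Below is a minimal sketch of what that could look like; it is not taken from this PR, `runToyFusion` and the benchmark name are hypothetical, and it assumes `MultiDeviceBenchmark` inherits `tensor_options_` from `MultiDeviceFixture`.

```cpp
// Sketch only, not part of this PR. Assumes MultiDeviceBenchmark derives from
// benchmark::Fixture and inherits tensor_options_ from MultiDeviceFixture.
#include <ATen/ATen.h>
#include <benchmark/benchmark.h>
#include <gtest/gtest.h>

BENCHMARK_DEFINE_F(MultiDeviceBenchmark, ToyPointwise)
(benchmark::State& state) {
  // Input size comes from the Arg() registered below.
  at::Tensor in = at::randn({state.range(0)}, tensor_options_);
  for (auto _ : state) {
    // runToyFusion is a hypothetical stand-in for the fusion under benchmark.
    at::Tensor out = runToyFusion(in);
    // gtest expectations work inside a benchmark body, so the same run can
    // also check correctness.
    EXPECT_EQ(out.numel(), in.numel());
  }
}
BENCHMARK_REGISTER_F(MultiDeviceBenchmark, ToyPointwise)->Arg(1 << 20);
```

Under the dispatch in `main` above, such a benchmark only runs when the binary is invoked with a `--benchmark*` flag (for example Google Benchmark's `--benchmark_filter`); a plain invocation runs only the gtest suites.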
---

Benchmarks tend to run longer and don't need to run as frequently as tests, so it's worth separating benchmarks from (correctness) tests. The question though is how. [...] `--benchmarks`.

---

Option (1) might be simplest to use in the short term: instead of two different commands, only an additional flag is needed. The downside is that tests and benchmarks do not have a clear distinction.

Option (2) is a good balance, reusing code while maintaining different binaries, but it requires different commands for the validation and benchmarking parts.

For option (3), we could define common fusions in a path outside tests/benchmarks; however, the setup will still likely be repeated. Another downside I see is that there are multiple locations which need to be kept in sync.

Yet another option is to have these in the benchmark file with validation, and allow arguments to disable either. The GitHub CI then runs only validation, whereas the nightly CI runs everything.

For now, what you have in the PR looks like a good starting point to at least unify how we create benchmarks. I am assuming you intend to modify `Fuser/tests/cpp/test_multidevice_lower_communication_cuda.cpp` (line 80 in d395676).

---

Yes, that'll likely be the first target.
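The reuse discussed here is what the `MultiDeviceFixture` split in the diff already points toward: the communicator and sharding setup live in one class, and both the gtest and the Google Benchmark fixtures build on it. The header is not part of the excerpt above, so the class layout below is only an assumed sketch, with nvfuser-internal includes omitted.

```cpp
// Assumed layout -- the actual header is not shown in this diff.
// Communicator, DeviceMesh, TensorView, and NVFuserTest come from nvfuser
// headers that are omitted here.
#include <ATen/ATen.h>
#include <benchmark/benchmark.h>
#include <gtest/gtest.h>

namespace nvfuser {

// Shared, framework-agnostic setup: communicator, tensor options, and the
// shardTensor helpers.
class MultiDeviceFixture {
 protected:
  MultiDeviceFixture();

  at::Tensor shardTensor(at::Tensor tensor, TensorView* tv);
  at::Tensor shardTensor(
      at::Tensor tensor,
      const int64_t axis,
      const DeviceMesh& mesh);

  Communicator* communicator_ = nullptr;
  at::TensorOptions tensor_options_;
  bool debug_print = false;
};

// Correctness tests: gtest lifecycle, skipped when no communicator is
// available.
class MultiDeviceTest : public NVFuserTest, public MultiDeviceFixture {
 protected:
  MultiDeviceTest();
  ~MultiDeviceTest() override;
  void SetUp() override;

  bool disable_skip = false;
};

// Benchmarks: Google Benchmark lifecycle, with the barrier in TearDown
// because the fixture outlives main().
class MultiDeviceBenchmark : public benchmark::Fixture,
                             public MultiDeviceFixture {
 public:
  void TearDown(benchmark::State& state) override;
};

} // namespace nvfuser
```

With a split like this, option (2) amounts to linking the shared fixture translation unit into both a test binary and a benchmark binary, while option (1) keeps everything in one binary behind the `--benchmark*` dispatch already shown in `main`.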