Commit 4822fa3

tushar00jain authored and facebook-github-bot committed
integrate torchcomms (#290)
Summary:
- Add torchcomms integration.
- Install torchcomms in the build files.
- Add a class that wraps torchcomms to offer reconfiguration and timeout handling, just like the process group wrapper.
- Allow users to pass either torchcomms or a pg to the manager; the manager infers the type and calls the relevant APIs.
- Work around torchcomms not having a get_future API: since futures are lazy in the manager, we can just create a dummy future and set the value on it, because we modify the value in place.

Differential Revision: D86343575
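
The last two summary points (type-based dispatch and the dummy future) can be illustrated with a minimal sketch. This is not the code from this commit: `FakeProcessGroup`, `FakeComm`, and the `allreduce` helper are hypothetical stand-ins, a plain Python list stands in for a tensor, and the standard library's `concurrent.futures.Future` stands in for whatever future type the manager actually uses.

```python
from concurrent.futures import Future
from typing import List, Union


class FakeProcessGroup:
    """Hypothetical stand-in for a backend that can return a future."""

    def allreduce(self, buf: List[float]) -> Future:
        for i in range(len(buf)):
            buf[i] *= 2.0  # pretend two ranks contributed identical values
        fut: Future = Future()
        fut.set_result(buf)
        return fut


class FakeComm:
    """Hypothetical stand-in for a backend with no future-returning API."""

    def all_reduce(self, buf: List[float]) -> None:
        for i in range(len(buf)):
            buf[i] *= 2.0  # the result is written into buf in place


def allreduce(backend: Union[FakeProcessGroup, FakeComm], buf: List[float]) -> Future:
    """Infer the backend type and always hand the caller a future."""
    if isinstance(backend, FakeProcessGroup):
        return backend.allreduce(buf)
    if isinstance(backend, FakeComm):
        backend.all_reduce(buf)
        # No future is available from this backend, so create a dummy one
        # and complete it with the buffer that was modified in place.
        fut: Future = Future()
        fut.set_result(buf)
        return fut
    raise TypeError(f"unsupported backend: {type(backend).__name__}")


buf = [1.0, 2.0]
fut = allreduce(FakeComm(), buf)
print(fut.result() is buf, buf)
```

Because the collective writes into the caller's buffer, the dummy future only needs to carry a reference to that same buffer; callers that consume the future see the reduced values without the backend ever producing a future of its own.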
1 parent 854fb2d commit 4822fa3

11 files changed, +751 -26 lines changed

.github/workflows/docs.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ jobs:

 sudo apt-get install -y protobuf-compiler

-pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+pip install --pre torch torchvision torchaudio torchcomms --index-url https://download.pytorch.org/whl/nightly/cu128
 pip install .[dev] -v

 pip install -r docs/requirements.txt

.github/workflows/lint.yaml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ jobs:

 sudo apt-get install -y protobuf-compiler

-pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+pip install --pre torch torchvision torchaudio torchcomms --index-url https://download.pytorch.org/whl/nightly/cu128
 pip install .[dev] -v

 # install recent version of Rust via rustup

.github/workflows/unittest-mac.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ jobs:
 steps:
 - name: Checkout
 uses: actions/checkout@v4
-
+
 - name: Setup miniconda
 uses: pytorch/test-infra/.github/actions/setup-miniconda@main
 with:
@@ -39,7 +39,7 @@ jobs:

 python -m pip install --upgrade pip

-pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
+pip install --pre torch torchvision torchaudio torchcomms --index-url https://download.pytorch.org/whl/nightly/cpu

 pip install -e .[dev] -v


.github/workflows/unittest.yaml

Lines changed: 2 additions & 2 deletions
@@ -41,10 +41,10 @@ jobs:

 # Optionally install torch nightly, pulls latest CUDA from pip otherwise
 if [ "${{ matrix.torch-version }}" = "nightly" ]; then
-pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+pip install --pre torch torchvision torchaudio torchcomms --index-url https://download.pytorch.org/whl/nightly/cu128
 fi
 if [ "${{ matrix.torch-version }}" = "test" ]; then
-pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
+pip install --pre torch torchvision torchaudio torchcomms --index-url https://download.pytorch.org/whl/test/cu128
 fi

 # Install dependencies

.pyre_configuration

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,9 @@
 },
 {
 "site-package": "parameterized"
+},
+{
+"site-package": "torchcomms"
 }
 ]
 }

torchft/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,7 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.

+from torchft.comms import TorchCommGloo, TorchCommNCCL
 from torchft.data import DistributedSampler
 from torchft.ddp import DistributedDataParallel
 from torchft.manager import Manager
@@ -31,4 +32,6 @@
 "ProcessGroupBabyNCCL",
 "ProcessGroupBabyXCCL",
 "ProcessGroupGloo",
+"TorchCommNCCL",
+"TorchCommGloo",
 )
