Multinode support in torchtune #2301

Open: 34 commits to be merged into base: main
Changes from 11 commits
Commits
bbd81fd
Remove last references to from training
Jan 27, 2025
c04ebaf
Deprecate and use new function
Jan 27, 2025
e02d39b
Expose
Jan 27, 2025
c558f27
Update API docs
Jan 27, 2025
454536c
Add tests
Jan 27, 2025
78bb2ae
Merge remote-tracking branch 'upstream/main' into multi-node-support
Jan 27, 2025
66b06e1
Lint
Jan 27, 2025
0d5aeb4
Add multinode recipe and sbatch script
Jan 27, 2025
afc9c2e
Update launch commands
Jan 27, 2025
c4748a5
Move env variables around
Jan 27, 2025
94440f9
Multi-node tutorial
Jan 27, 2025
deffeca
Updates
Jan 28, 2025
f441721
Update code block
Jan 28, 2025
9ba9e24
asdf
Jan 28, 2025
b36325a
Fix linting errors
Jan 28, 2025
fc9afbd
Updates
Jan 29, 2025
373e0c0
Lint
Jan 29, 2025
4659938
Pass test
Jan 29, 2025
693b8cb
Updates to tutorial
Jan 29, 2025
3d8d73d
Remove full_finetune_multinode from recipes registry
Jan 29, 2025
c0345a5
Lint
Jan 29, 2025
a3aaeb4
Last link
Jan 29, 2025
8e20394
Merge remote-tracking branch 'upstream/main' into multi-node-support
Jan 29, 2025
427a290
Merge remote-tracking branch 'upstream/main' into multi-node-support
Jan 30, 2025
b56b6be
Evan updates
Jan 31, 2025
63205da
Merge remote-tracking branch 'upstream/main' into multi-node-support
Jan 31, 2025
76ea872
Merge remote-tracking branch 'upstream/main' into multi-node-support
Jan 31, 2025
63eb274
Update comment
Jan 31, 2025
4d027b0
Move process initialization
joecummings Jan 31, 2025
34aa18b
Move init process group to above checkpoint instantiation
joecummings Feb 1, 2025
30b7366
Update intro
joecummings Feb 1, 2025
c7fdc21
Docs r dumb
joecummings Feb 1, 2025
900d643
Wow
joecummings Feb 1, 2025
9e230ca
Rework intro
joecummings Feb 3, 2025
1 change: 1 addition & 0 deletions docs/source/api_ref_training.rst
@@ -53,6 +53,7 @@ Utilities for enabling and working with distributed training.
init_distributed
is_distributed
gather_cpu_state_dict
get_distributed_backend

.. _ac_label:
