Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INT-1] On-off ramp global pg #29

Merged
merged 17 commits into from
Oct 2, 2024
Merged

[INT-1] On-off ramp global pg #29

merged 17 commits into from
Oct 2, 2024

Conversation

Jackmin801
Copy link
Member

No description provided.

@Jackmin801 Jackmin801 changed the title Live ckpt recovery [Feat] Live ckpt recovery Oct 1, 2024
@Jackmin801 Jackmin801 changed the title [Feat] Live ckpt recovery [INT-1] Live ckpt recovery Oct 1, 2024
@Jackmin801 Jackmin801 changed the title [INT-1] Live ckpt recovery [INT-1] On-off ramp global pg Oct 2, 2024
@Jackmin801 Jackmin801 marked this pull request as ready for review October 2, 2024 07:50
@Jackmin801 Jackmin801 requested a review from samsja October 2, 2024 07:50
@Jackmin801
Copy link
Member Author

@@ -37,6 +37,17 @@ def get_random_available_port():
return s.getsockname()[1]


@pytest.fixture()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might have confclit here on the port

Copy link
Member

@samsja samsja Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the get_random_available_port_list in test_train.py instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I couldnt get this to work reliably so I just hardcoded some high port numbers instead. ill just drop this function and put a todo? or u think the hardcoded ports are fine and we just make sure this test is not multiprocessed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure

import multiprocessing as mp


@pytest.mark.parametrize("world_size", [2, 8])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.parametrize("world_size", [2, 8])
@pytest.mark.parametrize("world_size", [2, 4])

lets keep the test runnable on 4 gpus

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah the tests run in cpu right now. is that ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense !

@Jackmin801 Jackmin801 requested a review from samsja October 2, 2024 19:16
Copy link
Member

@samsja samsja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFGTM 🔥

@samsja samsja merged commit 5ddd12d into main Oct 2, 2024
1 check passed
samsja pushed a commit that referenced this pull request Jan 10, 2025
…setup (#29)

Summary:
Fixes facebookresearch/optimizers#28. Also, it adds `torch` and `torchvision` as requirements and bumps the required Python version to `>=3.10`.

**Question:** Is a more recent version than `torch==2.0.0` strictly required for anything?

Pull Request resolved: facebookresearch/optimizers#29

Reviewed By: anana10c

Differential Revision: D64612120

Pulled By: tsunghsienlee

fbshipit-source-id: 9953c51daf58e49afe7ab18c3bd59706484c1823
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants