
bug: Boolean values are represented as strings in default fsdp config translates to True #80

Open · kmehant opened this issue Mar 6, 2024 · 8 comments

@kmehant (Collaborator) commented Mar 6, 2024

Describe the bug

Boolean values in the fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as strings. They do not translate to actual boolean values when the config is loaded by the hf/transformers library: https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L1654
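For illustration, `json.loads` leaves quoted values as Python strings, and any non-empty string is truthy, so even `"false"` behaves as `True` downstream. A minimal sketch; the key name here is hypothetical, the real entries are at the lines linked above:

```python
import json

# Hypothetical key mirroring the kind of quoted flag in fsdp_config.json;
# the actual keys live at the linked lines.
cfg = json.loads('{"fsdp_forward_prefetch": "false"}')

value = cfg["fsdp_forward_prefetch"]
print(type(value))  # <class 'str'> -- the quoted value stays a string
print(bool(value))  # True -- any non-empty string is truthy, even "false"
```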

Solution

Use JSON boolean values instead of the string representation. Meanwhile, I have also raised an issue on the transformers library to discuss adding a dataclass and argument parser, with a motivating example (huggingface/transformers#29476).
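A sketch of the proposed fix, again with hypothetical keys: writing the JSON literals `true`/`false` unquoted makes `json.loads` produce real Python booleans, which transformers then consumes as intended.

```json
{
    "fsdp_forward_prefetch": false,
    "fsdp_use_orig_params": true
}
```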

kmehant changed the title from "bug: Boolean values are represented as strings in default fsdp config defaults to True" to "bug: Boolean values are represented as strings in default fsdp config translates to True" on Mar 6, 2024
@fabianlim (Collaborator)

@kmehant while the fix is as you described, now that #53 is merged, I think it may be best to switch to accelerate, which uses a YAML config defaults file. With YAML, explicit encasement of strings in " or ' is not necessary, so it is more robust to this kind of issue. I suggest the following changes:

  1. Update the README.md, removing the instructions for torch.run and replacing them with accelerate.launch.
  2. Replace the FSDP JSON with a config YAML like the sketch below.
    • BTW, I suggest moving fsdp_config.json out of tuning/config (which houses code) into somewhere that only houses config fixtures.

@Ssukriti
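For reference, a sketch of what such an accelerate defaults YAML could look like; the field names follow accelerate's FSDP plugin, but the values are illustrative rather than the settings #53 settled on:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_use_orig_params: true  # a bare YAML boolean, no quoting needed
mixed_precision: 'no'
num_machines: 1
num_processes: 2
```

Training would then be launched with something like `accelerate launch --config_file <path-to-yaml> ...` in place of the current torchrun invocation.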

@Ssukriti (Collaborator)

Thanks Fabian, created an issue for the README updates: #87. We will prioritize it at the earliest.

@kmehant (Collaborator, Author) commented Mar 13, 2024

#80 (comment)

> I think it may be best to switch to accelerate, which uses a yaml config defaults file.

Thanks @fabianlim, I am aware of this; isn't accelerate a wrapper over torch.distributed?

> I suggest the following changes

I guess @Ssukriti is tracking them in a different issue, #87.

@Ssukriti (Collaborator) commented Mar 13, 2024

@kmehant I was planning to get to issue #87 in the next 2 days, as it's high priority for our deliverables, but if you are interested and want to contribute instead, feel free to do so. Just let me know so I can plan accordingly :).
We do need it completed at the earliest so we can also start some testing with multi-GPU on our end.

@kmehant (Collaborator, Author) commented Mar 13, 2024

#80 (comment)

@Ssukriti I will be glad to raise a PR in a couple of hours.

@fabianlim (Collaborator)

@kmehant it's up to you, but I should be able to get to #87 pretty soon.

@kmehant (Collaborator, Author) commented Mar 13, 2024

@Ssukriti @fabianlim I have raised a PR here: #92. Thanks.

@fabianlim (Collaborator)

> @Ssukriti @fabianlim I have raised a PR here: #92. Thanks.

@kmehant ok, looks like we duplicated work; see #91.
