Install Acceleration Framework into Training Script #157

Merged
merged 24 commits from accel-pr into foundation-model-stack:main on Jun 20, 2024

Conversation

@fabianlim (Collaborator) commented May 15, 2024

Description of the change

This PR installs the Acceleration Framework into sft_trainer.py. It is a follow-up to #119, which proposes a lightweight integration; the implementation of the Acceleration Framework is kept separate in the fms-acceleration repo under the same foundation-model-stack org.

  • introduce AccelerationFrameworkArguments, which accepts an --acceleration_framework_config_file argument to configure the framework.

  • update pyproject.toml with an optional dependency [fms-accel] that installs the fms-acceleration framework.

  • update README.md to include basic usage of fms-acceleration, using it to perform accelerated PEFT with 4-bit GPTQ-LoRA.

  • ensure that the integration within sft_trainer.py is optional; the .get_framework call below silently disables the framework if the fms-accel dependency is not installed.

    framework = AccelerationFrameworkConfig.from_dataclasses(*dataclass_configs).get_framework()
    
  • restrict to only three integration points within the sft_trainer.py script (a rough sketch follows this list):

    1. framework.model_loader: load the model if framework.requires_custom_loading == True
    2. framework.augmentation: augment the model (and modifiable arguments such as the peft_config) if framework.requires_augmentation == True
    3. framework.get_callbacks_and_ready_for_train: get callbacks and do final prep on the model and trainer.accelerator if required.
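The following is a rough sketch of how these three integration points could be wired into sft_trainer.py. It is illustrative only: the import path, the model loader's keyword arguments, and the modifiable_args tuple passed to augmentation are assumptions, not the PR's verbatim code; names such as dataclass_configs, model_name, train_args, peft_config, and trainer come from the surrounding training script.

    import torch
    from transformers import AutoModelForCausalLM

    # the import path of the config wrapper added by this PR is an assumption
    from tuning.config.acceleration_configs import AccelerationFrameworkConfig

    framework = AccelerationFrameworkConfig.from_dataclasses(*dataclass_configs).get_framework()

    # 1. custom model loading, only if the framework requires it
    if framework is not None and framework.requires_custom_loading:
        model = framework.model_loader(model_name, torch_dtype=torch.float16)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_name)

    # 2. augmentation of the model and modifiable arguments (e.g. installing accelerated PEFT)
    if framework is not None and framework.requires_augmentation:
        model, (peft_config,) = framework.augmentation(
            model, train_args, modifiable_args=(peft_config,)
        )

    # ... build the SFTTrainer as usual ...

    # 3. after the trainer is built: fetch callbacks and do final prep on model and accelerator
    if framework is not None:
        for callback in framework.get_callbacks_and_ready_for_train(model, trainer.accelerator):
            trainer.add_callback(callback)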

Update: the picture has been changed to reflect the new dataclass arguments flow.

[image: dataclass arguments flow]

Related issue number

Related to the merged PR #119 for the ADR of Training Acceleration.

How to verify the PR

This PR can be verified in the following ways:

  1. run the tests in the new folder tests/acceleration; see the note on unit tests below.
  2. [Update: this benchmark needs to be reworked because YAML arguments have been disabled; right now it works with some patching] run the provided benchmark utility and check against the results in the PR and the a100_80gb reference.

Note for unit tests
For tests/acceleration, note that the test environment will not install the [fms-accel] dependencies, so some of the newly added tests will be skipped. To run all the new tests:

pip install ".[fms-accel]"
python -m fms_acceleration.cli install peft
pytest tests/acceleration

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

@fabianlim fabianlim marked this pull request as draft May 15, 2024 15:50
@fabianlim fabianlim force-pushed the accel-pr branch 3 times, most recently from 7b5e354 to 6095b09, May 16, 2024 02:21
@fabianlim fabianlim changed the title from "DO NOT REVIEW: Acceleration framework" to "Install Acceleration Framework into Training Script" May 16, 2024
@fabianlim fabianlim marked this pull request as ready for review May 16, 2024 02:26
@Ssukriti (Collaborator) commented:

From a DM:

It would be great if there were a way to pass a Python dataclass instead of a YAML file:
python sft_trainer.py --acceleration_framework_config_file framework.yaml

We can create a new GPTQ-LoRA config here: https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L21
so that users of fms-tuning have exactly the same experience as with LoRA.

Inside fms-tuning, we know that for GPTQ-LoRA we need to call AccelerationFramework(); can we convert to a Python dataclass object accepted by AccelerationFramework, e.g. AccelerationFramework(qlora_config)?

You know what dataclass instance it is, so you can parse params from that.

We can implement this in steps, though, instead of requiring too much change to your framework:
Step 1: users of fms-tuning continue to use dataclasses; we internally convert the params passed into the YAML file you want (we won't expose YAML to users of tuning, but keep it inside the tuning library), then call the acceleration framework the same way with YAML.
Step 2: the acceleration framework adds support for Python dataclasses for each plugin as well, besides YAML, so we don't have to create YAML but can just pass the params needed for a certain plugin.

If we do step 1, we can merge the code and get it working, while working on step 2 as an enhancement.

We are discussing whether we need one dataclass or multiple, but we can start with step 1, with dataclasses only in tuning, and see how it goes.
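For concreteness, here is a rough sketch of what step 1 could look like; the dataclass fields and the YAML key layout are illustrative assumptions, not the framework's actual schema.

    import tempfile
    from dataclasses import dataclass, asdict

    import yaml  # pyyaml

    @dataclass
    class GPTQLoraConfig:
        # hypothetical user-facing dataclass, mirroring the LoRA experience
        kernel: str = "triton_v2"
        from_quantized: bool = True

    def to_framework_yaml(config: GPTQLoraConfig) -> str:
        """Convert the dataclass into the YAML config the acceleration framework
        expects; kept internal to the tuning library, never exposed to users."""
        content = {"peft": {"quantization": {"auto_gptq": asdict(config)}}}
        with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
            yaml.safe_dump(content, f)
            return f.name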

@fabianlim (Collaborator, Author) commented Jun 13, 2024

@Ssukriti @alex-jw-brooks I have made the requested changes:

  • now with dataclass parsing, it should be easy for the user to figure out the required arguments by inspecting the dataclasses. There is no longer any YAML to specify.
  • see for example quantized_lora_config.
  • I have put unit tests in tests. They are almost complete, but I might add one or two more.

Notes:

  • it was not that straightforward to get transformers.HfArgumentParser to work with nested dataclasses. I had to implement two additional utilities here; I can try to further simplify this implementation (a rough sketch of the problem is included after these notes).
    • made some attempts to simplify this with a decorator parsable_dataclass
  • The argument parsing works as expected; see the tests.
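For context, a minimal sketch of why nested dataclasses need extra handling; the dataclass names and the flatten-then-reassemble workaround shown here are assumptions for illustration, not the utilities implemented in this PR.

    from dataclasses import dataclass
    from transformers import HfArgumentParser

    @dataclass
    class AutoGPTQConfig:
        # hypothetical leaf dataclass holding the GPTQ-LoRA plugin arguments
        kernel: str = "triton_v2"
        from_quantized: bool = True

    @dataclass
    class QuantizedLoraConfig:
        # hypothetical nested dataclass; HfArgumentParser cannot parse this directly
        auto_gptq: AutoGPTQConfig = None

    # one workaround: parse the flat leaf dataclass, then rebuild the nested structure
    parser = HfArgumentParser((AutoGPTQConfig,))
    (gptq_args,) = parser.parse_args_into_dataclasses(args=["--kernel", "triton_v2"])
    quantized_lora_config = QuantizedLoraConfig(auto_gptq=gptq_args)
    print(quantized_lora_config)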

@fabianlim fabianlim force-pushed the accel-pr branch 3 times, most recently from cf23771 to 96ad8bf, June 13, 2024 12:48
@Ssukriti (Collaborator) commented Jun 18, 2024

@fabianlim thank you for the redesign and the acceleration framework unit tests. The design looks good, so no major changes are needed. Just a few comments:

  1. the question above on when GPTQ-LoRA is applicable

  2. The tests for the acceleration framework check the integration in detail, thank you. But I think, if possible, it would be beneficial to add some top-level tests to test_sft_trainer as well, just to ensure that:
    a. if a quantized_lora_config is passed, tuning still succeeds and the model after tuning can still be loaded and inferred on (like the rest of the tuning unit tests). I understand you may need a quantized base model for these unit tests, so let me know whether it is feasible to add them or not.
    b. similarly with kernels: if passed, tuning still succeeds.
    Either way, it might be good to add tests for the failure case as well in test_sft_trainer. What if a user passes a GPTQLoraConfig to an unsupported model that is not quantized? What happens then; is an error raised and caught?

  3. Documentation may need some more updates, but we can do that in subsequent PRs. Mainly, if there is any limitation on which model types we can apply QLoRA to (it needs a 4-bit quantized model), that should be documented and highlighted in the README.

@fabianlim (Collaborator, Author) commented Jun 19, 2024

@Ssukriti thank you for reviewing!

since this requires augmentation and peft_config, is quantized_lora_config only expected to work with LoRA tuning and LoraConfig, or can one also apply it with fine tuning and prompt tuning?

The AccelerationFramework has logic inside its plugins that checks whether the peft_config is properly set. The peft_config is passed to the framework via the augmentation step. Maybe I can add a unit test to demonstrate this.

The tests for the acceleration framework check the integration in detail, thank you. But I think, if possible, it would be beneficial to add some top-level tests to test_sft_trainer as well, just to ensure that:
a. if a quantized_lora_config is passed, tuning still succeeds and the model after tuning can still be loaded and inferred on (like the rest of the tuning unit tests). I understand you may need a quantized base model for these unit tests, so let me know whether it is feasible to add them or not.
b. similarly with kernels: if passed, tuning still succeeds.
Either way, it might be good to add tests for the failure case as well in test_sft_trainer. What if a user passes a GPTQLoraConfig to an unsupported model that is not quantized? What happens then; is an error raised and caught?

Actually I think you are referring to this kind of test. We do already have one.

  • I can add two more such tests: i) for fused ops and kernels, and ii) a negative test on an unsupported (non-quantized) model.

Documentation may need some more updates, but we can do that in subsequent PRs. Mainly, if there is any limitation on which model types we can apply QLoRA to (it needs a 4-bit quantized model), that should be documented and highlighted in the README.

ok np!

@fabianlim (Collaborator, Author) commented:

@Ssukriti I have added a set of extra tests

These tests involve calling sft_trainer.train, so I think these are the tests you are requesting. They can be considered integration tests, ensuring the framework works correctly when integrated with the trainer.

Therefore, I have added the following integration tests (a rough sketch of the first is included after the list):

  1. test_framework_raises_due_to_invalid_arguments: this test demonstrates that the framework plugins also check the arguments and throw if invalid arguments are passed in. For example, if we attempt to use accelerated PEFT and no peft_config is passed, then it throws.
  2. test_framework_intialized_properly_peft: this was refactored from an older test, but now it also demonstrates BNB QLoRA loading properly. Previously it covered only the GPTQ-LoRA happy path.
  3. test_framework_intialized_properly_foak: this one demonstrates that fused ops and kernels are also integrated properly.
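A rough sketch of the shape of the first test; the fixture names, the train keyword arguments, and the expected error type below are hypothetical, not the test's verbatim code.

    import pytest

    from tuning import sft_trainer

    def test_framework_raises_due_to_invalid_arguments(
        model_args, data_args, train_args, quantized_lora_config  # hypothetical fixtures
    ):
        # accelerated PEFT is requested but no LoRA peft_config is supplied,
        # so the framework plugin is expected to reject the arguments
        with pytest.raises(ValueError):
            sft_trainer.train(
                model_args,
                data_args,
                train_args,
                peft_config=None,
                quantized_lora_config=quantized_lora_config,
            )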

@Ssukriti (Collaborator) left a review comment


Will approve after conflicts are merged and all checks pass. I suggested minor edits to make it clear that GPTQ-LoRA needs a peft_config.LoraConfig passed; we have to make that clear in our documentation, as users of tuning will not know that.

Remaining work in subsequent PRs after this PR is merged:

  1. we need to ensure that in CI/CD all the tests run regularly and are not skipped. That means all dependencies should be installed so that our tests run regularly. The purpose is to ensure that, with every release, all tests pass.
  2. Unit tests: the additional unit tests added are good, thank you. I did want to ensure the model produced after GPTQ-LoRA tuning is in the correct format and can be loaded and inferred on correctly. We have had issues in the past when something would change and the model format produced was no longer correct; we should have tests to capture that, to have full confidence (will DM about this). A rough sketch of such a check is included below.
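A rough sketch, with hypothetical paths, of the kind of post-tuning format check described in (2): load the saved adapter and run a short generation to confirm the produced model is still valid.

    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    checkpoint_dir = "outputs/checkpoint-final"  # hypothetical output path
    model = AutoPeftModelForCausalLM.from_pretrained(checkpoint_dir, torch_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)

    inputs = tokenizer("### Text: hello\n\n### Label:", return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(output[0], skip_special_tokens=True))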

Resolved review threads: README.md (2), tuning/sft_trainer.py
fabianlim added 14 commits June 20, 2024 10:16
@fabianlim (Collaborator, Author) commented Jun 20, 2024

@Ssukriti I have rebased the changes and also created issue #205 to track the remaining work items. Working on making the tests pass. Update: all checks are passing now.

@Ssukriti merged commit fc8938d into foundation-model-stack:main on Jun 20, 2024 (7 checks passed)