
OOMptimizer: bucketing batch size profiles to make GPUs go 🔥 #9763

Merged
merged 70 commits into from
Aug 16, 2024

@pzelasko (Collaborator) commented Jul 17, 2024

What does this PR do?

Major contributions:

  • Canary-1B can be trained with 5x larger batch sizes than our earlier baseline, which maxes out GPU utilization (in terms of memory, compute, and power consumption). As a result, the mean training step time is 2.75x longer, giving a training throughput of 5x / 2.75x ~= 180% of the original recipe. I managed to reproduce Canary-1B in about 40k training steps on the same number of GPUs, changing only the bucketing/batch size settings via the new features in this PR.
    • Update: it actually reproduces Canary-1B in half the training time, with slightly improved WERs.
    • Update 2: I also reproduced Canary-1B in the original training time using 4x fewer GPUs.
  • Note: these tools are applicable to all ASR models and can easily be made applicable to any (audio|text)->(audio|text) model.
  • OOMptimizer: a script that, given a model config and bucket bins, finds the optimal batch size for each bucket bin, where optimal means maximum GPU utilization.
  • 2D bucketing with a dedicated estimation script and dataloading support. It stratifies sampling by both input and output sequence lengths, which improves training throughput for encoder-decoder models.
  • Ability to filter out examples exceeding a tokens-per-second threshold during training (some datasets have very severe outliers, e.g. 20x more tokens than the median).
  • Concurrent bucketing, which speeds up the start of the training loop (in my experiments it reduces the wait time from ~2 min to ~20 s). With this setting, the bucketing buffer is filled asynchronously and the sampler can draw batches once it is at least 10% full, so training starts sooner.
  • Documentation and usage examples for the new features.
  • Unit and integration tests for the new features.
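To make the 2D bucketing bullet more concrete, here is a minimal sketch of one way to estimate 2D bucket bins: split examples into equal-count duration buckets, then split each duration bucket into equal-count output-token sub-buckets. It is not the PR's estimate_duration_bins_2d.py; the names `estimate_bins_2d` and `assign_bucket`, the equal-count (quantile) strategy, and the assumption that token counts are precomputed integers are all illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (duration_seconds, num_output_tokens).
Example = Tuple[float, int]
Bin = Tuple[float, int]  # (max_duration, max_tokens) upper bounds of one bucket


def _split_equal_count(items: Sequence[Example], num_chunks: int) -> List[Sequence[Example]]:
    """Split already-sorted items into num_chunks contiguous, (nearly) equal-size chunks."""
    n = len(items)
    return [items[(i * n) // num_chunks : ((i + 1) * n) // num_chunks] for i in range(num_chunks)]


def estimate_bins_2d(
    examples: Sequence[Example], num_duration_buckets: int, num_token_buckets: int
) -> List[Bin]:
    """Equal-count duration buckets, each subdivided into equal-count token buckets."""
    if len(examples) < num_duration_buckets * num_token_buckets:
        raise ValueError("need at least one example per 2D bucket")
    by_duration = sorted(examples, key=lambda ex: ex[0])
    bins: List[Bin] = []
    for dur_chunk in _split_equal_count(by_duration, num_duration_buckets):
        max_duration = dur_chunk[-1][0]  # chunk is sorted by duration
        by_tokens = sorted(dur_chunk, key=lambda ex: ex[1])
        for tok_chunk in _split_equal_count(by_tokens, num_token_buckets):
            bins.append((max_duration, tok_chunk[-1][1]))
    return bins


def assign_bucket(example: Example, bins: Sequence[Bin]) -> Optional[int]:
    """Index of the first bin that fits both dimensions, or None if none does."""
    duration, tokens = example
    for idx, (max_duration, max_tokens) in enumerate(bins):
        if duration <= max_duration and tokens <= max_tokens:
            return idx
    return None


bins = estimate_bins_2d([(float(i), 2 * i) for i in range(1, 9)], 2, 2)
print(bins)  # [(4.0, 4), (4.0, 8), (8.0, 12), (8.0, 16)]
```

Each (max_duration, max_tokens) pair bounds one bucket. Because every duration bucket gets its own token bins, batches come out homogeneous in both input and output length, which is where the padding savings for encoder-decoder models would come from.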
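The tokens-per-second filter boils down to a ratio test per example. A minimal sketch, assuming each example is a (duration_seconds, num_tokens) pair; the function name and the choice to drop zero-duration examples are hypothetical, not taken from the PR's dataloading code.

```python
from typing import Iterable, Iterator, Tuple

Example = Tuple[float, int]  # hypothetical: (duration_seconds, num_tokens)


def filter_by_tokens_per_second(examples: Iterable[Example], max_tps: float) -> Iterator[Example]:
    """Yield only examples whose token rate (tokens / second) is at most max_tps."""
    for duration, num_tokens in examples:
        if duration <= 0:
            continue  # token rate is undefined for empty audio; drop it
        if num_tokens / duration <= max_tps:
            yield duration, num_tokens


kept = list(filter_by_tokens_per_second([(10.0, 50), (2.0, 100), (0.0, 5)], max_tps=20.0))
print(kept)  # [(10.0, 50)]
```

Being a generator, the filter can sit lazily in front of the bucketing sampler, so outliers never reach the batch-size budget computation.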
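The concurrent bucketing idea (fill the buffer in the background, start sampling once it is about 10% full) can be sketched with the standard library's threading primitives. This is a hypothetical, single-consumer FIFO illustration: the real sampler draws randomized batches from buckets, and the class name `ConcurrentBuffer` and its `start_fraction` parameter are our own.

```python
import threading
from collections import deque
from typing import Deque, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # private sentinel so that None can be a legitimate item


class ConcurrentBuffer(Generic[T]):
    """Hypothetical sketch: a background thread fills a bounded buffer, and the
    (single) consumer may start reading once it holds start_fraction * capacity items."""

    def __init__(self, source: Iterable[T], capacity: int, start_fraction: float = 0.1):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._source = iter(source)
        self._capacity = capacity
        self._min_items = min(capacity, max(1, int(capacity * start_fraction)))
        self._items: Deque[T] = deque()
        self._cond = threading.Condition()
        self._exhausted = False
        self._warm = False  # becomes True after the first read
        self._thread = threading.Thread(target=self._fill, daemon=True)
        self._thread.start()

    def _fill(self) -> None:
        try:
            for item in self._source:
                with self._cond:
                    while len(self._items) >= self._capacity:
                        self._cond.wait()  # buffer full: wait for the consumer
                    self._items.append(item)
                    self._cond.notify_all()
        finally:
            with self._cond:
                self._exhausted = True
                self._cond.notify_all()

    def _take(self):
        with self._cond:
            # Only the first read waits for the warm-up threshold; later reads wait for any item.
            threshold = 1 if self._warm else self._min_items
            while len(self._items) < threshold and not self._exhausted:
                self._cond.wait()
            self._warm = True
            if not self._items:
                return _DONE
            item = self._items.popleft()
            self._cond.notify_all()  # wake the producer if it was blocked on a full buffer
            return item

    def __iter__(self) -> Iterator[T]:
        while True:
            item = self._take()
            if item is _DONE:
                return
            yield item
```

Because the consumer waits for the warm-up threshold only once, iteration can begin after the first ~10% of the buffer is filled rather than after the whole buffer. The trade-off in a real randomized sampler would presumably be less shuffling in the earliest batches. If the consumer stops early, the daemon producer may stay blocked on a full buffer, which is acceptable for a sketch.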

Collection: ASR



@pzelasko requested review from titu1994 and galv July 17, 2024 15:14
titu1994 previously approved these changes Jul 17, 2024

Very interesting approach to maximizing the batch size. Minor comments, but it looks good.


@property
def max_batch_size(self) -> int | None:
    if (
Collaborator: Needs a bit of doc for all the cases.

return self._max_ok
return None

@property
Collaborator: What does relative gap mean?

Collaborator Author: Added doc to explain.
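Since the thread above asks what "relative gap" means, here is a minimal sketch of a search that could sit behind such a property, assuming one plausible definition: relative gap = (smallest batch size that OOMed - largest batch size that fit) / smallest batch size that OOMed. It reuses the names `max_batch_size`, `_max_ok`, and `current_rel_gap` from the diff hunks for readability, but the doubling-then-bisection logic, `BatchSizeSearch`, `find_max_batch_size`, and the `fits` callback are hypothetical. On a GPU, `fits` would run a forward/backward step and catch `torch.cuda.OutOfMemoryError`.

```python
from typing import Callable, Optional


class BatchSizeSearch:
    """Hypothetical OOM-driven search: double the batch size until the first OOM,
    then bisect between the largest size that fit and the smallest that OOMed."""

    def __init__(self, start: int = 1, rel_gap_thresh: float = 0.05):
        if start < 1:
            raise ValueError("start must be >= 1")
        self._current = start
        self._max_ok: Optional[int] = None   # largest batch size that fit
        self._min_oom: Optional[int] = None  # smallest batch size that OOMed
        self._rel_gap_thresh = rel_gap_thresh

    @property
    def current(self) -> int:
        """Batch size to try next."""
        return self._current

    @property
    def current_rel_gap(self) -> Optional[float]:
        """(min_oom - max_ok) / min_oom: roughly how far the best fitting size may still
        be from the true limit, as a fraction. None until both bounds are known."""
        if self._max_ok is None or self._min_oom is None:
            return None
        return (self._min_oom - self._max_ok) / self._min_oom

    @property
    def done(self) -> bool:
        if self._min_oom is None:
            return False  # still growing; no upper bound yet
        if self._max_ok is None:
            return self._min_oom <= 1  # even batch size 1 does not fit
        return self._min_oom - self._max_ok <= 1 or self.current_rel_gap <= self._rel_gap_thresh

    @property
    def max_batch_size(self) -> Optional[int]:
        """Cases: search not finished -> None; finished and something fit -> largest
        size that fit; finished but even batch size 1 OOMed -> None."""
        if self.done:
            return self._max_ok
        return None

    def update(self, oom: bool) -> None:
        """Record the outcome of trying `current` and pick the next size to try."""
        if oom:
            self._min_oom = self._current if self._min_oom is None else min(self._min_oom, self._current)
        else:
            self._max_ok = self._current if self._max_ok is None else max(self._max_ok, self._current)
        if self._min_oom is None:
            self._current *= 2  # no OOM seen yet: grow geometrically
        elif self._max_ok is None:
            self._current = max(1, self._min_oom // 2)  # nothing fit yet: shrink
        else:
            self._current = (self._max_ok + self._min_oom) // 2  # bisect


def find_max_batch_size(
    fits: Callable[[int], bool], start: int = 1, rel_gap_thresh: float = 0.05, max_steps: int = 100
) -> Optional[int]:
    """`fits(bs)` stands in for one training step at batch size bs (True = no OOM)."""
    search = BatchSizeSearch(start, rel_gap_thresh)
    for _ in range(max_steps):
        if search.done:
            break
        search.update(oom=not fits(search.current))
    return search.max_batch_size


print(find_max_batch_size(lambda bs: bs <= 100, rel_gap_thresh=0.1))  # 96
```

The `max_batch_size` docstring spells out the three cases the first review comment asked about. A looser `rel_gap_thresh` trades a few percent of batch size for fewer trial steps, which matters when each trial is a real forward/backward pass.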

return False


class FloatList(click.Option):
Collaborator: What's this used for? Might as well use Hydra with a dataclass rather than click.

Collaborator Author: Right, I went with click out of an old habit. This auto-parses bucket duration bins such as [1,2,3,4] into a list of floats.
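For comparison, the same list-of-floats parsing can be done with only the standard library. A minimal sketch, assuming bins are passed as a Python-style list literal such as "[1,2,3,4]"; the `--buckets` flag and `parse_float_list` are hypothetical names, not the script's real interface. With click, the same function could back a custom `click.ParamType` instead of a `click.Option` subclass.

```python
import argparse
import ast
from typing import List


def parse_float_list(value: str) -> List[float]:
    """Parse a list literal such as "[1,2,3,4]" into [1.0, 2.0, 3.0, 4.0]."""
    try:
        parsed = ast.literal_eval(value)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a list literal: {value!r}") from exc
    if not isinstance(parsed, (list, tuple)) or not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in parsed
    ):
        raise argparse.ArgumentTypeError(f"expected a list of numbers, got {value!r}")
    return [float(x) for x in parsed]


parser = argparse.ArgumentParser()
parser.add_argument("--buckets", type=parse_float_list, required=True)  # hypothetical flag name

if __name__ == "__main__":
    print(parser.parse_args(["--buckets", "[1,2,3,4]"]).buckets)  # [1.0, 2.0, 3.0, 4.0]
```

Raising `argparse.ArgumentTypeError` from the type function lets argparse turn a malformed value into a normal usage error instead of a traceback.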


print("Initializing ASR model.")
# TODO(pzelasko): This currently only supports "from_pretrained".
# We need to be able to read a model training configuration and instantiate the model
Collaborator: You can use restore_from(..., return_config=True)...

Collaborator Author: Ended up going with --module-name and --config-path, as discussed offline. It works well.
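A `--module-name` style flag typically comes down to importing a class from its dotted path. A minimal sketch using `importlib`; `resolve_class` is a hypothetical helper, and the NeMo-specific steps (loading the config from `--config-path` and constructing the model) are deliberately left out because their exact API is not shown in this thread.

```python
import importlib


def resolve_class(qualified_name: str) -> type:
    """Import "package.module.ClassName" and return the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'pkg.module.Class', got {qualified_name!r}")
    module = importlib.import_module(module_name)  # raises ModuleNotFoundError if missing
    cls = getattr(module, class_name)  # raises AttributeError if the name is missing
    if not isinstance(cls, type):
        raise TypeError(f"{qualified_name!r} does not name a class")
    return cls


# Usage with a standard-library class; a model class would be resolved the same way.
print(resolve_class("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```

Resolving the class from a string keeps the script model-agnostic, which fits the PR's goal of supporting any (audio|text)->(audio|text) model.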

@pzelasko (Collaborator Author): I realized we can add a post-processing pass over the max_batch_size list that merges buckets with identical batch sizes. Merging buckets improves randomization, and if the original num_buckets was large enough to trigger merging, this approach leads us to an optimal number of buckets.
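The bucket-merging pass described above can be sketched as follows, assuming 1D duration bins where bucket i covers (bins[i-1], bins[i]] and batch sizes are listed in the same order. Since batch sizes normally shrink as sequences grow, equal batch sizes tend to be adjacent, so merging a run simply keeps the run's last upper edge. `merge_buckets` is a hypothetical name; 2D bins would need the merge restricted to the same duration bin.

```python
from typing import List, Sequence, Tuple


def merge_buckets(bins: Sequence[float], batch_sizes: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Merge runs of adjacent buckets that share a batch size.

    bins[i] is the upper duration edge of bucket i; a merged run keeps the upper
    edge of its last bucket, so the union of covered durations is unchanged."""
    if len(bins) != len(batch_sizes):
        raise ValueError("bins and batch_sizes must have the same length")
    merged_bins: List[float] = []
    merged_sizes: List[int] = []
    for edge, size in zip(bins, batch_sizes):
        if merged_sizes and merged_sizes[-1] == size:
            merged_bins[-1] = edge  # extend the previous bucket instead of opening a new one
        else:
            merged_bins.append(edge)
            merged_sizes.append(size)
    return merged_bins, merged_sizes


print(merge_buckets([1.0, 2.0, 3.0, 4.0, 5.0], [64, 64, 32, 32, 16]))  # ([2.0, 4.0, 5.0], [64, 32, 16])
```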

oom = False
try:
    print(f"Current gap: {gen.current_rel_gap}. Attempting shapes: {[b.shape for b in batch]}", end=" ")
    optimizer.zero_grad()
Collaborator: It is theoretically possible to run these three lines under CUDA stream capture in "relaxed" mode to avoid doing any GPU-side computation. However, this works only for code that has no data-dependent shapes (like torch.nonzero). Note that I haven't run your code and don't know how slow it is right now.

Collaborator Author: It is surprisingly fast: for ~30 buckets the total runtime seems to be within 1-2 minutes. If CUDA graph "relaxed" mode were "ok" with skipping NCCL ops, we might even incorporate this as a training-time calibration (which we can't do now because these steps trigger NCCL syncs: if one GPU fails and another doesn't, the job would hang). But even as-is I think this is a viable approach.

Collaborator Author: For lots of buckets (e.g. 100+) it takes a while. We should try the "relaxed" CUDA graph trick and, if it works, make a follow-up PR.

Collaborator: Just curious how long "a while" is.

The relaxed CUDA graph trick definitely won't always work, unfortunately... I spoke with someone who works on end-to-end training, and he told me that there is a cudaStreamSynchronize() in torch.amp.GradScaler, which prevents using relaxed stream capture for models that use gradient scaling in mixed-precision training.

@pzelasko (Collaborator Author), Jul 29, 2024: I think it's around 15 minutes for 150 buckets.

pzelasko and others added 3 commits July 18, 2024 15:50
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
github-actions bot added the ASR label Jul 19, 2024
pzelasko added 11 commits July 22, 2024 13:06
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
/home/TestData/asr_tokenizers/canary/es/tokenizer_spe_bpe_v1024_max_4/tokenizer.model \
--langs spl_tokens en es \
--prompt-format canary \
--prompt '[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]' \
Collaborator:
Suggested change
- --prompt '[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]' \
+ --prompt \'[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]\' \

I think that should do the trick @pzelasko

Collaborator Author: Thanks... I was seriously scratching my head with this one lol.

Collaborator Author: It also resulted in an error. I am disabling the check so this PR can go in. If we can figure out how to work around the quoting issue, I will re-enable the check in a follow-up PR.
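One way to build the `--prompt` argument without hand-written escaping is to generate the JSON and let `shlex.quote` add one layer of POSIX shell quoting. A minimal sketch: it only covers a single shell layer, and since the thread does not say where the CI error came from (YAML parsing or an extra round of shell expansion would each need their own handling), treat it as a possible direction rather than a verified fix.

```python
import json
import shlex

# The prompt from the CI command above, built as data instead of a hand-escaped string.
prompt = [{"role": "user", "slots": {"source_lang": "en", "target_lang": "en", "task": "asr", "pnc": "yes"}}]
prompt_json = json.dumps(prompt, separators=(",", ":"))  # compact form, no spaces

args = ["--prompt-format", "canary", "--prompt", prompt_json]
command_tail = " ".join(shlex.quote(arg) for arg in args)  # one layer of POSIX shell quoting
print(command_tail)
```

Round-tripping the result through `shlex.split` is a cheap way to check that the shell will see exactly the intended arguments.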

@ko3n1g (Collaborator): Hope you don't mind my comments!

.github/workflows/cicd-main.yml: 3 review threads (outdated, resolved)
Co-authored-by: oliver könig <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
galv previously approved these changes Aug 15, 2024

@galv (Collaborator): Forgot to hit approve last time.

@ko3n1g (Collaborator): Will merge this after the pipeline passes.

@pzelasko merged commit a2c1627 into main Aug 16, 2024
128 of 129 checks passed
@pzelasko deleted the oomptimizer branch August 16, 2024 11:59
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Aug 16, 2024
…9763)

* Initial working draft of the OOMptimizer.

Signed-off-by: Piotr Żelasko <[email protected]>

* Support model config. Add bucket merging.

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* code review

Signed-off-by: Piotr Żelasko <[email protected]>

* Support bucket_batch_size option for lhotse dataloading

Signed-off-by: Piotr Żelasko <[email protected]>

* Ability to force a memory fraction to be unused in OOMptimizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Ability to force a memory fraction to be unused in OOMptimizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Fix for autocast and configurable dtype

Signed-off-by: Piotr Żelasko <[email protected]>

* Allow token-per-second filtering

Signed-off-by: Piotr Żelasko <[email protected]>

* Fix an issue with canary tokenizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Lift the requirement to use CanaryTokenizer with canary prompt format

* Fixes

Signed-off-by: Piotr Żelasko <[email protected]>

* Initial 2D bucketing draft

Signed-off-by: Piotr Żelasko <[email protected]>

* Separate script for 2D bucket estimation

Signed-off-by: Piotr Żelasko <[email protected]>

* Full 2D bucketing support: estimate_duration_bins_2d, oomptimizer, training

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* Unit tests for bucket_batch_size and 2D bucketing for audio

Signed-off-by: Piotr Żelasko <[email protected]>

* Docs for 2D estimate duration bins

Signed-off-by: Piotr Żelasko <[email protected]>

* Fixes

Signed-off-by: Piotr Żelasko <[email protected]>

* Preliminary support for prompt format in estimate_duration_bins_2d

Signed-off-by: Piotr Żelasko <[email protected]>

* fixes

Signed-off-by: Piotr Żelasko <[email protected]>

* fix for bucket selection edge case

Signed-off-by: Piotr Żelasko <[email protected]>

* Add more info about the distribution to estimate_duration_bins_2d.py

Signed-off-by: Piotr Żelasko <[email protected]>

* Include CUDA RAM usage tracking in OOMptimizer

Signed-off-by: Piotr Żelasko <[email protected]>

* Track batch_size, num frames/tokens, and their padding ratio for AED multi task models

Signed-off-by: Piotr Żelasko <[email protected]>

* OOMptimizer documentation

Signed-off-by: Piotr Żelasko <[email protected]>

* Resolve TODOs and support any combination of (audio|text)->(audio|text) modalities

Signed-off-by: Piotr Żelasko <[email protected]>

* Add missing property decorator

Signed-off-by: Piotr Żelasko <[email protected]>

* fixes

Signed-off-by: Piotr Żelasko <[email protected]>

* Add docs about 2D bucketing with tokenizer and prompts

Signed-off-by: Piotr Żelasko <[email protected]>

* Fix bucket allocation logic for 2D bucketing

Signed-off-by: Piotr Żelasko <[email protected]>

* Bump lhotse version

Signed-off-by: Piotr Żelasko <[email protected]>

* fix...

Signed-off-by: Piotr Żelasko <[email protected]>

* Reverse bucket iteration order; move oomptimizer_schema to AsrModel

Signed-off-by: Piotr Żelasko <[email protected]>

* Make OOMptimizer compatible with dataclass mini-batches

Signed-off-by: Piotr Żelasko <[email protected]>

* Refine the schema

Signed-off-by: Piotr Żelasko <[email protected]>

* fixes after merging main

Signed-off-by: Piotr Żelasko <[email protected]>

* fix oomptimizer with pretrained models; verified canary, parakeet tdt and ctc

Signed-off-by: Piotr Żelasko <[email protected]>

* Disable concurrent bucketing to prevent spawning extra threads in tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests and make life more colorful

Signed-off-by: Piotr Żelasko <[email protected]>

* formatting

Signed-off-by: Piotr Żelasko <[email protected]>

* more reasonable starting batch size settings

Signed-off-by: Piotr Żelasko <[email protected]>

* Disable clearing of cuda memory cache

Signed-off-by: Piotr Żelasko <[email protected]>

* Even more conservative profile by incorporating DDP overhead simulation

Signed-off-by: Piotr Żelasko <[email protected]>

* Bucket selection fix and an extended unit test

* Refactor registered_prompt_format_fn to enable prompt formatting before Sampler

Signed-off-by: Piotr Żelasko <[email protected]>

* porting fix

Signed-off-by: Piotr Żelasko <[email protected]>

* Fixes, move fast-path to prompted dataset

Signed-off-by: Piotr Żelasko <[email protected]>

* Changes from Daniel's review

Signed-off-by: Piotr Żelasko <[email protected]>

* OOMptimizer tests + fixes for 1D bucketing case

Signed-off-by: Piotr Żelasko <[email protected]>

* estimate duration bins tests

Signed-off-by: Piotr Żelasko <[email protected]>

* address Daniel's review

Signed-off-by: Piotr Żelasko <[email protected]>

* fix CPU unit test

Signed-off-by: Piotr Żelasko <[email protected]>

* try to fix CI test

Signed-off-by: Piotr Żelasko <[email protected]>

* Apply suggestions from code review

Co-authored-by: oliver könig <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>

* Disable 2D bucketing test with prompt due to quoting issue

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: oliver könig <[email protected]>
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Aug 19, 2024
Dido0o0 pushed a commit to Dido0o0/NeMo that referenced this pull request Aug 23, 2024
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request Sep 15, 2024

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Signed-off-by: adityavavre <[email protected]>
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
…9763)

Signed-off-by: Hainan Xu <[email protected]>
7 participants