OOMptimizer: bucketing batch size profiles to make GPUs go 🔥 #9763
Signed-off-by: Piotr Żelasko <[email protected]>
Very interesting approach to maximize the batch size, minor comments but it looks good.
    @property
    def max_batch_size(self) -> int | None:
        if (
Needs a bit of doc for all the cases
            return self._max_ok
        return None

    @property
What does relative gap mean?
Added doc to explain
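For readers following the thread, here is a minimal sketch of how a gap-driven batch size search could work. It assumes the relative gap is the difference between the smallest batch size that failed and the largest that succeeded, divided by the former. The names `_min_err`, `rel_gap_thresh`, `report`, and `find_max_batch_size` are hypothetical, and the `fits` callable stands in for a real training step that may run out of memory. This is an illustration, not the PR's actual implementation.

```python
from typing import Callable, Optional


class BatchSizeSearch:
    """Bisection over the batch size of a single bucket (illustrative only)."""

    def __init__(self, start: int = 32, rel_gap_thresh: float = 0.05):
        self.current = start                  # batch size to try next
        self.rel_gap_thresh = rel_gap_thresh  # assumed stopping threshold
        self._max_ok: Optional[int] = None    # largest batch size that fit
        self._min_err: Optional[int] = None   # smallest batch size that failed

    @property
    def current_rel_gap(self) -> Optional[float]:
        # Undefined until we have seen both a success and a failure.
        if self._max_ok is None or self._min_err is None:
            return None
        return (self._min_err - self._max_ok) / self._min_err

    @property
    def max_batch_size(self) -> Optional[int]:
        # None means "still searching".
        gap = self.current_rel_gap
        if gap is None:
            return None
        # Stop when the gap is small enough or the two bounds are adjacent.
        if gap <= self.rel_gap_thresh or self._min_err - self._max_ok <= 1:
            return self._max_ok
        return None

    def report(self, oom: bool) -> None:
        """Record the outcome for `self.current` and pick the next candidate."""
        if oom:
            self._min_err = self.current
        else:
            self._max_ok = self.current
        if self._min_err is None:
            self.current *= 2                         # no failure yet: grow
        elif self._max_ok is None:
            self.current = max(1, self.current // 2)  # no success yet: shrink
        else:
            self.current = (self._max_ok + self._min_err) // 2  # bisect


def find_max_batch_size(fits: Callable[[int], bool], start: int = 32,
                        rel_gap_thresh: float = 0.05) -> int:
    search = BatchSizeSearch(start, rel_gap_thresh)
    while search.max_batch_size is None:
        if search._min_err == 1:
            raise RuntimeError("even a batch size of 1 does not fit")
        search.report(oom=not fits(search.current))
    return search.max_batch_size
```

With a simulated limit of 100 samples, `find_max_batch_size(lambda bs: bs <= 100)` tries 32, 64, 128, 96, 112, 104, and 100, then stops at 100 because the remaining gap (4 / 104) is under 5%.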
        return False


class FloatList(click.Option):
What's this used for? Might as well use hydra with a dataclass rather than click.
Right, I went with click out of old habit. This auto-parses bucket duration bins such as [1,2,3,4] into a list of floats.
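A rough sketch of the parsing step, for illustration only. It uses just the standard library, and `parse_float_list` is a hypothetical helper name, not the PR's code. A `click.Option` subclass such as `FloatList` could call a function like this from its `type_cast_value` override.

```python
import ast
from typing import List


def parse_float_list(text: str) -> List[float]:
    """Parse a string such as "[1,2,3,4]" into a list of floats."""
    try:
        value = ast.literal_eval(text)  # safe evaluation of Python literals only
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a valid list literal: {text!r}") from exc
    if not isinstance(value, (list, tuple)):
        raise ValueError(f"expected a list of numbers, got: {text!r}")
    for item in value:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise ValueError(f"non-numeric bin boundary: {item!r}")
    return [float(item) for item in value]
```

For example, `parse_float_list("[1,2,3,4]")` returns `[1.0, 2.0, 3.0, 4.0]`, while a malformed input raises `ValueError` instead of silently producing a wrong bin list.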
    print("Initializing ASR model.")
    # TODO(pzelasko): This currently only supports "from_pretrained".
    # We need to be able to read a model training configuration and instantiate the model
You can use restore_from(..., return_config=True)...
Ended up going with --module-name and --config-path, as discussed offline. It works well.
I realized we can make a post-processing pass on the max_batch_size list and merge buckets with identical batch sizes. Merging buckets will improve randomization. If the original num_buckets was large enough to trigger merging, that approach will lead us to an optimal number of buckets.
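The merging pass described above could look roughly like the following sketch. It assumes each bucket is identified by its upper duration bound, that bounds are sorted in ascending order, and that only neighbouring buckets are merged. `merge_buckets` is a hypothetical name, not the function used in the PR.

```python
from typing import List, Sequence, Tuple


def merge_buckets(
    bins: Sequence[float], batch_sizes: Sequence[int]
) -> Tuple[List[float], List[int]]:
    """Merge neighbouring buckets that ended up with the same max batch size.

    `bins[i]` is the upper duration bound of bucket i (ascending order assumed),
    `batch_sizes[i]` is the batch size found for that bucket.
    """
    if len(bins) != len(batch_sizes):
        raise ValueError("bins and batch_sizes must have the same length")
    merged_bins: List[float] = []
    merged_sizes: List[int] = []
    for upper, size in zip(bins, batch_sizes):
        if merged_sizes and merged_sizes[-1] == size:
            # Same batch size as the previous bucket: widen it instead of adding a new one.
            merged_bins[-1] = upper
        else:
            merged_bins.append(upper)
            merged_sizes.append(size)
    return merged_bins, merged_sizes
```

For instance, bins `[5.0, 10.0, 15.0, 20.0]` with batch sizes `[64, 32, 32, 16]` collapse to bins `[5.0, 15.0, 20.0]` with batch sizes `[64, 32, 16]`.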
    oom = False
    try:
        print(f"Current gap: {gen.current_rel_gap}. Attempting shapes: {[b.shape for b in batch]}", end=" ")
        optimizer.zero_grad()
It is theoretically possible to do these three lines in a cuda stream capture with "relaxed" mode to avoid doing any sort of GPU-side computation. However, it will work only for code that has no data-dependent shapes (like torch.nonzero). Note that I haven't run your code and don't know how slow it is right now.
It is surprisingly fast: for ~30 buckets the total runtime seems to be within 1-2 minutes. If CUDA graph "relaxed" mode were OK with skipping NCCL ops, we might even incorporate this as a training-time calibration (which we can't do now because these steps trigger NCCL syncs; if one GPU dies and another doesn't, it would hang). But even as-is I think this is a viable approach.
For lots of buckets (i.e. 100+) it takes a while. We should try the "relaxed" CUDA graph trick, and if it works, make a follow up PR.
Just curious how long "a while" is.
The relaxed CUDA graph trick definitely won't always work, unfortunately... I spoke with someone who works on end-to-end training and he told me that there is a cudaStreamSynchronize() in torch.amp.GradScaler, which will prevent using relaxed stream capture for models that do gradient scaling in mixed precision training.
I think it's around 15 minutes for 150 buckets.
.github/workflows/cicd-main.yml
Outdated
    /home/TestData/asr_tokenizers/canary/es/tokenizer_spe_bpe_v1024_max_4/tokenizer.model \
    --langs spl_tokens en es \
    --prompt-format canary \
    --prompt '[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]' \
- --prompt '[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]' \
+ --prompt \'[{"role":"user","slots":{"source_lang":"en","target_lang":"en","task":"asr","pnc":"yes"}}]\' \
I think that should do the trick @pzelasko
Thanks... I was seriously scratching my head with this one lol.
It also resulted in an error. I am disabling the check so this PR may go in. If we can figure out how to work around the quoting issue, I will enable the check in a follow-up PR.
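One possible way to sidestep hand-written quoting, sketched under assumptions: build the JSON prompt in Python and let `shlex.quote` produce a single shell-safe token. The thread does not establish the root cause of the CI failure, so this may not fix it. The helper name `build_prompt_arg` and the bare script name in the command string are placeholders for illustration.

```python
import json
import shlex
from typing import Any, Dict, List


def build_prompt_arg(prompt: List[Dict[str, Any]]) -> str:
    """Serialize the prompt to compact JSON and quote it for a POSIX shell."""
    prompt_json = json.dumps(prompt, separators=(",", ":"))
    return shlex.quote(prompt_json)


prompt = [
    {
        "role": "user",
        "slots": {"source_lang": "en", "target_lang": "en", "task": "asr", "pnc": "yes"},
    }
]

# Placeholder command line; the real script path and other flags are omitted.
command = "python estimate_duration_bins_2d.py --prompt-format canary --prompt " + build_prompt_arg(prompt)

# Round-trip check: splitting the command the way a POSIX shell would
# yields the JSON as one argument that parses back to the original prompt.
argv = shlex.split(command)
recovered = json.loads(argv[-1])
```

The round trip through `shlex.split` shows the JSON survives as a single argument. Whether the same holds inside a YAML workflow file depends on how that file's own quoting layers interact with the shell.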
hope you don't mind my comments!
Co-authored-by: oliver könig <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]>
Forgot to hit approve last time.
Will merge this after the pipeline passes.
…9763)
* Initial working draft of the OOMptimizer.
* Support model config. Add bucket merging.
* fix
* code review
* Support bucket_batch_size option for lhotse dataloading
* Ability to force a memory fraction to be unused in OOMptimizer
* Ability to force a memory fraction to be unused in OOMptimizer
* Fix for autocast and configurable dtype
* Allow token-per-second filtering
* Fix an issue with canary tokenizer
* Lift the requirement to use CanaryTokenizer with canary prompt format
* Fixes
* Initial 2D bucketing draft
* Separate script for 2D bucket estimation
* Full 2D bucketing support: estimate_uduration_bins_2d, oomptimizer, training
* fix
* fix
* fix
* fix
* fix
* fix
* Unit tests for bucket_batch_size and 2D bucketing for audio
* Docs for 2D estimate duration bins
* Fixes
* Preliminary support for prompt format in estimate_duration_bins_2d
* fixes
* fix for bucket selection edge case
* Add more info about the distribution to estimate_duration_bins_2d.py
* Include CUDA RAM usage tracking in OOMptimizer
* Track batch_size, num frames/tokens, and their padding ratio for AED multi task models
* OOMptimizer documentation
* Resolve TODOs and support any combination of (audio|text)->(audio|text) modalities
* Add missing property decorator
* fixes
* Add docs about 2D bucketing with tokenizer and prompts
* Fix bucket allocation logic for 2D bucketing
* Bump lhotse version
* fix...
* Reverse bucket iteration order; move oomptimizer_schema to AsrModel
* Make OOMptimizer compatible with dataclass mini-batches
* Refine the schema
* fixes after merging main
* fix oomptimizer with pretrained models; verified canary, parakeet tdt and ctc
* Disable concurrent bucketing to prevent spawning extra threads in tests
* fix tests and make life more colorful
* formatting
* more reasonable starting batch size settings
* Disable clearing of cuda memory cache
* Even more conservative profile by incorporating DDP overhead simulation
* Bucket selection fix and an extended unit test
* Refactor registered_prompt_format_fn to enable prompt formatting before Sampler
* porting fix
* Fixes, move fast-path to prompted dataset
* Changes from Daniel's review
* OOMptimizer tests + fixes for 1D bucketing case
* estimate duration bins tests
* address Daniel's review
* fix CPU unit test
* try to fix CI test
* Apply suggestions from code review
* Disable 2D bucketing test with prompt due to quoting issue
---------
Signed-off-by: Piotr Żelasko <[email protected]>
Co-authored-by: oliver könig <[email protected]>
…9763) * Initial working draft of the OOMptimizer. Signed-off-by: Piotr Żelasko <[email protected]> * Support model config. Add bucket merging. Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * code review Signed-off-by: Piotr Żelasko <[email protected]> * Support bucket_batch_size option for lhotse dataloading Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Fix for autocast and configurable dtype Signed-off-by: Piotr Żelasko <[email protected]> * Allow token-per-second filtering Signed-off-by: Piotr Żelasko <[email protected]> * Fix an issue with canary tokenizer Signed-off-by: Piotr Żelasko <[email protected]> * Lift the requirement to use CanaryTokenizer with canary prompt format * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Initial 2D bucketing draft Signed-off-by: Piotr Żelasko <[email protected]> * Separate script for 2D bucket estimation Signed-off-by: Piotr Żelasko <[email protected]> * Full 2D bucketing support: estimate_uduration_bins_2d, oomptimizer, training Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * Unit tests for bucket_batch_size and 2D bucketing for audio Signed-off-by: Piotr Żelasko <[email protected]> * Docs for 2D estimate duration bins Signed-off-by: Piotr Żelasko <[email protected]> * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Preliminary support for prompt format in estimate_duration_bins_2d 
Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * fix for bucket selection edge case Signed-off-by: Piotr Żelasko <[email protected]> * Add more info about the distribution to estimate_duration_bins_2d.py Signed-off-by: Piotr Żelasko <[email protected]> * Include CUDA RAM usage tracking in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Track batch_size, num frames/tokens, and their padding ratio for AED multi task models Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer documentation Signed-off-by: Piotr Żelasko <[email protected]> * Resolve TODOs and support any combination of (audio|text)->(audio|text) modalities Signed-off-by: Piotr Żelasko <[email protected]> * Add missing property decorator Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * Add docs about 2D bucketing with tokenizer and prompts Signed-off-by: Piotr Żelasko <[email protected]> * Fix bucket allocation logic for 2D bucketing Signed-off-by: Piotr Żelasko <[email protected]> * Bump lhotse version Signed-off-by: Piotr Żelasko <[email protected]> * fix... 
Signed-off-by: Piotr Żelasko <[email protected]> * Reverse bucket iteration order; move oomptimizer_schema to AsrModel Signed-off-by: Piotr Żelasko <[email protected]> * Make OOMptimizer compatible with dataclass mini-batches Signed-off-by: Piotr Żelasko <[email protected]> * Refine the schema Signed-off-by: Piotr Żelasko <[email protected]> * fixes after merging main Signed-off-by: Piotr Żelasko <[email protected]> * fix oomptimizer with pretrained models; verified canary, parakeet tdt and ctc Signed-off-by: Piotr Żelasko <[email protected]> * Disable concurrent bucketing to prevent spawning extra threads in tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests and make life more colorful Signed-off-by: Piotr Żelasko <[email protected]> * formatting Signed-off-by: Piotr Żelasko <[email protected]> * more reasonable starting batch size settings Signed-off-by: Piotr Żelasko <[email protected]> * Disable clearing of cuda memory cache Signed-off-by: Piotr Żelasko <[email protected]> * Even more conservative profile by incorporating DDP overhead simulation Signed-off-by: Piotr Żelasko <[email protected]> * Bucket selection fix and an extended unit test * Refactor registered_prompt_format_fn to enable prompt formatting before Sampler Signed-off-by: Piotr Żelasko <[email protected]> * porting fix Signed-off-by: Piotr Żelasko <[email protected]> * Fixes, move fast-path to prompted dataset Signed-off-by: Piotr Żelasko <[email protected]> * Changes from Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer tests + fixes for 1D bucketing case Signed-off-by: Piotr Żelasko <[email protected]> * estimate duration bins tests Signed-off-by: Piotr Żelasko <[email protected]> * address Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * fix CPU unit test Signed-off-by: Piotr Żelasko <[email protected]> * try to fix CI test Signed-off-by: Piotr Żelasko <[email protected]> * Apply suggestions from code review Co-authored-by: oliver 
könig <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> * Disable 2D bucketing test with prompt due to quoting issue Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Co-authored-by: oliver könig <[email protected]>
…9763) * Initial working draft of the OOMptimizer. Signed-off-by: Piotr Żelasko <[email protected]> * Support model config. Add bucket merging. Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * code review Signed-off-by: Piotr Żelasko <[email protected]> * Support bucket_batch_size option for lhotse dataloading Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Fix for autocast and configurable dtype Signed-off-by: Piotr Żelasko <[email protected]> * Allow token-per-second filtering Signed-off-by: Piotr Żelasko <[email protected]> * Fix an issue with canary tokenizer Signed-off-by: Piotr Żelasko <[email protected]> * Lift the requirement to use CanaryTokenizer with canary prompt format * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Initial 2D bucketing draft Signed-off-by: Piotr Żelasko <[email protected]> * Separate script for 2D bucket estimation Signed-off-by: Piotr Żelasko <[email protected]> * Full 2D bucketing support: estimate_uduration_bins_2d, oomptimizer, training Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * Unit tests for bucket_batch_size and 2D bucketing for audio Signed-off-by: Piotr Żelasko <[email protected]> * Docs for 2D estimate duration bins Signed-off-by: Piotr Żelasko <[email protected]> * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Preliminary support for prompt format in estimate_duration_bins_2d 
Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * fix for bucket selection edge case Signed-off-by: Piotr Żelasko <[email protected]> * Add more info about the distribution to estimate_duration_bins_2d.py Signed-off-by: Piotr Żelasko <[email protected]> * Include CUDA RAM usage tracking in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Track batch_size, num frames/tokens, and their padding ratio for AED multi task models Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer documentation Signed-off-by: Piotr Żelasko <[email protected]> * Resolve TODOs and support any combination of (audio|text)->(audio|text) modalities Signed-off-by: Piotr Żelasko <[email protected]> * Add missing property decorator Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * Add docs about 2D bucketing with tokenizer and prompts Signed-off-by: Piotr Żelasko <[email protected]> * Fix bucket allocation logic for 2D bucketing Signed-off-by: Piotr Żelasko <[email protected]> * Bump lhotse version Signed-off-by: Piotr Żelasko <[email protected]> * fix... 
Signed-off-by: Piotr Żelasko <[email protected]> * Reverse bucket iteration order; move oomptimizer_schema to AsrModel Signed-off-by: Piotr Żelasko <[email protected]> * Make OOMptimizer compatible with dataclass mini-batches Signed-off-by: Piotr Żelasko <[email protected]> * Refine the schema Signed-off-by: Piotr Żelasko <[email protected]> * fixes after merging main Signed-off-by: Piotr Żelasko <[email protected]> * fix oomptimizer with pretrained models; verified canary, parakeet tdt and ctc Signed-off-by: Piotr Żelasko <[email protected]> * Disable concurrent bucketing to prevent spawning extra threads in tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests and make life more colorful Signed-off-by: Piotr Żelasko <[email protected]> * formatting Signed-off-by: Piotr Żelasko <[email protected]> * more reasonable starting batch size settings Signed-off-by: Piotr Żelasko <[email protected]> * Disable clearing of cuda memory cache Signed-off-by: Piotr Żelasko <[email protected]> * Even more conservative profile by incorporating DDP overhead simulation Signed-off-by: Piotr Żelasko <[email protected]> * Bucket selection fix and an extended unit test * Refactor registered_prompt_format_fn to enable prompt formatting before Sampler Signed-off-by: Piotr Żelasko <[email protected]> * porting fix Signed-off-by: Piotr Żelasko <[email protected]> * Fixes, move fast-path to prompted dataset Signed-off-by: Piotr Żelasko <[email protected]> * Changes from Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer tests + fixes for 1D bucketing case Signed-off-by: Piotr Żelasko <[email protected]> * estimate duration bins tests Signed-off-by: Piotr Żelasko <[email protected]> * address Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * fix CPU unit test Signed-off-by: Piotr Żelasko <[email protected]> * try to fix CI test Signed-off-by: Piotr Żelasko <[email protected]> * Apply suggestions from code review Co-authored-by: oliver 
könig <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> * Disable 2D bucketing test with prompt due to quoting issue Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Co-authored-by: oliver könig <[email protected]>
…9763) * Initial working draft of the OOMptimizer. Signed-off-by: Piotr Żelasko <[email protected]> * Support model config. Add bucket merging. Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * code review Signed-off-by: Piotr Żelasko <[email protected]> * Support bucket_batch_size option for lhotse dataloading Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Ability to force a memory fraction to be unused in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Fix for autocast and configurable dtype Signed-off-by: Piotr Żelasko <[email protected]> * Allow token-per-second filtering Signed-off-by: Piotr Żelasko <[email protected]> * Fix an issue with canary tokenizer Signed-off-by: Piotr Żelasko <[email protected]> * Lift the requirement to use CanaryTokenizer with canary prompt format * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Initial 2D bucketing draft Signed-off-by: Piotr Żelasko <[email protected]> * Separate script for 2D bucket estimation Signed-off-by: Piotr Żelasko <[email protected]> * Full 2D bucketing support: estimate_uduration_bins_2d, oomptimizer, training Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * Unit tests for bucket_batch_size and 2D bucketing for audio Signed-off-by: Piotr Żelasko <[email protected]> * Docs for 2D estimate duration bins Signed-off-by: Piotr Żelasko <[email protected]> * Fixes Signed-off-by: Piotr Żelasko <[email protected]> * Preliminary support for prompt format in estimate_duration_bins_2d 
Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * fix for bucket selection edge case Signed-off-by: Piotr Żelasko <[email protected]> * Add more info about the distribution to estimate_duration_bins_2d.py Signed-off-by: Piotr Żelasko <[email protected]> * Include CUDA RAM usage tracking in OOMptimizer Signed-off-by: Piotr Żelasko <[email protected]> * Track batch_size, num frames/tokens, and their padding ratio for AED multi task models Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer documentation Signed-off-by: Piotr Żelasko <[email protected]> * Resolve TODOs and support any combination of (audio|text)->(audio|text) modalities Signed-off-by: Piotr Żelasko <[email protected]> * Add missing property decorator Signed-off-by: Piotr Żelasko <[email protected]> * fixes Signed-off-by: Piotr Żelasko <[email protected]> * Add docs about 2D bucketing with tokenizer and prompts Signed-off-by: Piotr Żelasko <[email protected]> * Fix bucket allocation logic for 2D bucketing Signed-off-by: Piotr Żelasko <[email protected]> * Bump lhotse version Signed-off-by: Piotr Żelasko <[email protected]> * fix... 
Signed-off-by: Piotr Żelasko <[email protected]> * Reverse bucket iteration order; move oomptimizer_schema to AsrModel Signed-off-by: Piotr Żelasko <[email protected]> * Make OOMptimizer compatible with dataclass mini-batches Signed-off-by: Piotr Żelasko <[email protected]> * Refine the schema Signed-off-by: Piotr Żelasko <[email protected]> * fixes after merging main Signed-off-by: Piotr Żelasko <[email protected]> * fix oomptimizer with pretrained models; verified canary, parakeet tdt and ctc Signed-off-by: Piotr Żelasko <[email protected]> * Disable concurrent bucketing to prevent spawning extra threads in tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests and make life more colorful Signed-off-by: Piotr Żelasko <[email protected]> * formatting Signed-off-by: Piotr Żelasko <[email protected]> * more reasonable starting batch size settings Signed-off-by: Piotr Żelasko <[email protected]> * Disable clearing of cuda memory cache Signed-off-by: Piotr Żelasko <[email protected]> * Even more conservative profile by incorporating DDP overhead simulation Signed-off-by: Piotr Żelasko <[email protected]> * Bucket selection fix and an extended unit test * Refactor registered_prompt_format_fn to enable prompt formatting before Sampler Signed-off-by: Piotr Żelasko <[email protected]> * porting fix Signed-off-by: Piotr Żelasko <[email protected]> * Fixes, move fast-path to prompted dataset Signed-off-by: Piotr Żelasko <[email protected]> * Changes from Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * OOMptimizer tests + fixes for 1D bucketing case Signed-off-by: Piotr Żelasko <[email protected]> * estimate duration bins tests Signed-off-by: Piotr Żelasko <[email protected]> * address Daniel's review Signed-off-by: Piotr Żelasko <[email protected]> * fix CPU unit test Signed-off-by: Piotr Żelasko <[email protected]> * try to fix CI test Signed-off-by: Piotr Żelasko <[email protected]> * Apply suggestions from code review Co-authored-by: oliver 
könig <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> * Disable 2D bucketing test with prompt due to quoting issue Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Co-authored-by: oliver könig <[email protected]> Signed-off-by: adityavavre <[email protected]>
What does this PR do?
Major contributions:
Collection: ASR
Changelog
Usage
# Add a code snippet demonstrating how to use this
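The review thread refers to a `max_batch_size` property, a `_max_ok` value, and a "relative gap" stopping criterion. The snippet below is a minimal sketch of that kind of search, not the PR's actual code. It assumes the search doubles the batch size until a step fails, then bisects between the largest size that fit and the smallest that failed. The names `BatchSizeSearch`, `find_max_batch_size`, `fits`, and `rel_gap_thresh` are hypothetical, and the out-of-memory check is simulated by a plain callback so the example runs without a GPU.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BatchSizeSearch:
    """Hypothetical helper: tracks the largest batch size that fit (max_ok)
    and the smallest batch size that failed (min_err)."""

    start: int = 32
    rel_gap_thresh: float = 0.05
    max_ok: Optional[int] = None
    min_err: Optional[int] = None

    @property
    def rel_gap(self) -> float:
        # Relative distance between the failing and the passing bound.
        if self.max_ok is None or self.min_err is None:
            return float("inf")
        return (self.min_err - self.max_ok) / self.min_err

    @property
    def done(self) -> bool:
        if self.min_err == 1:
            return True  # even a single example does not fit
        if self.max_ok is not None and self.min_err is not None:
            return self.min_err - self.max_ok <= 1 or self.rel_gap <= self.rel_gap_thresh
        return False

    def next_candidate(self) -> int:
        if self.max_ok is None and self.min_err is None:
            return self.start
        if self.min_err is None:
            return self.max_ok * 2  # nothing failed yet: keep doubling
        if self.max_ok is None:
            return max(1, self.min_err // 2)  # nothing fit yet: keep halving
        return (self.max_ok + self.min_err) // 2  # bisect between the bounds

    def update(self, batch_size: int, ok: bool) -> None:
        if ok:
            self.max_ok = batch_size
        else:
            self.min_err = batch_size


def find_max_batch_size(
    fits: Callable[[int], bool], start: int = 32, rel_gap_thresh: float = 0.05
) -> Optional[int]:
    """Return the largest batch size found to fit, or None if none fits.

    `fits` stands in for one training step that either succeeds or runs out of memory.
    """
    search = BatchSizeSearch(start=start, rel_gap_thresh=rel_gap_thresh)
    while not search.done:
        candidate = search.next_candidate()
        search.update(candidate, fits(candidate))
    return search.max_ok


if __name__ == "__main__":
    # Simulated memory budget: cost grows with batch size times bucket duration.
    budget = 640.0
    for duration in [1.0, 2.0, 3.0, 4.0]:
        best = find_max_batch_size(lambda bs, d=duration: bs * d <= budget)
        print(f"bucket <= {duration}s: batch size {best}")
```

A larger `rel_gap_thresh` stops the search sooner at the cost of a less tight batch size. In a real profile the `fits` callback would run a forward and backward pass and catch the CUDA out-of-memory error.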
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information