Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PT: add uniform likelihood bucket batching #1661

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

NeoLegends
Copy link
Collaborator

@NeoLegends NeoLegends commented Dec 5, 2024

This is a first attempt at automatically optimizing the bucket limits during training. After every subepoch the limits are adjusted to make every bucket catch a roughly equal number of segments.

I am debating on two more things:

  1. Keeping only a fixed number of seq lens that have passed through the batching (e.g. 10k + upper + lower value) to have an upper bound on the memory consumption. Whenever the list is extended by another batch and crosses said limit, we re-sample randomly from that list to keep it at a fixed size. As long as the number of seq lens we keep is large enough this shouldn't have a big influence on the accuracy of the results.
  2. Keeping seq len statistics even across subepochs to make the statistics better.

@NeoLegends NeoLegends self-assigned this Dec 5, 2024
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch 4 times, most recently from 9b53a49 to c80d589 Compare December 5, 2024 13:10
@NeoLegends NeoLegends marked this pull request as ready for review December 5, 2024 13:13
@NeoLegends NeoLegends requested review from albertz and a team as code owners December 5, 2024 13:13
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch 2 times, most recently from a2d513a to 1392e99 Compare December 5, 2024 13:17
@NeoLegends NeoLegends changed the title PT: add optimizing bucket batching PT: add uniform likelihood bucket batching Dec 5, 2024
@NeoLegends NeoLegends force-pushed the moritz-optim-bucket-limits branch from 1392e99 to 0a561e4 Compare December 5, 2024 13:17
@albertz
Copy link
Member

albertz commented Dec 5, 2024

When I mention that I would try to optimize this, I was more thinking about writing a dedicated script just to do that.

But this here could also be interesting.

I'm not sure thought that I would put this into RETURNN yet, while you are still experimenting with it. You can rather put this somewhere into your i6_experiments.users. and then just use it from there. (I think we might need to extend the class resolution slightly, if "." in cls_name: ...; see similar code in rf.build_from_dict or get_optimizer_class or other such functions.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants