add variable batch_size for training (#3387) #3388
Closed
Summary:

context

It's a common practice in training to use a small `batch_size` (like 32) to warm up, then use a larger `batch_size` (like 64) for the rest of training. The current dynamic-shape handling assumes the `batch_size` to be constant.

NOTE: this "variable batch" concept is fundamentally different from the "variable length" (VLE/VBE) concept:
- In the variable-batch scenario, every key shares the same `batch_size` (which can only vary in a later iteration), so it follows the correlation `batch_size = length(kjt._lengths) // len(kjt._keys)`, and `kjt.stride()` returns the `batch_size` by calculation from `_lengths` and `_keys`.
- In the VBE scenario, each key can have its own `batch_size`, and there's no correlation between `_lengths` and `_keys` or `batch_size`.

This diff adds `batch_size` as a dynamic shape implicitly from the `mark_dynamic_kjt` util function.

WARNING: it's the user's responsibility to make sure that the `variable_batch` is only used when setting `variable_length` to `False`; otherwise it will cause unexpected behavior with the dynamic shapes in `torch.export`.

Reviewed By: spmex, malaybag
Differential Revision: D82792378
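The constant-batch correlation described in the summary can be sketched in plain Python. Note that `KJTSketch` below is a hypothetical stand-in for torchrec's `KeyedJaggedTensor` (which has far more machinery); only the `_keys`/`_lengths` fields, the `stride()` name, and the `batch_size = length(_lengths) // len(_keys)` correlation come from the summary above.

```python
# Minimal sketch (NOT the real torchrec KeyedJaggedTensor) illustrating
# the constant-batch correlation: batch_size = len(_lengths) // len(_keys).
class KJTSketch:
    def __init__(self, keys, lengths):
        self._keys = keys        # feature names, e.g. ["f1", "f2"]
        self._lengths = lengths  # flattened per-(key, sample) lengths

    def stride(self):
        # In the constant-batch case every key shares one batch_size,
        # so it can be derived from _lengths and _keys alone.
        return len(self._lengths) // len(self._keys)

# Two keys with batch_size 3: _lengths holds 2 * 3 = 6 entries.
kjt = KJTSketch(keys=["f1", "f2"], lengths=[1, 0, 2, 3, 1, 1])
print(kjt.stride())  # -> 3
```

In the VBE scenario this derivation breaks down: each key carries its own batch size, so `len(_lengths)` is no longer a simple multiple of `len(_keys)`, which is why the summary stresses that the two concepts must not be mixed.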