Consistent batch size #50
I have a need (discussed off-thread) to make it so that every batch is the same size. This seems to me to require the following:

1. computing the maximum source (and, where appropriate, features and target) length over the dataset (possibly `@property` is appropriate here);
2. a flag (`--pad_to_max` or something) which, when enabled, causes source, features, and target (respectively) to be padded to the appropriate length from (1); it just needs to be passed as `pad_len` to the `batches.PaddedTensor` constructor to achieve this.

I will put this aside until #40 is completed, however, since this may interact in weird ways with that.
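As an illustration of the proposal, here is a minimal sketch of the padding behavior, using a hypothetical `pad_tensor` helper rather than yoyodyne's actual `batches.PaddedTensor`: when `pad_len` is `None`, sequences are padded to the batch maximum; when it is set (e.g., to the dataset maximum), every batch comes out the same width.

```python
import torch


def pad_tensor(
    seqs: list[torch.Tensor], pad_idx: int, pad_len: int | None = None
) -> torch.Tensor:
    """Stacks variable-length 1-D tensors into one padded 2-D tensor.

    If pad_len is None, pads to the longest sequence in the batch;
    otherwise pads every sequence to the fixed length pad_len.
    """
    width = max(len(seq) for seq in seqs) if pad_len is None else pad_len
    padded = torch.full((len(seqs), width), pad_idx, dtype=torch.long)
    for i, seq in enumerate(seqs):
        padded[i, : len(seq)] = seq
    return padded
```

With `pad_len=None`, padding `[tensor([1, 2, 3]), tensor([4])]` yields a 2×3 tensor; with `pad_len=5`, a 2×5 tensor, so every batch has the same shape.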
We could consider following huggingface here? Though in our case we would not default to no padding, and I think …
What specifically in that doc do you want to follow? My thoughts: …
Sorry, I thought I had linked to a specific table. Basically, just that we can have a padding argument that accepts strings like …
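For reference (not part of this thread), the Hugging Face `padding` argument alluded to above accepts booleans or strings; roughly:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
batch = ["one short example", "a somewhat longer second example"]
# "longest" (or True): pad to the longest sequence in the batch.
encoded = tokenizer(batch, padding="longest")
# "max_length": pad everything to max_length, truncating longer inputs.
encoded = tokenizer(batch, padding="max_length", max_length=32, truncation=True)
# "do_not_pad" (or False): no padding at all; this is the HF default.
encoded = tokenizer(batch, padding="do_not_pad")
```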
Yeah, I think we just want a flag that opts us into that. Truncation makes more sense in BERT-ish land for other reasons, seems to me.
Sounds good! Yes, when you are training an LM on all the data you can possibly find, truncating is not such a problem :).
Now that #40 is closed and such, I am wondering: what is the best way to enable this?
This information seems to need to live in the collator as a member. Then when it is asked to make a tensor, we'll pass that length along. Any thoughts on this design @Adamits?
This sounds good. We probably should add documentation for padding (default pads to the batch max; this arg sets it to the dataset max).
Dataset makes the most sense to me. I think the …
This is a good question. I think we should let it raise. However, it is worth considering raising a different message if `pad_max` is used? Probably not necessary, since the obvious solution is just to increase it.
Agreed. So in this way, we can get the pad max on the dataset, pass it to the collator, and if that attribute is not `None`, we always pad to it when running the collator. I think this sounds good!
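A rough sketch of this agreed-upon design, with hypothetical names throughout (and reusing the `pad_tensor` helper sketched earlier): the dataset exposes its maximum length as a property, the collator stores it as a member, and padding falls back to the batch maximum when that member is `None`.

```python
import torch


class SequenceDataset:
    """Hypothetical dataset wrapper exposing its maximum source length."""

    def __init__(self, sources: list[torch.Tensor]):
        self.sources = sources

    @property
    def max_source_length(self) -> int:
        # The fixed pad width when every batch must be the same size.
        return max(len(seq) for seq in self.sources)


class Collator:
    """Pads batches; pad_len, if not None, fixes the width of every batch."""

    def __init__(self, pad_idx: int, pad_len: int | None = None):
        self.pad_idx = pad_idx
        self.pad_len = pad_len  # E.g., dataset.max_source_length, or None.

    def __call__(self, batch: list[torch.Tensor]) -> torch.Tensor:
        if self.pad_len is not None:
            longest = max(len(seq) for seq in batch)
            # "Let it raise": exceeding the maximum is a user error, and the
            # obvious fix is to increase the maximum.
            if longest > self.pad_len:
                raise ValueError(
                    f"sequence length {longest} exceeds pad_len {self.pad_len}"
                )
        return pad_tensor(batch, self.pad_idx, self.pad_len)
```

Wiring it up, one would construct `Collator(pad_idx, pad_len=dataset.max_source_length)` when the flag is enabled, and leave `pad_len=None` otherwise.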
Okay, so: …
It is not plugged into anything yet. Working on issue CUNY-CL#50.
I now have an early draft of this here, in the …
I set the actual `source_max_length` to `min(max(longest source string in train, longest source string in dev), --max_source_length)`, and similarly for target length. I then lightly modify the LSTM (it needs to be told the max source length) and the transformer (it needs to make the positional embedding as large as the max of the longest source and target string). Everything else is just plumbing. If you have some time I'd appreciate your input @Adamits... I could generate an early PR for review if that'd help.
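Spelled out as a standalone function (names are illustrative, not the draft's actual code), the length computation described above is:

```python
def effective_max_length(
    train_lengths: list[int], dev_lengths: list[int], max_source_length: int
) -> int:
    """Returns min(max(longest train, longest dev), --max_source_length)."""
    return min(max(max(train_lengths), max(dev_lengths)), max_source_length)


# The observed maximum (11) exceeds the cap, so the flag wins:
assert effective_max_length([5, 9, 7], [6, 11], 10) == 10
# Otherwise the observed maximum is used:
assert effective_max_length([5, 9, 7], [6, 11], 128) == 11
```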