Please reopen issue #30361 #31635
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
See my previous answer -- this expectation is incorrect: #30361 (comment). As for the other issues, they are explained by the arguments not being passed correctly inside the trainer. Have you confirmed your issues against a recent version (>= 4.43)?
Hi @gante, thanks for the reply.
I've read the previous answers and I think that some things are still stated incorrectly. E.g., setting only one of the two options does not work; it only works if one sets both.
I haven't had the chance to do that because the code base relies on an earlier version of HF.
Regarding previous versions: I'm afraid I won't be able to fix bad behaviour in previous versions. But if the bad behaviour is still present, I'd be glad to explore 🤗
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers: 4.39.3
python: 3.12
system: linux
Who can help?
@muellerz @SunMarc @gante
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Steps are outlined here.
Did some further investigation: created a custom trainer by subclassing `Seq2SeqTrainer` and directly called `model.generate`. What we observe is that the number of generated tokens fluctuates on every batch.
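For reference, a minimal sketch of the kind of setup described above; the checkpoint, the overridden method, and the print-based length check are assumptions for illustration, not the original code:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, Seq2SeqTrainer


class CustomSeq2SeqTrainer(Seq2SeqTrainer):
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None, **gen_kwargs):
        # Bypass the trainer's own gen_kwargs handling and call generate directly,
        # so we can see how many tokens actually come back for each batch.
        gen_conf = GenerationConfig(max_new_tokens=128)
        generated = model.generate(
            inputs["input_ids"],
            attention_mask=inputs.get("attention_mask"),
            generation_config=gen_conf,
        )
        print("generated sequence length:", generated.shape[-1])
        return super().prediction_step(
            model, inputs, prediction_loss_only, ignore_keys=ignore_keys, **gen_kwargs
        )


# The same direct call outside the trainer, to compare against:
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
batch = tokenizer(["a short test input"], return_tensors="pt")
with torch.no_grad():
    out = model.generate(**batch, generation_config=GenerationConfig(max_new_tokens=128))
print(out.shape[-1])
```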
Expected behavior
To work as expected, with the generated tokens adhering to either `max_new_tokens` or `max_length`.

Edit
This only works if you call `model.generate(inputs, generation_config=gen_conf_obj)` with a `GenerationConfig` object, but it seems not to work if you call `model.generate(inputs, max_new_tokens=128)`: it defaults to generating tokens with context_size=20, which contradicts the examples shown in text-generation. Stated as:
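To make the comparison above concrete, here is a minimal sketch of the two call styles; the checkpoint and prompt are assumptions for illustration, not taken from the original report:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")  # assumed checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
inputs = tokenizer("summarize: a short test input", return_tensors="pt")

# Variant 1: explicit GenerationConfig object -- the call that respected the limit.
gen_conf_obj = GenerationConfig(max_new_tokens=128)
out_config = model.generate(**inputs, generation_config=gen_conf_obj)

# Variant 2: plain keyword argument -- the call that, in the setup described
# above, reportedly fell back to the ~20-token default.
out_kwarg = model.generate(**inputs, max_new_tokens=128)

print(out_config.shape[-1], out_kwarg.shape[-1])
```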
Finally, this seems to be working with `Seq2SeqTrainingArguments` when passing a `generation_config` object, but how exactly do we get the default values from `model.generation_config`? For instance, we have X, Y, Z models, each with a different `model.generation_config`, and I want to get those defaults and update only the ones I'm interested in. Doing `model.generation_config.to_dict()` brings everything and it's not working. Also, getting only those params that are set to some value is still not working as intended.
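For what it's worth, a minimal sketch of one way to start from the model's own defaults and override only selected fields; the checkpoint and the overridden values are assumptions for illustration:

```python
import copy

from transformers import AutoModelForSeq2SeqLM, GenerationConfig

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")  # assumed checkpoint

# Copy the model's existing generation defaults and change only what we care about.
gen_conf = copy.deepcopy(model.generation_config)
gen_conf.max_new_tokens = 128
gen_conf.num_beams = 4

# Alternatively, round-trip through a dict and override selected keys.
defaults = model.generation_config.to_dict()
defaults.update({"max_new_tokens": 128, "num_beams": 4})
gen_conf_from_dict = GenerationConfig(**defaults)
```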
The `model.generation_config` should have an easy-to-access method that returns the default values per model. Is there a way to access those default values, e.g., `model.generation_config.get_defaults().to_dict()`, so that I can instantiate a `GenerationConfig` from those and add or update the ones I need based on the specific task? I did it by simply parsing the output, but there should be a proper way to access those.

Edit 2
Passing a `GenerationConfig` object in the `Seq2SeqTrainingArguments` is completely ignored for some models (e.g., bart-base, flan-t5-small, etc.). Interestingly, passing `gen_conf = GenerationConfig(max_new_tokens=128)` directly to `model.generate(inputs, generation_config=gen_conf)` (in the `CustomSeq2SeqTrainer`) is also ignored, and the same goes for `model.generate(max_length=128)` as well as `model.generate(max_new_tokens=128)`, in the case of `bart-base`.

Please provide a workaround in this case. We need a modular and robust solution across different models.
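One possible (unverified) workaround, sketched under the assumption that overriding the model's own `generation_config` in place is acceptable, so that `generate()` picks the limit up even if the value passed through the trainer gets dropped along the way; the checkpoint and values are illustrative:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    GenerationConfig,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")  # assumed checkpoint

# Overwrite the model's own generation defaults so that any generate() call
# falls back to these values, regardless of what the trainer passes through.
model.generation_config.max_new_tokens = 128

# Also pass an explicit GenerationConfig via the training arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config=GenerationConfig(max_new_tokens=128),
)
```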