Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix return_dict in encodec #31646

Merged
merged 5 commits into from
Jun 28, 2024
Merged

Conversation

jla524
Copy link
Contributor

@jla524 jla524 commented Jun 26, 2024

What does this PR do?

Fixes #31642 (issue)

With this PR, return_dict=False returns a tuple, and the unit test compares tuple vs dict values.

% pytest tests/models/encodec/test_modeling_encodec.py -k test_model_outputs_equivalence
=========================================================== test session starts ============================================================
platform darwin -- Python 3.12.3, pytest-7.4.4, pluggy-1.4.0
rootdir: /Users/jacky/repos/transformers
configfile: pyproject.toml
plugins: xdist-3.5.0, timeout-2.3.1, rich-0.1.1
collected 116 items / 115 deselected / 1 selected                                                                                          

tests/models/encodec/test_modeling_encodec.py .                                                                                      [100%]

<warnings redacted>
============================================== 1 passed, 115 deselected, 9 warnings in 1.56s ===============================================

Who can review?

@kamilakesbi

Copy link
Contributor

@kamilakesbi kamilakesbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jla524 for iterating on this!

I left a comment for small suggested changes :)

Comment on lines 789 to 790
if return_dict is None:
return_dict = self.config.return_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could do instead:

Suggested change
if return_dict is None:
return_dict = self.config.return_dict
return_dict = return_dict if return_dict is not None else self.config.return_dict

so that return_dict is still set to self.config.return_dict by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be also nice to do the same modification in the decode method (at this line).

@kamilakesbi
Copy link
Contributor

Thanks for iterating on this @jla524!

@amyeroberts this should be ready for final review/merge :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

Just a q on the change to the recursive check

if isinstance(tuple_object, (List, Tuple)):
for tuple_iterable_value, dict_iterable_value in zip(tuple_object, dict_object):
recursive_check(tuple_iterable_value, dict_iterable_value)
elif isinstance(tuple_object, Dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we assert that tuple_object is a tuple on L377, as this is a recursive function, isn't is still possible that the values in tuple_object i.e. tuple_iterable_value are a dict or None?

Copy link
Contributor Author

@jla524 jla524 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values in the tuple_object should be tensor.Tensor only.

To check the types, I added a print statement in the recursive function:

def recursive_check(tuple_object, dict_object):
    print(f"[DEBUG]: {type(tuple_object)=}, {type(dict_object)=}")
    ...

and I got this:

[DEBUG]: type(tuple_object)=<class 'tuple'>, type(dict_object)=<class 'transformers.models.encodec.modeling_encodec.EncodecOutput'>
[DEBUG]: type(tuple_object)=<class 'torch.Tensor'>, type(dict_object)=<class 'torch.Tensor'>
[DEBUG]: type(tuple_object)=<class 'torch.Tensor'>, type(dict_object)=<class 'torch.Tensor'>

edit: it's probably more intuitive to just iterate over the items and compare it

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing ad iterating on this!

@amyeroberts amyeroberts merged commit 82a1fc7 into huggingface:main Jun 28, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

return_dict in encodec is always set to True:
4 participants