Moves to mapper interface #266

kylebgorman · 2024-11-25T17:50:52Z

Borrowing a design element I used in UDTube, I decompose the dataset object into two pieces:

* a `Mapper` interface which knows how to map between lists of strings and tensors (to decode and encode)
* `DataSet`, as before.

There was no particular reason for the mapper functions to live inside the dataset. This separates the two pieces and uses the mapper object for prediction.
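A minimal sketch of the split, assuming a toy `Index` (symbol to integer) and a `Mapper` that holds only a reference to that index. The class names echo the PR, but the method names (`encode`, `decode`, `get_symbol`) are hypothetical, not yoyodyne's actual API. Plain lists stand in for tensors so the example is dependency-free.

```python
from typing import Dict, List


class Index:
    """Hypothetical vocabulary: maps each symbol to an integer and back."""

    def __init__(self, symbols: List[str]):
        self.symbols = symbols
        self._lookup: Dict[str, int] = {s: i for i, s in enumerate(symbols)}

    def __call__(self, symbol: str) -> int:
        return self._lookup[symbol]

    def get_symbol(self, idx: int) -> str:
        return self.symbols[idx]


class Mapper:
    """Hypothetical mapper: holds a reference to the index, not the dataset."""

    def __init__(self, index: Index):
        self.index = index

    def encode(self, symbols: List[str]) -> List[int]:
        # Strings -> integers. A real mapper would wrap this in a tensor.
        return [self.index(s) for s in symbols]

    def decode(self, ids: List[int]) -> List[str]:
        # Integers -> strings.
        return [self.index.get_symbol(i) for i in ids]


mapper = Mapper(Index(["<pad>", "a", "b", "c"]))
print(mapper.encode(["c", "a", "b"]))  # [3, 1, 2]
print(mapper.decode([3, 1, 2]))  # ['c', 'a', 'b']
```

Because the mapper never touches the dataset's list of examples, it can be handed to any code that only needs to encode or decode.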

Closes #137. That issue proposes moving the encoding/decoding into the index; this PR instead gives it its own class, which keeps the index and the encoding/decoding even more modular.

(I also imported fix #272 and resolved some merge stuff from #268 etc.)



Adamits

I don't really have a problem with it, but I don't actually follow the reasoning for needing another class. The "mapping" operations, to me, could intuitively live in the index, the dataset, or the datamodule. Is the ambiguity there around where it should live actually the problem? Or is this just an OOP principle that someone established?

yoyodyne/predict.py

kylebgorman · 2024-12-02T15:56:07Z

> I don't really have a problem with it, but I don't actually follow the reasoning for needing another class. The "mapping" operations, to me, could intuitively live in the index, the dataset, or the datamodule. Is the ambiguity there around where it should live actually the problem? Or is this just an OOP principle that someone established?

The way to think of the design is this:

1. The datamodule creates an index.
2. The datamodule creates datasets.
3. When each dataset is created, a mapper is made from the index and passed to the dataset. (This is necessary because the dataset generates tensors on the fly.)
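The three steps above can be sketched as follows, assuming hypothetical `Index`, `Mapper`, `Dataset`, and `DataModule` classes that mimic the described wiring rather than reproduce yoyodyne's code. Lists stand in for tensors, and unknown symbols are not handled.

```python
from typing import Iterable, List


class Index:
    """Hypothetical symbol-to-integer vocabulary built from training data."""

    def __init__(self, strings: Iterable[str]):
        self.symbols = ["<pad>"] + sorted({c for s in strings for c in s})
        self._lookup = {sym: i for i, sym in enumerate(self.symbols)}

    def __call__(self, symbol: str) -> int:
        return self._lookup[symbol]


class Mapper:
    """Hypothetical mapper constructed from an index."""

    def __init__(self, index: Index):
        self.index = index

    def encode(self, string: str) -> List[int]:
        return [self.index(c) for c in string]


class Dataset:
    """Holds raw strings; encodes each item only when it is requested."""

    def __init__(self, strings: List[str], mapper: Mapper):
        self.strings = strings
        self.mapper = mapper

    def __len__(self) -> int:
        return len(self.strings)

    def __getitem__(self, i: int) -> List[int]:
        # Encoding happens on the fly, which is why the dataset needs a mapper.
        return self.mapper.encode(self.strings[i])


class DataModule:
    """Hypothetical datamodule wiring the pieces together."""

    def __init__(self, train: List[str], dev: List[str]):
        # Step 1: the datamodule creates the index.
        self.index = Index(train)
        # Steps 2-3: each dataset receives a mapper made from that index.
        self.train = Dataset(train, Mapper(self.index))
        self.dev = Dataset(dev, Mapper(self.index))


dm = DataModule(train=["abc", "bca"], dev=["cab"])
print(dm.dev[0])  # [3, 1, 2]
```

Both mappers share one index, so train and dev items are encoded consistently.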

I am moving it out of the dataset in this PR because there are a number of places where you want to do what the mapping does without needing a reference to the full dataset (for instance, you don't need the huge list that contains the actual data). One example is prediction: you need to map, but there's no direct reference to the dataset. Putting the mapping in a separate class also helps with potential circularity issues.

The other obvious option is to make it part of the index, but there are places where we need an index but not a mapper, or vice versa. For instance, the expert class doesn't need tensors or any of the padding, so it uses the index but not the mapper. I couldn't find a way to separate "string to integer" from "integers to tensors" other than putting them in separate classes.
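A sketch of the two layers, assuming a hypothetical `Index` that only does "string to integer" and a hypothetical `Mapper` that adds the batching and padding step. `expert_counts` is a toy stand-in for the expert: the real expert class does something quite different and is mimicked here only in that it consumes integer IDs from the index without touching tensors or padding. Nested lists stand in for tensors.

```python
from typing import Dict, List, Tuple


class Index:
    """Hypothetical "string to integer" layer: no tensors, no padding."""

    PAD = "<pad>"

    def __init__(self, symbols: List[str]):
        self.symbols = [self.PAD] + symbols
        self._lookup: Dict[str, int] = {s: i for i, s in enumerate(self.symbols)}

    def __call__(self, symbol: str) -> int:
        return self._lookup[symbol]

    @property
    def pad_idx(self) -> int:
        return self._lookup[self.PAD]


class Mapper:
    """Hypothetical "integers to tensors" layer built on top of an index."""

    def __init__(self, index: Index):
        self.index = index

    def encode_batch(self, batch: List[List[str]]) -> List[List[int]]:
        # Right-pads every sequence to the batch maximum; a real mapper would
        # return a tensor here rather than nested lists.
        width = max(len(seq) for seq in batch)
        return [
            [self.index(s) for s in seq] + [self.index.pad_idx] * (width - len(seq))
            for seq in batch
        ]


def expert_counts(index: Index, pairs: List[Tuple[str, str]]) -> Dict[Tuple[int, int], int]:
    """Toy expert stand-in: needs only symbol IDs, so it takes an index."""
    counts: Dict[Tuple[int, int], int] = {}
    for src, tgt in pairs:
        key = (index(src), index(tgt))
        counts[key] = counts.get(key, 0) + 1
    return counts


index = Index(["a", "b"])
print(Mapper(index).encode_batch([["a", "b"], ["b"]]))  # [[1, 2], [2, 0]]
print(expert_counts(index, [("a", "b"), ("a", "b")]))  # {(1, 2): 2}
```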

The general OOP principle at play here is the Law of Demeter. The old code (this was part of predict.py) had `loader.dataset.decode_target(...)`; this now reads `mapper.decode_target(...)`.
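To illustrate the Law of Demeter point, here is a sketch of a prediction step that talks only to the mapper instead of reaching through `loader.dataset`. Only the name `decode_target` comes from the thread; the constructor, the `</s>` and `<pad>` symbols, the stop-at-end behavior, and the `predict` helper are assumptions for illustration. Integer lists stand in for tensors.

```python
from typing import List


class Mapper:
    """Hypothetical mapper exposing decode_target (name taken from the thread)."""

    def __init__(self, symbols: List[str], end: str = "</s>", pad: str = "<pad>"):
        self.symbols = symbols
        self.end = end
        self.pad = pad

    def decode_target(self, ids: List[int]) -> List[str]:
        # Integers -> strings, stopping at the end symbol and skipping padding
        # (assumed behavior for this sketch).
        out: List[str] = []
        for i in ids:
            sym = self.symbols[i]
            if sym == self.end:
                break
            if sym != self.pad:
                out.append(sym)
        return out


def predict(batches: List[List[List[int]]], mapper: Mapper) -> List[str]:
    # Old style: loader.dataset.decode_target(...) reached through the loader
    # into the dataset. New style: only the mapper is needed.
    return ["".join(mapper.decode_target(row)) for batch in batches for row in batch]


mapper = Mapper(["<pad>", "</s>", "a", "b"])
print(predict([[[2, 3, 1, 0], [3, 3, 1, 0]]], mapper))  # ['ab', 'bb']
```

The prediction code now depends on one collaborator instead of on a chain of attributes.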

Adamits · 2024-12-02T17:15:38Z

Thanks for describing your thinking.

> there are a number of places where you want to do what the mapping does but you don't need a reference to the full dataset

Yes, good point. I think another place this is helpful is runtime debugging: previously, if I wanted to log encoded inputs or predicted outputs in the forward function at runtime, I needed the dataset.

Adamits

LGTM

kylebgorman added 7 commits November 25, 2024 12:46

Many bugfixes.

83b98f9

Removes experiment layer

18ced9c

You can just simulate this by appending an additional string onto the name of the model_dir if needed.

Updates README to reflect last commit

37cc30e

Updates and adds prediction support.

b38e968

Merge branch 'master' into mapper

5e26f7b

remove unused

3dcfe12

kylebgorman mentioned this pull request Nov 27, 2024

AttributeError: module 'yoyodyne.models' has no attribute 'BaseEncoderDecoder' #270

Closed

kylebgorman added 3 commits November 30, 2024 13:00

Cleanup.

1aedf86

Merge branch 'master' into mapper

56af4b0

Indentation

40d76b5

kylebgorman marked this pull request as ready for review November 30, 2024 18:23

kylebgorman requested a review from Adamits November 30, 2024 18:23

kylebgorman added the enhancement New feature or request label Dec 1, 2024

Adamits reviewed Dec 2, 2024


yoyodyne/predict.py (resolved)

yoyodyne/predict.py (resolved)

adds comment

56a106c

kylebgorman added 9 commits December 2, 2024 10:56

Last-minute

a1272af

Merge branch 'master' into mapper

40af8e9

black/flake8 updates

b879b32

Updates version number further

ddb6989

Merge branch 'master' into mapper

f2db2f9

Remove redundant instance variable.

f00d3f9

Docs cleanup

7b7b32a

wrap

3543096

typos

8bc0127

Adamits approved these changes Dec 2, 2024


kylebgorman merged commit f3550a1 into master Dec 2, 2024
8 checks passed

kylebgorman deleted the mapper branch December 9, 2024 22:48
