Moves to mapper interface #266
Borrowing a design element I used in UDTube, I decompose the dataset object into two pieces:

* a `Mapper` interface which knows how to map between lists of strings and tensors (to decode and encode)
* `DataSet`, as before

There was no particular reason for the mapper functions to live inside the dataset, and this commit simply makes this separation. A subsequent commit will use this mapper object during prediction.
You can just simulate this by appending an additional string onto the name of the model_dir if needed.
I don't really have a problem with it, but I don't actually follow the reasoning for needing another class. The "mapping" operations, to me, could intuitively live in the index, the dataset, or the datamodule. Is the ambiguity there around where it should live actually the problem? Or is this just an OOP principle that someone established?
The way to think of the design is this:
I am moving it out of the dataset in this PR, because there are a number of places where you want to do what the mapping does but you don't need a reference to the full dataset. (For instance, you don't need the huge list that contains the actual data.) One example of this is in prediction: you need to map but there's no direct reference to the dataset. Putting it in a separate class helps with potential circularity issues.

The other obvious option is to make it part of the index, but there are places where we need an index but not a mapper, or vice versa. For instance, the expert class doesn't need tensors or any of the padding, so it uses the index but not the mapper. I despaired of a way to separate out "string to integer" vs. "integers to tensors" except putting them in separate classes.

The general OOP principle at play here is the Law of Demeter. In the old code (this was part of |
Thanks for describing your thinking.
Yes, good point. I think another place this is helpful is runtime debugging. Previously, if I wanted to log encoded inputs or predicted outputs in the forward function at runtime, I needed the dataset.
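To make the debugging point concrete, here is a minimal sketch of logging decoded inputs from inside a forward pass using only a mapper. Everything in it is an assumption for illustration: this `Mapper`, `describe_batch`, and `forward` are hypothetical stand-ins, not the project's actual classes, and plain integer lists stand in for tensors.

```python
import logging
from typing import List


class Mapper:
    """Hypothetical stand-in: decodes integer ids back into symbols."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = list(vocabulary)

    def decode(self, ids: List[int]) -> List[str]:
        return [self.vocabulary[i] for i in ids]


def describe_batch(mapper: Mapper, batch: List[List[int]]) -> List[str]:
    """Renders each encoded sequence in a batch as a readable string."""
    return ["".join(mapper.decode(ids)) for ids in batch]


def forward(mapper: Mapper, batch: List[List[int]]) -> List[List[int]]:
    """Placeholder forward pass that logs its decoded inputs."""
    for rendered in describe_batch(mapper, batch):
        logging.debug("input: %s", rendered)
    # A real model would compute predictions here; this sketch just
    # returns the batch unchanged.
    return batch
```

The only object the forward pass needs to hold is the mapper, so no reference to the dataset (or the data it stores) is required for logging.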
LGTM
Borrowing a design element I used in UDTube, I decompose the dataset object into two pieces:

* a `Mapper` interface which knows how to map between lists of strings and tensors (to decode and encode)
* `DataSet`, as before.

There was no particular reason for the mapper functions to live inside the dataset. This separates the two pieces and uses the mapper object for prediction.
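A rough sketch of the separation described above, under stated assumptions: the class bodies and method names (`encode`, `decode`, `get_symbol`) are hypothetical and are not taken from this PR's diff, and plain integer lists stand in for tensors so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import List


class Index:
    """Hypothetical index: maps symbols to integer ids and back."""

    def __init__(self, symbols: List[str]) -> None:
        # Id 0 is reserved for unknown symbols.
        self._i2s = ["<UNK>"] + sorted(set(symbols))
        self._s2i = {s: i for i, s in enumerate(self._i2s)}

    def __call__(self, symbol: str) -> int:
        return self._s2i.get(symbol, 0)

    def get_symbol(self, idx: int) -> str:
        return self._i2s[idx]


@dataclass
class Mapper:
    """Encodes and decodes using only the index, with no data attached."""

    index: Index

    def encode(self, symbols: List[str]) -> List[int]:
        return [self.index(s) for s in symbols]

    def decode(self, ids: List[int]) -> List[str]:
        return [self.index.get_symbol(i) for i in ids]


@dataclass
class DataSet:
    """Holds the actual samples and delegates encoding to the mapper."""

    samples: List[List[str]]
    mapper: Mapper

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, i: int) -> List[int]:
        return self.mapper.encode(self.samples[i])
```

In this arrangement, code that only needs to encode or decode (such as prediction) can hold a `Mapper` without the sample list, and code that only needs symbol ids can hold an `Index` without either.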
Closes #137. That issue says that the encoding/decoding should be moved to the index, but this actually makes those two even more modular.
(I also imported fix #272 and resolved some merge stuff from #268 etc.)