Feature encoding enhancements #47
That sounds great. I do think we should keep the option to do it the current way for 1) reproducibility of papers that do it that way (though I suspect it was often done this way out of convenience for using popular MT libraries in the past :D), and 2) perhaps to make comparisons w.r.t. runtime? Another note: then in the
One thought I've had is maybe we should have a vintage, waxed mustache, penny-farthing-style fork that contains all the old methods, for that kind of work.
Yes, that'd be a big improvement to readability etc. Good point.
Howdy y'all, I've emerged from dissertation land and can return to this. Looking over the modules, here are my thoughts (PR to follow after some coffee runs):
Will share later this evening/tomorrow for criticism and stone throwing.
Hi Travis. This sounds good so far---might just call it that. One thing I am stuck on, design-wise, is whether one picks three pieces:
or whether the latter two must be combined into one.
Setting up these factories is going to be somewhat difficult, because the later components need to see the encoder and/or its sizes to instantiate themselves, so maybe you will need a sort of meta-factory to combine them. Your (3) seems fine so far. I think we want to see a design doc of sorts before you go ahead with implementation.
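A minimal sketch of the meta-factory idea, assuming a hypothetical encoder class that exposes its output size (none of these names come from the actual codebase):

```python
from typing import Callable

from torch import nn


class SizedEncoder(nn.Module):
    """Placeholder encoder that records its output size."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self.output_size = output_size
        self.rnn = nn.LSTM(input_size, output_size, batch_first=True)

    def forward(self, x):
        output, _ = self.rnn(x)
        return output


def build_model(
    encoder_factory: Callable[[], SizedEncoder],
    decoder_factory: Callable[[int], nn.Module],
) -> nn.ModuleDict:
    # Instantiate the encoder first; only then is the size available
    # that the later components need to instantiate themselves.
    encoder = encoder_factory()
    decoder = decoder_factory(encoder.output_size)
    return nn.ModuleDict({"encoder": encoder, "decoder": decoder})


# Example: the decoder factory is only called once it can be handed
# the encoder's output size.
model = build_model(
    lambda: SizedEncoder(32, 128),
    lambda size: nn.Linear(size, 64),
)
```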
This sounds good, though I agree a more concrete design doc could be good. Esp. since this ticket is not really time-critical AFAIK. I would actually like to build on this as well later, so that an encoder or decoder can be pretrained on some dataset and the weights can be loaded and 'finetuned'. I think this would probably be a step towards making it obvious where to put that code, at the very least.
So this is what I currently have design-wise:
I can throw stuff into a design document, but I don't have a preferred template. Regarding priority: management of feature behavior is related to this breaking issue #61.
I don't see how this design would work with pointer-generators, so that needs to be clarified. (You may want to chat with @Adamits about this later---there's a subtlety we discussed recently.) I don't really care about the template for the design doc (pure Google Docs, with typewriter text for code, is what I usually use), but I think this is complicated enough that your design should provide proposed interfaces (e.g., typed Python function signatures) and also propose changes to the command-line interface.
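To give a flavor of what a typed interface could look like here, a hypothetical example (the name and parameters are illustrative only, not a concrete proposal):

```python
from torch import nn


def get_features_encoder(
    arch: str,
    embedding_size: int,
    hidden_size: int,
) -> nn.Module:
    """Returns a features encoder of the requested architecture.

    Args:
        arch: "lstm" or "transformer" (illustrative choices).
        embedding_size: dimensionality of the feature embeddings.
        hidden_size: dimensionality of the LSTM encoder output.
    """
    if arch == "lstm":
        return nn.LSTM(embedding_size, hidden_size, batch_first=True)
    elif arch == "transformer":
        # For the transformer, the output size equals d_model.
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_size, nhead=4, batch_first=True
        )
        return nn.TransformerEncoder(layer, num_layers=2)
    else:
        raise NotImplementedError(f"Unknown architecture: {arch}")
```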
Aighty oh, wrote down some thoughts on how to proceed.
Thanks for the doc. A couple thoughts:
Before you go ahead I would like to see the exact design for the new CLI flags to accommodate the proposed change.
That works too. I think that was used before we separated models into
Not sure if it's the most Pythonic, but I've seen it in some non-dumpster-fire code.
Oh I meant automatically. The
Sure, I was just thinking a function would keep it consistent with all the other factories.
As in, just write them down as a sample script run, or would you want to see how they'd be passed in the code?
I am just saying I don't like that stylistically. It certainly works but I think it's extremely hard to read and reason about.
Okay, if the "flag" here is outside of user control I have no objection; thanks for the clarification.
I was thinking it could be like Adam's embedding initialization stuff.
I think I would want to see flag names, docstrings, etc.---I trust that how they'd get passed around is something you'd have no trouble with.
Fair enough, it's easier to code anyhows.
Will co-opt that then.
Okie doke, let me draft that up in a separate comment.
Sample CLI: sample_run.txt
Docstrings for Base: base.txt
Docstrings for Encoder: encoder.txt
Lemme know where further detail would help.
Okay, thanks for that, Travis---that's what I needed. Everything looks good, but two suggestions:
I am overall very much in support of the proposal as I understand it.
Variable update suggestions (and paper idea) accepted. I'll start working on the PR sometime next week.
Only thing to fix would be to extend
Okay, definitely not urgent. I would do this as a way of testing out whether feature-invariance is really a contribution... |
[This continues the discussion in #12.]
Both the transducer and the pointer-generator treat features in architecture-specific ways; issue #12 deals with their ideal treatment in the transducer, since the Makarov/Clematide mechanism of treating them as one-hot embeddings appears to be ineffectual. In contrast, the LSTM and transformer models simply concatenate features onto the source. My proposal is that, in the attentive LSTM and the transformer, we have separate encoders (LSTM and transformer, respectively) for features, and that the two encodings then be concatenated lengthwise.

To be more explicit, imagine the source tensor is of size batch_size x hidden_size x source_length and the feature tensor is of size batch_size x hidden_size x feature_length. Then I propose that we concatenate these to form a tensor of size batch_size x hidden_size x (source_length + feature_length), and attention operates over that larger tensor.
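A rough PyTorch sketch of the proposed lengthwise concatenation, following the shapes above (the tensors here are random stand-ins for real encoder outputs, and the sizes are illustrative):

```python
import torch

# Illustrative sizes only.
batch_size, hidden_size = 16, 128
source_length, feature_length = 20, 5

# Stand-ins for the outputs of separate source and features encoders.
source_encoded = torch.randn(batch_size, hidden_size, source_length)
features_encoded = torch.randn(batch_size, hidden_size, feature_length)

# Lengthwise concatenation:
# batch_size x hidden_size x (source_length + feature_length).
encoded = torch.cat([source_encoded, features_encoded], dim=2)
assert encoded.shape == (batch_size, hidden_size, source_length + feature_length)

# Decoder attention would then operate over this larger tensor, so feature
# positions compete for attention alongside source positions.
```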
As a result, we can use the features column to do multi-source translation in general. Furthermore, the dataset getter is no longer conditioned on architecture: it just has features and no-features variants, which makes things a bit simpler.
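A minimal sketch of what an architecture-independent dataset getter could look like; the class names here are placeholders, not the library's actual classes:

```python
class DatasetNoFeatures:
    """Placeholder dataset without a features column."""

    def __init__(self, filename: str):
        self.filename = filename


class DatasetFeatures(DatasetNoFeatures):
    """Placeholder dataset with a features column."""


def get_dataset(filename: str, has_features: bool) -> DatasetNoFeatures:
    # The choice depends only on whether the data has features,
    # not on the model architecture.
    cls = DatasetFeatures if has_features else DatasetNoFeatures
    return cls(filename)
```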
(Note that this will not work for the inattentive LSTM; something else will have to be done or we can just dump it.)
A distant enhancement is that it would be possible, in theory, to have different encoders for source and feature (LSTM vs. GRU vs. transformer); an even more distant enhancement would be to allow these to have different dimensionalities and use linear projection to map the feature encoding back onto the source encoding. I do not actually think we should do either of these, but it's a thing we could do...
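For that second, more distant enhancement, the projection could be as simple as the following sketch (again with stand-in tensors and illustrative sizes):

```python
import torch
from torch import nn

batch_size, source_hidden, feature_hidden = 16, 128, 64
source_length, feature_length = 20, 5

source_encoded = torch.randn(batch_size, source_hidden, source_length)
features_encoded = torch.randn(batch_size, feature_hidden, feature_length)

# Map each feature position from feature_hidden back onto source_hidden.
projection = nn.Linear(feature_hidden, source_hidden)
projected = projection(features_encoded.transpose(1, 2)).transpose(1, 2)

# Now the two encodings can be concatenated lengthwise as before.
encoded = torch.cat([source_encoded, projected], dim=2)
assert encoded.shape == (batch_size, source_hidden, source_length + feature_length)
```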