How to turn off generator (make it a pointer only network) #5
I'm really looking for an effective word-level summarization solution. It isn't clear to me how to turn off the "generator" part of the pointer-generator network.
Let me know if this is possible and how I can achieve this.
Comments
I'm going to experiment with what look like some simple hacks (setting the chance of using the generator network to 0 if a certain parameter is true), and I will submit a PR if it works; I will likely fork this project in that case.
So, I can make it use only the vocabulary available in the source text (by setting the probability of pointing to 1 and the probability of generating to 0), but the ideal solution would not rearrange my source text. I am trying to train something that will "highlight" the most important parts of a document without rearranging it. The ideal solution chooses, for each source word, either to include it or not.
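For concreteness, here is a minimal sketch of what that hack might look like in a single pointer-generator decoder step, assuming the model mixes a generator (vocabulary) distribution and a copy (attention) distribution through a soft switch. The names here (p_gen, vocab_dist, attn_dist, pointer_only) are illustrative placeholders, not this repository's actual identifiers.

```python
import torch

def combine_distributions(vocab_dist, attn_dist, p_gen, src_ids,
                          extended_vocab_size, pointer_only=False):
    """One pointer-generator mixing step (illustrative names only).

    vocab_dist: (batch, vocab_size)  softmax over the fixed vocabulary
    attn_dist:  (batch, src_len)     attention over the source tokens
    p_gen:      (batch, 1)           soft switch between generating and copying
    src_ids:    (batch, src_len)     source token ids in the extended vocabulary
    """
    if pointer_only:
        # The hack discussed above: force the model to copy only.
        p_gen = torch.zeros_like(p_gen)
    final = torch.zeros(vocab_dist.size(0), extended_vocab_size)
    final[:, :vocab_dist.size(1)] = p_gen * vocab_dist      # generator share
    # Add the copy probability mass onto the source tokens' ids.
    final = final.scatter_add(1, src_ids, (1 - p_gen) * attn_dist)
    return final
```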
Yes, you can try to disable the generator in various ways, such as the way you described (setting the pointer probability to 1) or setting the vocabulary size to a very small number (I don't know if setting it to 0 will break anything). You're welcome to submit a PR if anything needs to be fixed in order to disable the generator. However, the task you are working on doesn't seem to require a full seq2seq model. You only need to label each input token as important or not important (i.e. binary classification). This can be achieved by running a bi-GRU or bi-LSTM (multiple layers if needed) over the input (similar to my encoder), and then applying a sigmoid function on the output state of each token to get a score between 0 and 1.
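A minimal PyTorch sketch of that suggestion, assuming the goal is a per-token keep/drop score; the class name and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

class TokenHighlighter(nn.Module):
    """Bi-LSTM over the input followed by a per-token sigmoid score."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                           batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> scores in [0, 1], one per token
        states, _ = self.rnn(self.embed(token_ids))
        return torch.sigmoid(self.score(states)).squeeze(-1)
```

Training would then use a binary cross-entropy loss (e.g. nn.BCELoss) against 0/1 highlight labels, with padding positions masked out.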
@ymfa do you have any recommendations for frameworks or other projects that would be useful for me? I'd like to find a way to get pre-trained word embeddings into whatever tool I use for the token classification.
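One common route in PyTorch (which this repo uses) is to build an embedding matrix from a pre-trained vector file such as GloVe and load it with nn.Embedding.from_pretrained. A rough sketch, where the file path, dimensionality, and word_to_id mapping are placeholders you would supply:

```python
import numpy as np
import torch
import torch.nn as nn

def load_pretrained_embeddings(vectors_path, word_to_id, embed_dim=100):
    """Build an nn.Embedding from a GloVe-style text file (word v1 v2 ... per line).
    Words missing from the file keep a small random initialization."""
    matrix = np.random.normal(scale=0.1, size=(len(word_to_id), embed_dim))
    with open(vectors_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in word_to_id and len(values) == embed_dim:
                matrix[word_to_id[word]] = np.asarray(values, dtype=np.float32)
    weights = torch.tensor(matrix, dtype=torch.float)
    return nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)
```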
Also, can you critique another idea I have? I have a dataset that consists of large numbers of news articles, an extractive "highlighted" version of each article, and an abstractive human-written summary. I think that if I go from the long news article to the short abstract, I can "tailor" my summary such that an abstract saying "Strawberries taste bad" and one saying "Strawberries are yummy!" highlight the document differently. Any ideas on doing this highlighting via attention? I've experimented with this idea by trying to sum the attention layers and heads in BERT together and using the top words most attended to (but I think a naive sum of all attention layers is wrong, so it didn't work very well). I'll try to experiment with it using the visualization tool you've included. If it's possible, can you help me work out a way to get a list of words and their corresponding attention scores from a trained example? I'd be forever in your debt!
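On the last part (getting words and their attention scores), here is a rough sketch using Hugging Face transformers to pull per-token attention out of BERT, with the naive average over layers and heads described above; that averaging is a crude heuristic (attention weight is not the same as importance), so treat the scores as exploratory only. The model name and function are placeholder choices, unrelated to this repository's own visualization tool.

```python
import torch
from transformers import BertModel, BertTokenizerFast

def token_attention_scores(text, model_name="bert-base-uncased"):
    """Return (token, score) pairs sorted by the average attention each token receives."""
    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name, output_attentions=True)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer
    attn = torch.stack(outputs.attentions)   # (layers, batch, heads, seq, seq)
    attn = attn.mean(dim=0).mean(dim=1)      # average over layers, then heads -> (batch, seq, seq)
    received = attn[0].mean(dim=0)           # average attention each token receives -> (seq,)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return sorted(zip(tokens, received.tolist()), key=lambda pair: -pair[1])
```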