Multi-word static vectors #7790
Replies: 1 comment 2 replies
-
I haven't heard of anyone doing this before, but maybe you could just merge the tokens before the tok2vec step? If you use your vectors instead of (as opposed to in addition to) the standard spaCy vectors and train from scratch that should work. It would be like using a custom tokenizer. Based on your description it's not clear if your vectors have good coverage or if they're only for select entities. If they're only for select entities, you could use them as a seed and pretrain to get better coverage while retaining some of whatever you've learned. Getting a balance might be difficult though. Using your vectors with the existing tok2vec, as a separate feature source, is a multimodal task and there isn't really a straightforward way to do that at the moment. If you're feeling adventurous you can look at how HashEmbed is implemented (particularly the way shape features are used) and write something similar, but I think it might be a lot of work. |
Beta Was this translation helpful? Give feedback.
-
I'm designing a text classification model that I'd like to be able to pass some additional features to (beyond those learned by the tok2vec component). These features are typically not token-specific but apply to Spans of tokens. Based on my reading/playing with the static vector functionality in model training, it seems like static vectors need to be keyed on individual words or tokens. I was looking into sense2vec, but it seems like the existing model architectures aren't designed to access sense2vec features. So I guess I have a couple questions:
Thanks!
Beta Was this translation helpful? Give feedback.
All reactions