Data Augmentation for NER #7500
Replies: 2 comments 5 replies
-
So there are several questions here, but to address a few of them:
NER does not use
Since the token shape already captures this variation, I wouldn't expect this to have a large effect.
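To illustrate the point about token shape, here is a minimal sketch. It is a simplified pure-Python re-implementation of a spaCy-style word shape (letters become `x`/`X`, digits become `d`, long runs are truncated), not a call into spaCy itself; in spaCy the corresponding value is exposed as `token.shape_`. The example sentences are made up.

```python
def word_shape(text: str) -> str:
    """Approximate a spaCy-style word shape (assumption: simplified sketch)."""
    if len(text) >= 100:
        return "LONG"
    shape = []
    last = ""
    run = 0  # how many times the current shape character has repeated
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        # Keep at most four consecutive copies of the same shape character.
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


# Hypothetical augmentation: swap one digit sequence for another.
original = "Order 4821 shipped"
augmented = "Order 9030 shipped"

shapes_original = [word_shape(w) for w in original.split()]
shapes_augmented = [word_shape(w) for w in augmented.split()]

print(shapes_original)
print(shapes_augmented)
```

Both sentences produce the same shape sequence, so an augmentation that only replaces one digit string with another of similar length gives the model little new information through the SHAPE feature.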
You're trying to use the
I think those points address the overall thrust of your question, but if there's something I missed, let me know.
-
@architectures = "spacy.MultiHashEmbed.v1"
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","IS_DIGIT"]
rows = [5000,2500,2500,2500,100]
include_static_vectors = false
Because of how
Right now it's hard to use a feature like
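As a rough sketch of how the `attrs` and `rows` lists in the config above line up: each attribute gets its own hash-embedding table, and the two lists are expected to have the same length. The snippet below only uses the Python standard library to parse the fragment; the `[embed]` section name is a placeholder chosen for this example, not the real section path from the original config.

```python
import configparser
import json

# The [embed] header is hypothetical; the keys and values are from the post.
CONFIG_TEXT = """
[embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","IS_DIGIT"]
rows = [5000,2500,2500,2500,100]
include_static_vectors = false
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Values in this style of config are JSON-compatible, so json.loads works here.
section = {key: json.loads(value) for key, value in parser["embed"].items()}

attrs, rows = section["attrs"], section["rows"]
if len(attrs) != len(rows):
    raise ValueError("attrs and rows must have the same length")

# One embedding table per attribute, with the given number of rows.
rows_per_attr = dict(zip(attrs, rows))
print(rows_per_attr)
```

A boolean attribute such as `IS_DIGIT` only takes two values, which is presumably why it is given far fewer rows than `NORM`.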
-
I am training an NER model for digit-valued entities. I would like to understand the current mechanics and best practices for this case:
Does NER use is_digit or pos properties as features, and if not, where should one look to enable it?
Now I need to add existing entities.
If I modify the Doc construction according to the documentation:
I'm getting:
ValueError: [E177] Ill-formed IOB input detected: B
I might not completely get what the documentation help(Doc) is saying:
What kind of Unicode strings? I wasn't able to find any example on the web.
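For what it's worth, here is a minimal sketch of the tag strings in question, assuming spaCy v3, where `Doc` accepts an `ents` keyword containing one IOB string per token. As far as I can tell, each tagged token needs the label attached (`B-LABEL`, `I-LABEL`) or a plain `O`, and the E177 message ending in `B` suggests a bare `B` was passed. The `spans_to_iob` and `make_doc` helpers, the `NUMBER` label, and the sample words are all made up for this example.

```python
from typing import List, Tuple


def spans_to_iob(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, label) token spans (end exclusive) into IOB strings."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def make_doc(words: List[str], tags: List[str]):
    """Build a Doc with pre-set entities (assumes spaCy v3 is installed)."""
    # Imported lazily so the helper above runs without spaCy.
    from spacy.tokens import Doc
    from spacy.vocab import Vocab

    return Doc(Vocab(), words=words, ents=tags)


words = ["Ticket", "number", "555", "0199", "received"]
tags = spans_to_iob(len(words), [(2, 4, "NUMBER")])
print(tags)  # ['O', 'O', 'B-NUMBER', 'I-NUMBER', 'O']
```

With spaCy installed, `make_doc(words, tags).ents` should then contain a single `NUMBER` span covering the two digit tokens.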
As a workaround, I tried this custom cython function:
but when I try to run it on my entity, it breaks:
The actual values are:
It looks similar to this issue, but that one is locked and there seems to be no solution.