Restrict spaCy from creating NER parts-of-speech Spans in Doc instance. #7514
Hello, sorry for not commenting on the issue first, but this looked like a good fit for the Discussions board, so I moved it here. First, as a note, entities (PERSON, DATE, ORG) are not the same as part-of-speech tags (VERB, NOUN, ADJ). The two kinds of annotation are set by separate pipeline components.
It sounds like the default NER component is still enabled in your pipeline and is adding the usual entity labels. You can disable the NER component if you don't need it.
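As a rough sketch of what that could look like, assuming you only need tokenization when building your training Docs: the example below uses a blank English pipeline, so there is no NER component to add entities, and then sets a single custom entity. The sentence and the `METER_READING` label are made up for illustration, and the commented-out `spacy.load` line assumes you have `en_core_web_sm` installed.

```python
import spacy
from spacy.tokens import DocBin

# A blank English pipeline has a tokenizer but no trained components,
# so nothing can add ORG/PERSON/DATE entities on its own.
nlp = spacy.blank("en")

# Alternative if you want to keep the tagger, parser, etc. of a pretrained
# pipeline (requires the package to be installed):
# nlp = spacy.load("en_core_web_sm", disable=["ner"])

text = "The meter's reading was 29.5"  # illustrative sentence
doc = nlp.make_doc(text)               # tokenize only, run no components

# METER_READING is a hypothetical label, used here for illustration.
start = text.index("29.5")
span = doc.char_span(start, start + len("29.5"), label="METER_READING")
if span is None:
    # char_span returns None when the offsets do not match token boundaries
    raise ValueError("Character offsets do not align with token boundaries")

doc.set_ents([span])
print([(ent.text, ent.label_) for ent in doc.ents])

# Collect the annotated Doc for training.
doc_bin = DocBin(docs=[doc])
# doc_bin.to_disk("./train.spacy")  # hypothetical output path
```

With this setup the only entities on each Doc should be the ones you set yourself. If you do need the other components of a pretrained pipeline, disabling just `ner` at load time is the smaller change.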
It's hard to say in the general case. The default NER models cover entity types that are relevant to a wide range of domains, so it's often helpful to keep them, but it really depends on your data. When you don't know which approach is better, the best thing is usually to try each alternative and measure performance.
I'm a little unclear on what you mean here - are you extracting a numeric value from text ("The meter's reading was 29.5") or are you using text to estimate a numeric value that is not literally present in the text ("It was so hot today my ice cream evaporated" -> 32C)? If it's the first one I'm inclined to guess that the default NER may not be helping you much, but it might help prevent your model from flagging dates as meter readings, for example. Again, it depends very much on the details of your data.
We train a model for NER from Doc objects collected into a DocBin. When each Doc is created, we add annotations as labeled Spans through Doc.set_ents(entities=...). So far so good. What we see is that spaCy is automatically adding other Spans for parts-of-speech labels that we do not create, such as ORG, PERSON, DATE, etc.
Two questions:
Thank you!
Your Environment
spaCy version: 3.0.1
Location: D:\ProgramData\Precision-Oncology-Apps_venv_bobz\lib\site-packages\spacy
Platform: Windows-2012ServerR2-6.3.9600-SP0
Python version: 3.7.6
Pipelines: en_core_web_lg (3.0.0), en_core_web_md (3.0.0), en_core_web_sm (3.0.0)