SpanCategorizer confidence scores are equal for all spans. #12090

ruben-dedoncker · 2023-01-11T09:58:26Z

I'm trying to understand how to interpret the confidence scores returned by SpanCategorizer.predict(). For some reason it always returns the same tuple of confidence scores, regardless of which document I pass in.

Below is a code snippet to illustrate my question:

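import numpy as np
import spacy

docs = get_docs()  # Placeholder: returns some list of Doc objects
# Pull the trained spancat component out of the loaded pipeline
spancat = spacy.load('path/to_model').get_pipe('spancat')
_, scores1 = spancat.predict(docs[0:1])  # Pick a first document
_, scores2 = spancat.predict(docs[10:11])  # Pick some random other document
unique1 = np.unique(scores1)  # Distinct score values for the first document
unique2 = np.unique(scores2)  # Distinct score values for the other document

np.array_equal(unique1, unique2)  # True
len(unique1)  # 1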

The final two expressions always give the same results, regardless of which documents I choose to predict on.
How should I interpret this?

Answered by adrianeboyd

Jan 13, 2023

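This is probably because the spancat component depends on an earlier tok2vec or transformer component in the pipeline. If spancat.predict() is called on docs that have not passed through that component, the listener inside spancat has no token vectors to work with, which would explain identical scores for every span.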

If you want a single independent component, you can use Language.replace_listeners to replace the listener with an internal copy of the tok2vec component.
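As a minimal sketch of that suggestion, the helper below wraps the Language.replace_listeners call. The helper name make_spancat_standalone is made up for illustration, and the component names "tok2vec" and "spancat" as well as the listener path "model.tok2vec" are assumptions based on a default tok2vec+spancat pipeline, so they may need adjusting for a different setup.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the type hints below
    from spacy.language import Language


def make_spancat_standalone(
    nlp: "Language",
    tok2vec_name: str = "tok2vec",
    spancat_name: str = "spancat",
) -> "Language":
    """Give spancat its own copy of the shared tok2vec (modifies nlp in place).

    Hypothetical helper: the default names are assumptions, not read from the model.
    """
    # "model.tok2vec" is assumed to be the path of the listener layer
    # inside the spancat model, as in the default spancat architecture.
    nlp.replace_listeners(tok2vec_name, spancat_name, ["model.tok2vec"])
    return nlp
```

With nlp = spacy.load('path/to_model'), calling make_spancat_standalone(nlp) before nlp.get_pipe('spancat').predict(...) should let spancat compute its own token vectors instead of relying on the shared component.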

ruben-dedoncker · 2023-01-30T14:33:45Z

Hey, that did the trick, thanks! I don't fully understand why, though. In the training config I am unable to set the replace_listeners field under [components.spancat] because I've set factory=spancat so that it uses the shared tok2vec component. But when I load the trained model and then replace the listener, I get sensible results.
If you don't mind, I would love a more detailed explanation of why this is :)

adrianeboyd Jan 30, 2023

In a config, each component can have either factory or source, and replace_listeners is meant for sourced components. For an existing pipeline with tok2vec+spancat, you can use nlp.replace_listeners and then save the modified pipeline with nlp.to_disk, or you can use source and replace_listeners with a new config and spacy assemble.

If you want to train the pipeline in the first place without a separate tok2vec, you can generate a config programmatically (see below) or edit the config produced by spacy init config by hand, basically replacing [components.spancat.model.tok2vec] with the config block from [components.tok2vec.model].

You can see a config without a separate tok2vec like this:

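import spacy

nlp = spacy.blank("en")
nlp.add_pipe("spancat")  # added on its own, with no separate tok2vec component
print(nlp.config.to_str())  # look at [components.spancat.model.tok2vec]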

There are some differences in the defaults for spacy init config and nlp.add_pipe("spancat"), but you should be able to see what the config blocks look like.
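The hand edit described above (swapping the listener block for the shared tok2vec's model block) can be sketched on a plain nested dict that mirrors the config sections. This is only an illustration: inline_tok2vec is a made-up helper, the toy config is heavily abbreviated, and the sketch assumes that spancat is the only component listening to the shared tok2vec.

```python
import copy
from typing import Any, Dict


def inline_tok2vec(config: Dict[str, Any], pipe_name: str = "spancat") -> Dict[str, Any]:
    """Return a copy of config in which pipe_name embeds its own tok2vec model.

    Hypothetical helper; assumes no other component listens to "tok2vec".
    """
    new_config = copy.deepcopy(config)
    components = new_config["components"]
    # Take the shared component out and reuse its model block
    shared = components.pop("tok2vec")
    # [components.<pipe>.model.tok2vec] <- [components.tok2vec.model]
    components[pipe_name]["model"]["tok2vec"] = shared["model"]
    # The shared component no longer exists, so drop it from the pipeline order
    new_config["nlp"]["pipeline"] = [
        name for name in new_config["nlp"]["pipeline"] if name != "tok2vec"
    ]
    return new_config


# Heavily abbreviated toy config; a real one has many more settings.
toy = {
    "nlp": {"pipeline": ["tok2vec", "spancat"]},
    "components": {
        "tok2vec": {
            "factory": "tok2vec",
            "model": {"@architectures": "spacy.Tok2Vec.v2"},
        },
        "spancat": {
            "factory": "spancat",
            "model": {"tok2vec": {"@architectures": "spacy.Tok2VecListener.v1"}},
        },
    },
}

result = inline_tok2vec(toy)
print(result["nlp"]["pipeline"])
print(result["components"]["spancat"]["model"]["tok2vec"])
```

The input is deep-copied so the original config stays untouched, which makes it easy to compare the two versions before writing anything to disk.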
