Long text clipped when disambiguated by BERT #145

@ahmadabousetta

Description

@ahmadabousetta

predictions.extend(prediction)

The referenced line assumes each new batch comes from a new sentence, which is fine when predicting a list of short sentences.
However, if we pass a single very long text, the dataloader splits that text into multiple batches.
Since the input is only one sentence, only the predictions of the first batch are returned. In my case, only 13309 of 16949 tokens.

Fixing this issue should be done with care, since this function is also called to predict a list of sentences.
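A minimal sketch of the expected behavior (all names here are hypothetical stand-ins, not the library's actual API): when one long text is split into several batches, the per-batch predictions should be concatenated back into a single sequence rather than treating each batch as a separate sentence.

```python
def batch_tokens(tokens, batch_size):
    # Stand-in for the dataloader: split a long token list into fixed-size batches.
    return [tokens[i:i + batch_size] for i in range(0, len(tokens), batch_size)]

def predict_batch(batch):
    # Dummy per-batch model call: one prediction per token.
    return [f"pred:{tok}" for tok in batch]

def disambiguate(tokens, batch_size=4):
    # Accumulate predictions across ALL batches of the same input text,
    # so a long text spanning multiple batches is not truncated to its
    # first batch.
    predictions = []
    for batch in batch_tokens(tokens, batch_size):
        predictions.extend(predict_batch(batch))
    return predictions

tokens = [f"t{i}" for i in range(10)]  # a "long" text spanning three batches
out = disambiguate(tokens)
assert len(out) == len(tokens)  # no tokens dropped
```

The remaining care point from the issue: when the input is a list of sentences, the aggregated predictions must still be mapped back to the correct sentence boundaries, not merely concatenated.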
