Long text clipped when disambiguated by BERT #145

@ahmadabousetta

Description

@ahmadabousetta

predictions.extend(prediction)

The referenced line assumes each new batch comes from a new sentence, which is fine when predicting a list of short sentences.
However, if we pass a single very long text, the dataloader splits that text into multiple batches.
Since the input is only one sentence, only the predictions of the first batch are returned. In my case, only 13309 of 16949 tokens.

Fixing this issue should be done with care, since this function is also called to predict a list of sentences.
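A minimal sketch of the expected behavior (all names here are hypothetical stand-ins, not the library's actual API): when one long text is split into several batches, the per-batch predictions should be concatenated back into a single sequence rather than treating each batch as a separate sentence.

```python
def batch_tokens(tokens, batch_size):
    # Stand-in for the dataloader: split a long token list into fixed-size batches.
    return [tokens[i:i + batch_size] for i in range(0, len(tokens), batch_size)]

def predict_batch(batch):
    # Dummy per-batch model call: one prediction per token.
    return [f"pred:{tok}" for tok in batch]

def disambiguate(tokens, batch_size=4):
    # Accumulate predictions across ALL batches of the same input text,
    # so a long text spanning multiple batches is not truncated to its
    # first batch.
    predictions = []
    for batch in batch_tokens(tokens, batch_size):
        predictions.extend(predict_batch(batch))
    return predictions

tokens = [f"t{i}" for i in range(10)]  # a "long" text spanning three batches
out = disambiguate(tokens)
assert len(out) == len(tokens)  # no tokens dropped
```

The remaining care point from the issue: when the input is a list of sentences, the aggregated predictions must still be mapped back to the correct sentence boundaries, not merely concatenated.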
