Long text clipped when disambiguated by BERT #145

Open
ahmadabousetta opened this issue Jun 6, 2024 · 0 comments

predictions.extend(prediction)

The referenced line assumes that each new batch comes from a new sentence, which is fine when predicting a list of short sentences.
However, if we pass a single very long text, the dataloader splits that text into several batches.
Since the input is only one sentence, only the predictions of the first batch are returned. In my case, that was only 13309 out of 16949 tokens.

Any fix should be made with care, since this function is also called to predict a list of sentences.
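
One possible direction, as a rough sketch only and not the library's actual code: record which input sentence each chunk came from, then merge the per-chunk predictions back by that index. This would cover both the single long text and the list of short sentences. The names `chunk_sentences`, `merge_predictions`, `fake_predict` and the `max_len` parameter are hypothetical and only illustrate the idea; the real dataloader and model are not used here.

```python
from typing import List, Sequence, Tuple


def chunk_sentences(
    sentences: Sequence[Sequence[str]], max_len: int
) -> List[Tuple[int, List[str]]]:
    """Split each sentence into chunks of at most max_len tokens.

    Each chunk is paired with the index of the sentence it came from,
    so predictions can be reassembled later (hypothetical helper).
    """
    chunks: List[Tuple[int, List[str]]] = []
    for sent_idx, tokens in enumerate(sentences):
        if not tokens:
            # Keep an empty chunk so empty sentences still get an entry.
            chunks.append((sent_idx, []))
            continue
        for start in range(0, len(tokens), max_len):
            chunks.append((sent_idx, list(tokens[start:start + max_len])))
    return chunks


def merge_predictions(
    owners: Sequence[int],
    chunk_predictions: Sequence[Sequence[str]],
    num_sentences: int,
) -> List[List[str]]:
    """Concatenate chunk predictions that belong to the same sentence."""
    merged: List[List[str]] = [[] for _ in range(num_sentences)]
    for owner, preds in zip(owners, chunk_predictions):
        merged[owner].extend(preds)
    return merged


def fake_predict(tokens: Sequence[str]) -> List[str]:
    """Stand-in for the model: one label per token (assumption)."""
    return [t.upper() for t in tokens]


if __name__ == "__main__":
    sentences = [["a", "b", "c", "d", "e"], ["f", "g"]]
    chunks = chunk_sentences(sentences, max_len=2)
    owners = [owner for owner, _ in chunks]
    preds = [fake_predict(chunk) for _, chunk in chunks]
    merged = merge_predictions(owners, preds, len(sentences))
    # Every sentence keeps all of its tokens, however many chunks it needed.
    print([len(m) for m in merged])
```

With this bookkeeping, a long sentence that spans three chunks still yields one prediction list of full length, while short sentences behave as before.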

@ahmadabousetta ahmadabousetta changed the title long text clipped when disambiguated by BERT Long text clipped when disambiguated by BERT Jun 6, 2024