Hi,
First of all, thanks for publicly sharing your code. I think there is a bug when batch_size > 1 and an intermediate instance in the batch gets truncated because its number of sub-tokens exceeds 512 (when using a bert-base transformer model).
https://github.com/allenai/sequential_sentence_classification/blob/master/sequential_sentence_classification/model.py#L135
Here, instead of truncating the labels of the truncated intermediate instance, the current code truncates the labels at the end of the batch. This can cause a label mismatch, especially for the instances that follow it in the batch. Could you confirm whether this is indeed a bug? If so, I would be happy to volunteer a fix.
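To make the mismatch concrete, here is a small self-contained sketch (not the repo's actual code — the variable names and shapes are hypothetical) contrasting "truncate the flattened labels at the end of the batch" with "truncate each instance's labels where its own sentences were dropped":

```python
# Hypothetical illustration: one label per sentence, per instance.
batch_labels = [[0, 1, 1], [0, 0, 1, 1], [1, 0]]  # gold labels per instance
kept_sents   = [3, 2, 2]  # sentences surviving sub-token truncation;
                          # instance 1 lost its last two sentences

# Buggy approach: flatten all labels, then cut at the end of the batch.
flat = [label for inst in batch_labels for label in inst]
buggy = flat[:sum(kept_sents)]  # drops labels from the *last* instance instead

# Correct approach: drop each instance's labels where its sentences were dropped.
correct = [label for inst, k in zip(batch_labels, kept_sents)
           for label in inst[:k]]

print(buggy)    # [0, 1, 1, 0, 0, 1, 1] -> instance 2 is paired with
                #                          instance 1's leftover labels
print(correct)  # [0, 1, 1, 0, 0, 1, 0]
```

In the buggy version, the last instance's sentences end up aligned with labels left over from the truncated intermediate instance, which is the mismatch described above.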
Thanks,