Hi,
First of all, thanks for publicly sharing your code. I think there is a bug when batch_size > 1 and an intermediate instance in the batch gets truncated because its number of sub-tokens exceeds 512 (when using a bert-base transformer model).
https://github.com/allenai/sequential_sentence_classification/blob/master/sequential_sentence_classification/model.py#L135
Here, instead of truncating the labels of the truncated intermediate instance, the current code truncates the labels at the end of the batch. This can cause a label mismatch, especially for the instances that follow it in the batch. Could you confirm whether this is indeed a bug? If so, I would be happy to volunteer a fix.
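To make the mismatch concrete, here is a small self-contained sketch (not the repo's actual code — the variable names and shapes are hypothetical) contrasting "truncate the flattened labels at the end of the batch" with "truncate each instance's labels where its own sentences were dropped":

```python
# Hypothetical illustration: one label per sentence, per instance.
batch_labels = [[0, 1, 1], [0, 0, 1, 1], [1, 0]]  # gold labels per instance
kept_sents   = [3, 2, 2]  # sentences surviving sub-token truncation;
                          # instance 1 lost its last two sentences

# Buggy approach: flatten all labels, then cut at the end of the batch.
flat = [label for inst in batch_labels for label in inst]
buggy = flat[:sum(kept_sents)]  # drops labels from the *last* instance instead

# Correct approach: drop each instance's labels where its sentences were dropped.
correct = [label for inst, k in zip(batch_labels, kept_sents)
           for label in inst[:k]]

print(buggy)    # [0, 1, 1, 0, 0, 1, 1] -> instance 2 is paired with
                #                          instance 1's leftover labels
print(correct)  # [0, 1, 1, 0, 0, 1, 0]
```

In the buggy version, the last instance's sentences end up aligned with labels left over from the truncated intermediate instance, which is the mismatch described above.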
Thanks,