Implementation of the paper submitted to ACS.
Transformer-based models have remarkably improved the performance of many natural language processing tasks in recent years. However, their input length is a major limitation, mainly due to the computational complexity of the attention mechanism. Inspired by the human tendency to ignore many words during reading comprehension, we study the effect of removing tokens from a sequence on prediction performance in sentiment analysis. In this preliminary paper, we analyze a length-reduction system based on layer-level attention scores to determine what fraction of the input length is required to obtain reasonable accuracy on a sentiment analysis task. We show that a filtering system based on BERT allows us to reduce sequence lengths by up to 99% on a sentiment analysis task while still obtaining 70% accuracy.
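Below is a minimal sketch of the general idea: rank tokens by attention scores taken from one BERT layer, keep only the most-attended fraction, and classify the shortened sequence. It assumes the Hugging Face transformers library; the model name, the choice of layer, and the keep ratio are illustrative assumptions, not the exact configuration used in the paper.

```python
# Sketch: attention-based token filtering before sentiment classification.
# Assumed model name, layer choice, and keep_ratio are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "textattack/bert-base-uncased-SST-2"  # assumed sentiment model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, output_attentions=True
)
model.eval()


def filter_by_attention(text: str, keep_ratio: float = 0.1, layer: int = -1) -> str:
    """Keep only the tokens that receive the most attention in a given layer."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    # Attention tensor shape: (batch, heads, seq_len, seq_len).
    # Average over heads and over the query dimension -> one score per token.
    attn = out.attentions[layer].mean(dim=1).mean(dim=1).squeeze(0)
    ids = enc["input_ids"].squeeze(0)
    k = max(1, int(keep_ratio * ids.size(0)))
    keep = torch.topk(attn, k).indices.sort().values  # preserve original token order
    kept_tokens = tokenizer.convert_ids_to_tokens(ids[keep].tolist())
    return tokenizer.convert_tokens_to_string(kept_tokens)


def classify(text: str) -> int:
    """Return the predicted sentiment label index for a (possibly shortened) text."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits
    return int(logits.argmax(dim=-1))


shortened = filter_by_attention("The movie was long, but the acting was wonderful.")
print(shortened, classify(shortened))
```

In this sketch the filtering and classification share one model for simplicity; the scores could equally be taken from a separate BERT pass, and different layers or head-aggregation schemes will yield different token rankings.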