Transformers paper: Attention Is All You Need. It does not use any recurrent networks (GRU, LSTM, etc.); the architecture relies entirely on attention mechanisms.
1st Task: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
2nd Task:...
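
As a rough illustration of the attention-only idea, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, the building block the paper uses in place of recurrence. The NumPy code, shapes, and toy sizes below are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of value vectors

# Toy example (hypothetical sizes): 3 query positions attending over 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

Because every position attends to every other position directly, no hidden state has to be carried step by step as in a GRU or LSTM.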