Right now, our dominant paradigm is autoregressive prediction. While this is SOTA, it leaves little room to experiment with other sequence-to-sequence approaches. Technically, we could define an arbitrary sequence length much greater than any input or output, train the model to project an input onto that length, and then decode over it non-autoregressively. Example algorithms would be Listen, Attend and Spell (LAS) or CTC.
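As a rough illustration of the "project to a fixed length, then decode non-autoregressively" idea, here is a minimal PyTorch sketch using CTC as the training objective. Everything here is hypothetical (the class name, the learned-query cross-attention used for the length projection, and the toy shapes are all assumptions, not part of any existing codebase); the point is just that CTC lets the fixed-length canvas align itself to a shorter target without autoregressive decoding.

```python
import torch
import torch.nn as nn

class NonAutoregressiveHead(nn.Module):
    """Hypothetical sketch: project encoder states onto a fixed length
    T_max via learned queries, then emit per-position vocab logits
    that can be trained with CTC (index 0 reserved for the blank)."""
    def __init__(self, d_model: int, vocab_size: int, t_max: int):
        super().__init__()
        self.t_max = t_max
        # Learned queries give a fixed-length "canvas" to decode onto.
        self.queries = nn.Parameter(torch.randn(t_max, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.to_vocab = nn.Linear(d_model, vocab_size + 1)  # +1 for CTC blank

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (B, T_in, d_model), any T_in; output length is always t_max.
        q = self.queries.unsqueeze(0).expand(enc.size(0), -1, -1)
        canvas, _ = self.attn(q, enc, enc)   # (B, t_max, d_model)
        return self.to_vocab(canvas)         # (B, t_max, vocab_size + 1)

# Toy training step with made-up shapes: CTC aligns the t_max output
# positions to a shorter target sequence, so no left-to-right decoding
# is needed.
B, T_in, d, V, T_MAX = 2, 50, 64, 28, 200
head = NonAutoregressiveHead(d, V, T_MAX)
enc = torch.randn(B, T_in, d)
logits = head(enc)                                  # (B, T_MAX, V + 1)
log_probs = logits.log_softmax(-1).transpose(0, 1)  # CTC wants (T, B, C)
targets = torch.randint(1, V + 1, (B, 10))          # labels in 1..V; 0 is blank
loss = nn.CTCLoss(blank=0)(
    log_probs,
    targets,
    torch.full((B,), T_MAX, dtype=torch.long),  # input lengths
    torch.full((B,), 10, dtype=torch.long),     # target lengths
)
loss.backward()
```

The design choice worth noting: because the canvas length is fixed and much greater than any target, the same head works for variable-length outputs, and all positions are predicted in parallel rather than one token at a time.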