@Adamits I'm happy to implement this, but I want to check with you whether it's worthwhile, since my domain is speech:
What are your thoughts on adding newer positional embeddings to our Transformer models (particularly RoPE)? IIRC we're using the standard sinusoidal ones, but those are a bit old-fashioned nowadays. Do you know of arguments for or against?
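For context, here's a minimal sketch of what RoPE (rotary position embeddings, Su et al. 2021) does, assuming PyTorch; the function name `rotary_embed` is just illustrative, not anything in our codebase. Unlike sinusoidal embeddings added to the token embeddings, this would be applied to the queries and keys inside each attention layer, so attention scores depend on relative offsets:

```python
import torch


def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a query or key tensor.

    x: (batch, seq_len, dim) with dim even. Each consecutive pair of
    features is rotated by an angle that grows with the position index,
    so q.k dot products depend only on the relative position offset.
    """
    _, seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dim must be even for RoPE"
    # Per-pair inverse frequencies: theta_i = base^(-2i/dim),
    # same frequency schedule as the sinusoidal embeddings.
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=x.dtype) / dim)
    # Rotation angle for each (position, pair): shape (seq_len, dim/2).
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # even / odd feature pairs
    # 2-D rotation of each feature pair.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In practice this gets called on the per-head queries and keys right before the attention matmul, which is why it needs to hook into the attention module rather than the embedding layer.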