Per @punkduckable, multihead attention should be implemented as a layer, not as an activation function. The current implementation, however, treats multihead attention as an activation function, which also disrupts the overall structure of MultiLayerPerceptron.
Multihead attention should therefore be removed from the set of activation functions and probably reimplemented as a derived class of the latent space class.
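
For illustration, here is a minimal sketch of what that derived class could look like. It assumes the latent space base class is a `torch.nn.Module` subclass; the names `LatentSpace` and `MultiheadAttentionLatentSpace`, along with the constructor and `forward` signatures, are placeholders rather than this repo's actual API:

```python
# Minimal sketch, not this repo's actual API. Assumes the latent space
# base class is a torch.nn.Module subclass; all names and signatures
# below are placeholders for illustration only.
import torch
import torch.nn as nn


class LatentSpace(nn.Module):
    """Stand-in for the repo's latent space base class (hypothetical)."""

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class MultiheadAttentionLatentSpace(LatentSpace):
    """Multihead attention as a proper layer (a derived latent space),
    rather than an elementwise activation function."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__(dim)
        # The attention block owns learnable Q/K/V projection weights,
        # unlike a stateless activation function.
        self.attention = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads, batch_first=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention over the latent representation;
        # x has shape (batch, sequence, dim).
        out, _ = self.attention(x, x, x)
        return out


# Usage:
layer = MultiheadAttentionLatentSpace(dim=64, num_heads=4)
y = layer(torch.randn(2, 10, 64))
```

The key point is that multihead attention carries learnable parameters (the Q/K/V projections), so it belongs in the layer hierarchy; the activation slot in MultiLayerPerceptron would then be reserved for genuinely stateless, elementwise functions.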