Does TensorProductUniform4x1d not utilize the Tensor Cores of GPUs?

Hi

`cuet.TensorProduct` flattens, squeezes, and splits the descriptor so that it can use `TensorProductUniform4x1d`. However, GPUs have Tensor Cores, which are built to accelerate matrix multiplications. Does processing the data as `1d` vectors reduce the computational complexity, or have I misunderstood something? Please advise.
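To make the question concrete, here is a toy NumPy sketch. It is not cuequivariance code, and the shapes and function names are hypothetical. It computes the same contraction in two ways: first as many independent dot products between 1d vectors, then as a single matrix multiplication, which is the kind of operation Tensor Cores accelerate. In this toy case both forms perform the same number of multiplications. Reshaping therefore changes how the work can be mapped onto the hardware, not how much arithmetic there is.

```python
import numpy as np

# Hypothetical toy sizes: Z samples, contraction size u, output size v.
Z, u, v = 8, 16, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((Z, u))
W = rng.standard_normal((u, v))

def as_1d_vectors(x, W):
    """y[z, j] = sum_i x[z, i] * W[i, j], computed as separate 1d dot products."""
    n, k = x.shape
    m = W.shape[1]
    y = np.zeros((n, m))
    mults = 0
    for z in range(n):
        for j in range(m):
            y[z, j] = np.dot(x[z], W[:, j])  # one length-k dot product
            mults += k
    return y, mults

def as_matmul(x, W):
    """The same contraction as one matrix multiplication (Tensor Core friendly shape)."""
    y = x @ W
    mults = x.shape[0] * x.shape[1] * W.shape[1]
    return y, mults

y1, m1 = as_1d_vectors(x, W)
y2, m2 = as_matmul(x, W)
print(np.allclose(y1, y2), m1 == m2)  # True True
```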