Hi
cuet.TensorProduct flattens, squeezes, and splits the descriptor so that it can use TensorProductUniform4x1d. GPUs have Tensor Cores, though. Does processing the data as 1D vectors actually reduce computational complexity, or have I misunderstood something? Please advise.