MPS with cuQuantum #2168

MozammilQ · 2024-06-06T10:04:25Z

Summary

This PR adds matrix-product-state (MPS) simulation on NVIDIA GPUs using cuTENSOR from cuQuantum.

Details and comments

Shows performance gains:

Got a ~12x speedup on bigger circuits, but I am still not satisfied!

doichanj · 2024-06-10T08:32:32Z

I have been fighting this for 3 days.

The test here also failed, with only this error:
ImportError: /tmp/tmp.pzwv2zFlTV/venv/lib/python3.12/site-packages/qiskit_aer/backends/controller_wrappers.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3AER21cutensor_csvd_wrapperER6matrixISt7complexIdEES4_RSt6vectorIdSaIdEES4_

After this PR, I miss Rust/cargo even more. This PR is 5% actual work and 95% fighting one library error after another.

Anyways, I really enjoyed this. I am looking forward to more contributions :) Thanks @doichanj :)

Because cutensor_csvd_wrapper is defined in namespace TensorNetwork, the call should be TensorNetwork::cutensor_csvd_wrapper.
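For context, the undefined symbol in the ImportError above demangles (for example with c++filt) to AER::cutensor_csvd_wrapper(matrix<std::complex<double>>&, matrix<std::complex<double>>&, std::vector<double>&, matrix<std::complex<double>>&), so the caller referenced a function at AER scope that was never defined there. Shared libraries on Linux may be linked with unresolved symbols, which is why the mismatch only surfaced at import time. Below is a minimal sketch of the qualified call, assuming TensorNetwork is nested inside AER (not confirmed in this thread); csvd_stub and singular_values_stub are hypothetical names, and plain std::vector stands in for Aer's matrix type.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

namespace AER {
namespace TensorNetwork {

// Hypothetical stand-in for cutensor_csvd_wrapper. It is NOT a general SVD:
// it only handles a diagonal n x n matrix (row-major), whose singular values
// are simply |A_ii| (left unsorted here).
inline void csvd_stub(const std::vector<std::complex<double>>& A,
                      std::size_t n, std::vector<double>& S) {
  assert(A.size() == n * n);
  S.assign(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    S[i] = std::abs(A[i * n + i]);
  }
}

}  // namespace TensorNetwork

// A caller sitting directly in namespace AER.
inline std::vector<double> singular_values_stub(
    const std::vector<std::complex<double>>& A, std::size_t n) {
  std::vector<double> S;
  // Qualify the call: unqualified lookup from AER does not search the nested
  // namespace TensorNetwork. If a declaration at AER scope were visible
  // instead, the object file would reference AER::csvd_stub, and with no
  // matching definition that would be an undefined symbol.
  TensorNetwork::csvd_stub(A, n, S);
  return S;
}

}  // namespace AER
```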

src/simulators/matrix_product_state/svd.cpp

doichanj · 2024-06-14T01:52:13Z

It fails running with MPS method on GPU with error message as following,
ERROR TensorNet::contractor : CUTENSORNET_STATUS_INVALID_VALUE

MozammilQ · 2024-06-14T01:54:54Z

Yes, I know. I am actively working on this; please give me some time and I will let you know when I am done.


test/terra/backends/simulator_test_case.py

MozammilQ · 2024-06-18T05:40:52Z

@doichanj, please see if this is good enough :)

And I am extremely sorry for the delay; doing any development in a cloud VM is not a good experience.

MozammilQ · 2024-06-18T13:58:20Z

I have absolutely no idea why the macOS tests are failing.
For years, the only OS I have known is Linux; as for macOS and Windows, I only know their spellings :)

Randl · 2024-06-30T10:56:30Z

Looks like the problem is an old cvxpy version; #2169 should fix it?

MozammilQ · 2024-07-03T23:29:26Z

I have got my hands on an NVIDIA 3060 (12 GB).
I am actively working on the PR to solve the performance issue :)

MozammilQ · 2024-07-04T05:40:06Z

I am going through more lectures on tensor networks.
I am working on it!
Trying to accelerate contract.

@doichanj,

docker container
Fedora:37
gcc: 12.3.1
python: 3.11.6
cuda-12-5
cuquantum-12
inside the venv: numpy version: 2.1.2
Make a virtual env and activate it.

When I run pip uninstall -y qiskit-aer && rm -frv ./qiskit_aer.egg-info/ && python setup.py clean && python setup.py develop
I get this error:

FAILED: qiskit_aer/backends/wrappers/CMakeFiles/controller_wrappers.dir/__/__/__/src/simulators/statevector/qv_avx2.cpp.o
/usr/local/cuda-12.5/bin/nvcc 
-forward-unknown-to-host-compiler 
-DAER_CUSTATEVEC 
-DAER_CUTENSORNET 
-DAER_THRUST_SUPPORTED=TRUE 
-DSPDLOG_COMPILED_LIB 
-DSPDLOG_FMT_EXTERNAL 
-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA 
-Dcontroller_wrappers_EXPORTS 
-I/usr/include/python3.11 
-I/root/.venvs/qiskit-aer-dev/lib64/python3.11/site-packages/pybind11/include
-I/home/qiskit-aer/src 
-isystem /root/.conan/data/nlohmann_json/3.1.1/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include 
-isystem /root/.conan/data/spdlog/1.9.2/_/_/package/0ddf38f27cb22953937f1e00a3576684c190cabc/include
-isystem /root/.conan/data/fmt/8.0.1/_/_/package/300cb578a6fb15627a390ac70b832e82fd040283/include 
-O3 
-DNDEBUG 
-std=c++14 
"--generate-code=arch=compute_86,code=[compute_86,sm_86]" 
-Xcompiler=-fPIC 
-Xcompiler=-fvisibility=hidden  
--compiler-options 
-fopenmp  
-gencode 
arch=compute_86,code=sm_86 
-DAER_THRUST_GPU 
-DAER_THRUST_CUDA 
-I/home/qiskit-aer/src 
-isystem /home/qiskit-aer/src/third-party/headers 
-use_fast_math 
--expt-extended-lambda  
--compiler-options 
-mfma 
--compiler-options 
-mavx2 
--compiler-options 
-mno-avx512bf16 
-MD 
-MT 
qiskit_aer/backends/wrappers/CMakeFiles/controller_wrappers.dir/__/__/__/src/simulators/statevector/qv_avx2.cpp.o 
-MF qiskit_aer/backends/wrappers/CMakeFiles/controller_wrappers.dir/__/__/__/src/simulators/statevector/qv_avx2.cpp.o.d 
-x cu 
-c /home/qiskit-aer/src/simulators/statevector/qv_avx2.cpp 
-o qiskit_aer/backends/wrappers/CMakeFiles/controller_wrappers.dir/__/__/__/src/simulators/statevector/qv_avx2.cpp.o

/usr/lib/gcc/x86_64-redhat-linux/12/include/avx512bf16vlintrin.h(53): error: identifier "__builtin_ia32_cvtne2ps2bf16_v16hi" is undefined
    return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi(__A, __B);

I cannot figure out what enables AVX512-BF16.
I have a Zen+ CPU; it only has AVX2, not even AVX512, let alone AVX512-BF16.
Why do I get this error? Is AVX512-BF16 really being turned on somewhere, and if so, where?
What should I do here?
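One way to check whether AVX512-BF16 is really being turned on is to compile a tiny translation unit with the same host flags (for example g++ -mavx2 -mfma -mno-avx512bf16) and inspect the ISA macros GCC predefines: __AVX2__, __AVX512F__ and __AVX512BF16__. This is a diagnostic sketch, not a fix, and IsaFlags and compiled_isa_flags are hypothetical names. If __AVX512BF16__ comes out undefined, the flag is not the cause; a plausible but unverified explanation is that GCC's immintrin.h includes avx512bf16vlintrin.h unconditionally (guarded by #pragma GCC target), so nvcc's front end parses the GCC 12 builtins regardless of the -m flags.

```cpp
// Reports which x86 ISA extensions the compiler was asked to target for this
// translation unit. GCC and Clang predefine these macros from -m<isa> and
// -march= flags; they say nothing about what the running CPU supports.
struct IsaFlags {
  bool avx2;
  bool avx512f;
  bool avx512bf16;
};

constexpr IsaFlags compiled_isa_flags() {
  return IsaFlags{
#ifdef __AVX2__
      true,
#else
      false,
#endif
#ifdef __AVX512F__
      true,
#else
      false,
#endif
#ifdef __AVX512BF16__
      true,
#else
      false,
#endif
  };
}
```

Printing compiled_isa_flags().avx512bf16 from a file built with the failing flag set shows whether the flag is actually on for that compilation.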

@doichanj, please check whether the PR is on track; I am waiting for your comments :)

MozammilQ · 2024-12-24T02:16:56Z

@doichanj, your comments would be highly valuable to me.
Please review the code if you have time :)

doichanj · 2024-12-26T08:10:11Z

I'm sorry, but I'm on vacation; I will review it next year.
I just want to confirm whether this version can accelerate MPS simulation compared to running on the CPU.

MozammilQ · 2025-01-06T11:01:42Z

src/simulators/matrix_product_state/matrix_product_state_internal.cpp

+    // difference between the speed of CPU and GPU involved. Even if matrices
+    // are big, they are not big enough to make the speed of PCIe a bottleneck.
+    // In this particular case CPU was `Zen+` and GPU was `NVIDIA Ampere`.
+    if ((num_qubits_ > 13) && (MPS::mps_device_.compare("GPU") == 0) &&


Even though MPI acceleration is not available for MPS, I still compiled with AER_MPI=True.
I re-tested the code on different CPUs:
Cascade Lake (4 cores) + CUDA Arch 89, Sapphire Rapids (4 cores), Zen+ (8 cores) + CUDA Arch 86.
For very low qubit counts (2, 3, etc.), the GPU version is 1.5-2.5 seconds slower; this is possibly GPU initialization time.
Offloading SVD to the GPU does accelerate the computation (in all cases!).
It will still be faster to offload to a CUDA Arch 89 GPU, even with a Sapphire Rapids CPU.
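The visible part of the condition in matrix_product_state_internal.cpp can be wrapped in a small predicate so the qubit threshold becomes a parameter, which would make it easy to try the 17-18 qubit threshold discussed in the next comment for faster CPUs. This is a hypothetical sketch: offload_svd_to_gpu, min_qubits and offload_svd_examples are illustrative names, and the PR's real condition continues past the trailing && shown above, so any further checks are left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate mirroring only the visible clauses of the PR's check:
//   (num_qubits_ > 13) && (MPS::mps_device_.compare("GPU") == 0) && ...
// Small systems stay on the CPU, where the GPU path was measured to be slower.
inline bool offload_svd_to_gpu(unsigned num_qubits,
                               const std::string& mps_device,
                               unsigned min_qubits = 13) {
  return num_qubits > min_qubits && mps_device.compare("GPU") == 0;
}

// Usage: the default reproduces the PR's threshold; a higher min_qubits
// (e.g. 17) models the tuning suggested for faster CPUs.
inline void offload_svd_examples() {
  assert(offload_svd_to_gpu(14, "GPU"));
  assert(!offload_svd_to_gpu(13, "GPU"));
  assert(!offload_svd_to_gpu(20, "CPU"));
  assert(!offload_svd_to_gpu(17, "GPU", 17));
}
```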

MozammilQ · 2025-01-06T11:03:36Z

src/simulators/matrix_product_state/matrix_product_state_tensor.hpp

+        // invoking needed GPU routines.
+        // If the matrix is not big enough, the multiplication
+        // will be done on CPU using openblas zgemm_ routine.
+        if ((mat1_rows > 128) && (mat1_cols > 128) && (mat2_cols > 128)) {


Even though MPI acceleration is not available for MPS, I still compiled with AER_MPI=True.
I re-tested the code on different CPUs:
Cascade Lake (4 cores) + CUDA Arch 89, Sapphire Rapids (4 cores), Zen+ (8 cores) + CUDA Arch 86.
For very low qubit counts (2, 3, etc.), the GPU version is 1.5-2.5 seconds slower; this is possibly GPU initialization time.
Offloading the matrix multiplication to the GPU does accelerate the computation.
However, I suspect this only helps for a GPU with CUDA Arch >= 86 combined with a CPU whose phoronix-test-suite score on pts/scimark2 (Computational Test: Dense LU Matrix Factorization) is <= 600 Mflops.
I believe most non-enterprise users fall into this category.
No test has been done on the combination of CUDA Arch 89 with Sapphire Rapids, whose score on the same test is 1300 Mflops. Even if offloading does not help for this combination, all we would have to do is raise the num_qubits threshold to maybe 17 or 18 and the rows/cols threshold to 256 or more. I have refrained from making this change because it would make MPS slower on low-end CPUs.
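The size-based dispatch in matrix_product_state_tensor.hpp can be sketched the same way, with the 128 cut-off as a parameter so a larger value such as 256 can be tried without changing the default. This is a hypothetical, CPU-only sketch: offload_matmul_to_gpu, matmul_cpu and min_dim are illustrative names; the naive row-major loop only stands in for OpenBLAS zgemm_ (which is column-major), and the GPU branch (cuBLAS, per the "added cublas for contract" commit) is not shown.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Mirrors the PR's condition: use the GPU only if all three dimensions of
// (mat1_rows x mat1_cols) * (mat1_cols x mat2_cols) exceed min_dim.
inline bool offload_matmul_to_gpu(std::size_t mat1_rows, std::size_t mat1_cols,
                                  std::size_t mat2_cols,
                                  std::size_t min_dim = 128) {
  return mat1_rows > min_dim && mat1_cols > min_dim && mat2_cols > min_dim;
}

// CPU fallback stand-in: naive row-major complex matrix product C = A * B,
// where A is rows x inner and B is inner x cols.
inline std::vector<cplx> matmul_cpu(const std::vector<cplx>& A,
                                    const std::vector<cplx>& B,
                                    std::size_t rows, std::size_t inner,
                                    std::size_t cols) {
  assert(A.size() == rows * inner);
  assert(B.size() == inner * cols);
  std::vector<cplx> C(rows * cols, cplx(0.0, 0.0));
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t k = 0; k < inner; ++k) {
      const cplx a_ik = A[i * inner + k];
      for (std::size_t j = 0; j < cols; ++j) {
        C[i * cols + j] += a_ik * B[k * cols + j];
      }
    }
  }
  return C;
}
```

Per the diff comment, the true branch would invoke the GPU routines and the false branch zgemm_; the sketch only shows the decision plus a reference result that GPU output could be compared against.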

MozammilQ · 2025-01-06T11:04:43Z

@doichanj, please have a look at my last two comments, and review the code if you see fit and have time :)

MozammilQ and others added 13 commits June 5, 2024 22:35

initial layout

563ae6e

refactor code

280f868

refactor code

ae44c69

refactor code

5b48265

refactor code

517a554

refactor code

7e40588

refactor code

ebf9ca0

refactor code

a422690

refactor code

80b59d5

refactor code

52f1ed4

refactor code

ed43e71

refactor code

649a5d7

Merge branch 'main' into mps-cutensor

83e4b5e

doichanj reviewed Jun 10, 2024


src/simulators/matrix_product_state/svd.cpp (outdated, resolved)

MozammilQ and others added 2 commits June 11, 2024 14:59

refactor code

c33571f

Merge branch 'main' into mps-cutensor

abc5552

MozammilQ and others added 9 commits June 14, 2024 14:02

refactor code

f0205e3

refactor code

629f65f

added release note

644a822

refactor code

e6f2288

Merge branch 'Qiskit:main' into mps-cutensor

42f983e

refactor code

c24b9e2

refactor code

34e9502

refactor code; included test

00f88e9

lint

454f8c0

MozammilQ commented Jun 18, 2024


test/terra/backends/simulator_test_case.py (resolved)

added suggestion

985c7f2

Merge branch 'main' into mps-cutensor

7ffab7d

MozammilQ and others added 8 commits August 30, 2024 15:35

Merge branch 'main' into mps-cutensor

6b0b41d

fixed a typo

34a5e75

refactor code

a1ae308

Merge branch 'Qiskit:main' into mps-cutensor

859e946

Merge branch 'Qiskit:main' into mps-cutensor

2ae116e

refactor code

ed9a907

Merge branch 'Qiskit:main' into mps-cutensor

a4dbd12

added cublas for contract

e1e80d1

MozammilQ requested a review from doichanj December 3, 2024 07:38

MozammilQ and others added 2 commits December 3, 2024 08:01

fixed typo

321230c

Merge branch 'Qiskit:main' into mps-cutensor

702ecd0

MozammilQ commented Jan 6, 2025
