Thanks for the great blog post and examples!
I created some small matrices to check that the kernels work. With m=128, n=128, k=64, however, the result diverges from the reference!
The mismatch between the matmul_3 result and the cuBLAS result looks odd: every other row of the matmul_3 output (rows 0, 2, 4, ...) matches consecutive rows of the cuBLAS output (rows 0, 1, 2, ...).
With the same random seed, the result should be reproducible:
std::default_random_engine generator(55);
std::normal_distribution<float> distribution(0, 0.1);
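For anyone trying to reproduce this, here is a minimal host-side sketch of how the inputs could be generated from the two lines above. The helper name `make_matrix`, the row-major layout, and seeding once per matrix are my assumptions for illustration; the repository's test harness may fill its buffers differently (for example from one shared generator).

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the repository): fills a rows x cols
// row-major matrix with samples from a normal distribution with
// mean 0 and standard deviation 0.1, using a fixed seed.
std::vector<float> make_matrix(std::size_t rows, std::size_t cols, unsigned seed) {
    std::default_random_engine generator(seed);
    std::normal_distribution<float> distribution(0.0f, 0.1f);
    std::vector<float> data(rows * cols);
    for (float& x : data) {
        x = distribution(generator);
    }
    return data;
}
```

Calling `make_matrix(128, 64, 55)` twice on the same machine and toolchain gives identical contents. Note that `std::default_random_engine` is implementation-defined, so the exact values can differ between standard libraries.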
The partial results are as follows (`my` is the matmul_3 output, `ref` is the cuBLAS output):
==========================my
0.0479 0.0347 -0.0801 -0.0199 -0.0386 -0.1279 0.0019 -0.0522 0.0034 0.0070 0.0513 -0.0417 -0.0386 0.0012 0.0742 -0.0128
-0.0630 0.1357 0.0269 0.0183 -0.0957 0.1104 0.1147 0.0728 0.0417 0.0140 -0.1172 -0.0471 -0.0491 -0.0708 -0.1416 -0.0488
-0.0591 -0.0161 -0.1279 0.0020 0.1221 0.0222 -0.1338 0.0786 0.0305 0.1211 -0.0884 -0.0952 0.0942 -0.0047 0.0043 0.0309
0.0654 0.0159 -0.1328 -0.0072 -0.0205 0.1064 -0.0771 -0.0383 -0.0123 0.0292 -0.0688 -0.0923 0.0245 -0.1992 0.0315 -0.0210
0.0262 0.1514 -0.0625 0.0144 -0.0835 -0.0391 -0.0420 0.0603 0.0091 -0.0254 0.0491 0.0879 -0.0830 -0.0728 -0.1147 -0.0120
0.0977 0.1504 0.0742 -0.0913 -0.0094 0.0713 0.0596 0.0398 0.0262 0.0957 0.1816 0.0223 -0.0055 0.0273 -0.2100 -0.1562
-0.0413 -0.0025 -0.1260 -0.1348 -0.0640 0.0713 -0.1338 -0.0603 -0.0962 0.0400 0.1006 0.0010 0.0110 -0.0898 -0.0747 0.0388
-0.1250 0.0771 0.1357 -0.0236 0.0432 -0.0275 0.1855 0.0728 -0.0047 -0.0374 0.0820 0.0356 -0.0398 -0.0121 0.0442 -0.0017
-0.0659 -0.0344 -0.0354 0.1504 -0.0688 0.0742 -0.0334 0.0474 -0.1206 -0.0625 0.0532 -0.0039 -0.0189 0.0791 0.1426 -0.1494
0.0698 -0.0908 -0.0586 0.0728 0.1289 0.0583 -0.0327 0.0476 0.0474 0.0186 -0.1484 -0.0236 -0.0889 0.1021 0.1089 0.0038
-0.0781 -0.0410 0.0435 0.0278 0.0166 0.0200 -0.0300 -0.0737 0.0491 -0.0165 -0.0102 0.0801 0.0854 -0.0085 0.0601 0.0898
0.0630 0.0806 -0.1816 -0.0297 -0.1689 -0.0049 0.0728 -0.0605 0.1455 -0.1216 0.1025 -0.0057 -0.0103 -0.2441 -0.0425 0.1162
-0.0674 -0.0237 0.1182 0.1426 0.0140 0.0444 -0.0029 -0.0084 0.0825 -0.0417 -0.2207 -0.0057 -0.0190 0.1060 0.0260 -0.0306
-0.0325 -0.0503 0.0255 0.0160 0.0420 -0.0074 0.2236 0.0098 -0.0059 -0.0479 0.0850 0.0586 0.0194 -0.0605 -0.1089 -0.0039
-0.0260 0.1060 -0.0081 0.0471 0.0342 0.0430 0.0742 -0.0469 -0.0669 0.0232 -0.0664 -0.1758 -0.0193 0.0004 -0.0674 -0.0474
0.0234 0.0219 -0.0659 -0.0120 0.0503 0.0854 -0.3496 0.0330 0.0249 0.0454 -0.1260 -0.0869 -0.0306 0.0732 -0.0155 -0.1768
==========================ref
0.0479 0.0347 -0.0801 -0.0199 -0.0386 -0.1279 0.0019 -0.0522 0.0034 0.0070 0.0513 -0.0417 -0.0386 0.0012 0.0742 -0.0128
-0.0591 -0.0161 -0.1279 0.0020 0.1221 0.0222 -0.1338 0.0786 0.0305 0.1211 -0.0884 -0.0952 0.0942 -0.0047 0.0043 0.0309
0.0262 0.1514 -0.0625 0.0144 -0.0835 -0.0391 -0.0420 0.0603 0.0091 -0.0254 0.0491 0.0879 -0.0830 -0.0728 -0.1147 -0.0120
-0.0413 -0.0025 -0.1260 -0.1348 -0.0640 0.0713 -0.1338 -0.0603 -0.0962 0.0400 0.1006 0.0010 0.0110 -0.0898 -0.0747 0.0388
-0.0659 -0.0344 -0.0354 0.1504 -0.0688 0.0742 -0.0334 0.0474 -0.1206 -0.0625 0.0532 -0.0039 -0.0189 0.0791 0.1426 -0.1494
-0.0781 -0.0410 0.0435 0.0278 0.0166 0.0200 -0.0300 -0.0737 0.0491 -0.0165 -0.0102 0.0801 0.0854 -0.0085 0.0601 0.0898
-0.0674 -0.0237 0.1182 0.1426 0.0140 0.0444 -0.0029 -0.0084 0.0825 -0.0417 -0.2207 -0.0057 -0.0190 0.1060 0.0260 -0.0306
-0.0260 0.1060 -0.0081 0.0471 0.0342 0.0430 0.0742 -0.0469 -0.0669 0.0232 -0.0664 -0.1758 -0.0193 0.0004 -0.0674 -0.0474
0.0747 0.0217 0.0347 -0.0732 0.0320 -0.0674 -0.0703 -0.0182 -0.0459 0.0275 0.0298 0.1030 0.0732 -0.0143 -0.0723 0.0630
0.0771 -0.0503 0.0374 -0.1787 -0.0488 0.0244 -0.1016 -0.0608 -0.0063 -0.0140 -0.1079 0.0461 -0.0630 0.1250 0.0297 -0.0425
-0.0435 0.0603 0.1611 -0.0168 -0.0408 0.0183 -0.0030 -0.0388 0.0469 0.0118 -0.0845 0.2852 -0.0042 0.1074 0.0164 0.0933
0.0771 0.0723 -0.0232 -0.0703 0.0957 -0.0781 -0.0498 -0.0515 0.0117 0.0918 0.0289 0.0258 0.0159 0.0115 0.0033 -0.0718
-0.0035 -0.0075 0.0034 -0.0242 0.0302 -0.1104 0.0080 -0.0625 0.1602 -0.0515 -0.1060 -0.0500 -0.0742 -0.0361 0.1895 0.0356
0.1582 -0.0215 0.0410 -0.1719 0.1147 0.0007 -0.0913 -0.0300 -0.0221 0.0884 -0.1582 -0.1235 -0.0486 -0.0386 0.0864 0.0327
-0.1089 -0.1309 -0.1099 -0.1240 0.0220 -0.0089 -0.1592 0.0029 0.0317 0.1089 -0.0623 0.0449 0.0679 -0.0267 -0.0757 -0.0393
0.0840 -0.1016 -0.0056 -0.0796 0.2305 -0.0811 0.0330 -0.1123 0.1494 0.0245 -0.0723 -0.0272 0.0398 0.1089 0.1006 -0.0305
For ease of understanding, the C matrix is viewed as [M, N], and the results shown above are C[:16, :16]. As mentioned, C_ref[3, :] == C[6, :].
If I set m=n=k=512, the error goes away. Any idea why this is happening? My environment:
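The row relationship can be checked mechanically instead of by eye. Below is a rough sketch, assuming both results have been copied back to the host as row-major float arrays; `match_rows` and the tolerance are hypothetical and not part of the repository.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each row i of `ref`, returns the index of the
// first row of `out` that matches it element-wise within `tol`, or -1 if
// no row matches. Both matrices are row-major with m rows and n columns.
std::vector<int> match_rows(const std::vector<float>& out,
                            const std::vector<float>& ref,
                            std::size_t m, std::size_t n,
                            float tol = 1e-3f) {
    std::vector<int> mapping(m, -1);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            bool same = true;
            for (std::size_t k = 0; k < n; ++k) {
                if (std::fabs(ref[i * n + k] - out[j * n + k]) > tol) {
                    same = false;
                    break;
                }
            }
            if (same) {
                mapping[i] = static_cast<int>(j);
                break;
            }
        }
    }
    return mapping;
}
```

If the pattern in the excerpt holds for the whole matrix, I would expect `mapping[i] == 2 * i` for the rows that have a match, which would suggest a row-stride or indexing problem rather than a numerical one.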
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Driver Version: 535.129.03 CUDA Version: 12.4
NVIDIA H800
g++ (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
NAME="CentOS Linux"
VERSION="7 (Core)"