[question] Result Divergence when matrixes are small #4

@HarryWu99

Thanks for the great blogpost and examples!

I created some small matrices to check whether it works. With m=128, n=128, k=64, however, the result diverges from the reference!

I noticed an odd relationship between the matmul_3 result and the cublas result: the odd-numbered rows (1st, 3rd, 5th, ...) produced by matmul_3 match consecutive rows of the cublas result.

If you set the same random seed, the result can maybe be reproduced:

std::default_random_engine generator(55);
std::normal_distribution<float> distribution(0, 0.1);
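
For anyone trying to reproduce this, here is a minimal sketch of how the inputs could be generated from that seed. The helper name `make_matrix`, the row-major `float` storage, and the order in which the matrices are filled are my assumptions, not taken from the repository. Note also that the exact values `std::normal_distribution` produces are implementation-defined, so a different standard library may give different numbers for the same seed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a rows x cols matrix (assumed row-major) with
// samples drawn from N(0, 0.1) using the caller's generator.
std::vector<float> make_matrix(std::size_t rows, std::size_t cols,
                               std::default_random_engine& generator) {
    assert(rows > 0 && cols > 0);
    std::normal_distribution<float> distribution(0.0f, 0.1f);
    std::vector<float> m(rows * cols);
    for (float& v : m) {
        v = distribution(generator);  // one sample per element, in storage order
    }
    return m;
}
```

With `std::default_random_engine generator(55);`, calling `make_matrix(128, 64, generator)` and then `make_matrix(64, 128, generator)` would give an A of shape [M, K] and a B of shape [K, N] for the failing case, assuming that fill order.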

Part of the result is shown below:

==========================my
0.0479	0.0347	-0.0801	-0.0199	-0.0386	-0.1279	0.0019	-0.0522	0.0034	0.0070	0.0513	-0.0417	-0.0386	0.0012	0.0742	-0.0128	
-0.0630	0.1357	0.0269	0.0183	-0.0957	0.1104	0.1147	0.0728	0.0417	0.0140	-0.1172	-0.0471	-0.0491	-0.0708	-0.1416	-0.0488	
-0.0591	-0.0161	-0.1279	0.0020	0.1221	0.0222	-0.1338	0.0786	0.0305	0.1211	-0.0884	-0.0952	0.0942	-0.0047	0.0043	0.0309	
0.0654	0.0159	-0.1328	-0.0072	-0.0205	0.1064	-0.0771	-0.0383	-0.0123	0.0292	-0.0688	-0.0923	0.0245	-0.1992	0.0315	-0.0210	
0.0262	0.1514	-0.0625	0.0144	-0.0835	-0.0391	-0.0420	0.0603	0.0091	-0.0254	0.0491	0.0879	-0.0830	-0.0728	-0.1147	-0.0120	
0.0977	0.1504	0.0742	-0.0913	-0.0094	0.0713	0.0596	0.0398	0.0262	0.0957	0.1816	0.0223	-0.0055	0.0273	-0.2100	-0.1562	
-0.0413	-0.0025	-0.1260	-0.1348	-0.0640	0.0713	-0.1338	-0.0603	-0.0962	0.0400	0.1006	0.0010	0.0110	-0.0898	-0.0747	0.0388	
-0.1250	0.0771	0.1357	-0.0236	0.0432	-0.0275	0.1855	0.0728	-0.0047	-0.0374	0.0820	0.0356	-0.0398	-0.0121	0.0442	-0.0017	
-0.0659	-0.0344	-0.0354	0.1504	-0.0688	0.0742	-0.0334	0.0474	-0.1206	-0.0625	0.0532	-0.0039	-0.0189	0.0791	0.1426	-0.1494	
0.0698	-0.0908	-0.0586	0.0728	0.1289	0.0583	-0.0327	0.0476	0.0474	0.0186	-0.1484	-0.0236	-0.0889	0.1021	0.1089	0.0038	
-0.0781	-0.0410	0.0435	0.0278	0.0166	0.0200	-0.0300	-0.0737	0.0491	-0.0165	-0.0102	0.0801	0.0854	-0.0085	0.0601	0.0898	
0.0630	0.0806	-0.1816	-0.0297	-0.1689	-0.0049	0.0728	-0.0605	0.1455	-0.1216	0.1025	-0.0057	-0.0103	-0.2441	-0.0425	0.1162	
-0.0674	-0.0237	0.1182	0.1426	0.0140	0.0444	-0.0029	-0.0084	0.0825	-0.0417	-0.2207	-0.0057	-0.0190	0.1060	0.0260	-0.0306	
-0.0325	-0.0503	0.0255	0.0160	0.0420	-0.0074	0.2236	0.0098	-0.0059	-0.0479	0.0850	0.0586	0.0194	-0.0605	-0.1089	-0.0039	
-0.0260	0.1060	-0.0081	0.0471	0.0342	0.0430	0.0742	-0.0469	-0.0669	0.0232	-0.0664	-0.1758	-0.0193	0.0004	-0.0674	-0.0474	
0.0234	0.0219	-0.0659	-0.0120	0.0503	0.0854	-0.3496	0.0330	0.0249	0.0454	-0.1260	-0.0869	-0.0306	0.0732	-0.0155	-0.1768	
==========================ref
0.0479	0.0347	-0.0801	-0.0199	-0.0386	-0.1279	0.0019	-0.0522	0.0034	0.0070	0.0513	-0.0417	-0.0386	0.0012	0.0742	-0.0128	
-0.0591	-0.0161	-0.1279	0.0020	0.1221	0.0222	-0.1338	0.0786	0.0305	0.1211	-0.0884	-0.0952	0.0942	-0.0047	0.0043	0.0309	
0.0262	0.1514	-0.0625	0.0144	-0.0835	-0.0391	-0.0420	0.0603	0.0091	-0.0254	0.0491	0.0879	-0.0830	-0.0728	-0.1147	-0.0120	
-0.0413	-0.0025	-0.1260	-0.1348	-0.0640	0.0713	-0.1338	-0.0603	-0.0962	0.0400	0.1006	0.0010	0.0110	-0.0898	-0.0747	0.0388	
-0.0659	-0.0344	-0.0354	0.1504	-0.0688	0.0742	-0.0334	0.0474	-0.1206	-0.0625	0.0532	-0.0039	-0.0189	0.0791	0.1426	-0.1494	
-0.0781	-0.0410	0.0435	0.0278	0.0166	0.0200	-0.0300	-0.0737	0.0491	-0.0165	-0.0102	0.0801	0.0854	-0.0085	0.0601	0.0898	
-0.0674	-0.0237	0.1182	0.1426	0.0140	0.0444	-0.0029	-0.0084	0.0825	-0.0417	-0.2207	-0.0057	-0.0190	0.1060	0.0260	-0.0306	
-0.0260	0.1060	-0.0081	0.0471	0.0342	0.0430	0.0742	-0.0469	-0.0669	0.0232	-0.0664	-0.1758	-0.0193	0.0004	-0.0674	-0.0474	
0.0747	0.0217	0.0347	-0.0732	0.0320	-0.0674	-0.0703	-0.0182	-0.0459	0.0275	0.0298	0.1030	0.0732	-0.0143	-0.0723	0.0630	
0.0771	-0.0503	0.0374	-0.1787	-0.0488	0.0244	-0.1016	-0.0608	-0.0063	-0.0140	-0.1079	0.0461	-0.0630	0.1250	0.0297	-0.0425	
-0.0435	0.0603	0.1611	-0.0168	-0.0408	0.0183	-0.0030	-0.0388	0.0469	0.0118	-0.0845	0.2852	-0.0042	0.1074	0.0164	0.0933	
0.0771	0.0723	-0.0232	-0.0703	0.0957	-0.0781	-0.0498	-0.0515	0.0117	0.0918	0.0289	0.0258	0.0159	0.0115	0.0033	-0.0718	
-0.0035	-0.0075	0.0034	-0.0242	0.0302	-0.1104	0.0080	-0.0625	0.1602	-0.0515	-0.1060	-0.0500	-0.0742	-0.0361	0.1895	0.0356	
0.1582	-0.0215	0.0410	-0.1719	0.1147	0.0007	-0.0913	-0.0300	-0.0221	0.0884	-0.1582	-0.1235	-0.0486	-0.0386	0.0864	0.0327	
-0.1089	-0.1309	-0.1099	-0.1240	0.0220	-0.0089	-0.1592	0.0029	0.0317	0.1089	-0.0623	0.0449	0.0679	-0.0267	-0.0757	-0.0393	
0.0840	-0.1016	-0.0056	-0.0796	0.2305	-0.0811	0.0330	-0.1123	0.1494	0.0245	-0.0723	-0.0272	0.0398	0.1089	0.1006	-0.0305	

For ease of understanding, the C matrix is viewed as [M, N], and the result shown above is C[:16, :16]. As mentioned, C_ref[3, :] == C[6, :].
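
To check the row relationship over the whole matrix rather than by eye, something like the sketch below could be used. It assumes both results have been copied back to the host as row-major `float` buffers; `match_rows` and the tolerance are hypothetical choices of mine, not part of the original code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each row of `ref`, return the index of the first
// row of `mine` that matches it element-wise within `tol`, or -1 if none does.
std::vector<int> match_rows(const std::vector<float>& mine, std::size_t mine_rows,
                            const std::vector<float>& ref, std::size_t ref_rows,
                            std::size_t cols, float tol = 1e-3f) {
    assert(mine.size() == mine_rows * cols && ref.size() == ref_rows * cols);
    std::vector<int> match(ref_rows, -1);
    for (std::size_t r = 0; r < ref_rows; ++r) {
        for (std::size_t m = 0; m < mine_rows && match[r] < 0; ++m) {
            bool same = true;
            for (std::size_t c = 0; c < cols && same; ++c) {
                same = std::fabs(mine[m * cols + c] - ref[r * cols + c]) <= tol;
            }
            if (same) match[r] = static_cast<int>(m);  // first matching row wins
        }
    }
    return match;
}
```

In the 16x16 excerpt above, rows 0, 2, 4, ..., 14 of the "my" block equal rows 0 through 7 of the "ref" block, so for those rows this helper would report `match[i] == 2 * i`. Whether that pattern holds for the rest of the 128x128 output is something the full run would have to confirm.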

If I set m=n=k=512, the error goes away. Any idea why this is happening? My environment:

Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Driver Version: 535.129.03   CUDA Version: 12.4
NVIDIA H800

g++ (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)

NAME="CentOS Linux"
VERSION="7 (Core)"
