Hello, I have already read the issue about the total error reported at the end, and I understand that the errors are fairly low in that particular case. I ran the same configuration and got the same error, but when I decrease the sparsity ratio down to 50%, the error becomes very high and there are large mismatches between cuBLAS and FlashLLM:
First 10 Mismatches between Cublas and MySpMM:
NOTHING PRINTED
******************************************Problem Size******************************************
M: 7168 N: 8 K: 7168 Pruning Rate: 90 SplitK: 7
CuBlas_SIMT -> Time/ms: 0.454 Performance/TFLOPs: 1.81 TotalError: 0.00
CuBlas_TC -> Time/ms: 0.224 Performance/TFLOPs: 3.67 TotalError: 0.00
FlashLLM_v1 -> Time/ms: 0.064 Performance/TFLOPs: 12.84 TotalError: 408.53
FlashLLM_v2 -> Time/ms: 0.064 Performance/TFLOPs: 12.85 TotalError: 408.53
------

First 10 Mismatches between Cublas and MySpMM:
NOTHING PRINTED
******************************************Problem Size******************************************
M: 7168 N: 8 K: 7168 Pruning Rate: 70 SplitK: 7
CuBlas_SIMT -> Time/ms: 0.454 Performance/TFLOPs: 1.81 TotalError: 0.00
CuBlas_TC -> Time/ms: 0.224 Performance/TFLOPs: 3.67 TotalError: 0.00
FlashLLM_v1 -> Time/ms: 0.136 Performance/TFLOPs: 6.05 TotalError: 1099.75
FlashLLM_v2 -> Time/ms: 0.136 Performance/TFLOPs: 6.05 TotalError: 1099.75
------

First 10 Mismatches between Cublas and MySpMM:
NOTHING PRINTED
******************************************Problem Size******************************************
M: 7168 N: 8 K: 7168 Pruning Rate: 60 SplitK: 7
CuBlas_SIMT -> Time/ms: 0.454 Performance/TFLOPs: 1.81 TotalError: 0.00
CuBlas_TC -> Time/ms: 0.224 Performance/TFLOPs: 3.67 TotalError: 0.00
FlashLLM_v1 -> Time/ms: 0.178 Performance/TFLOPs: 4.62 TotalError: 1699.12
FlashLLM_v2 -> Time/ms: 0.178 Performance/TFLOPs: 4.62 TotalError: 1699.12
------

First 10 Mismatches between Cublas and MySpMM:
(128,0) CuBlas=-340.000000 MySpMM=-290.750000
(128,1) CuBlas=-343.250000 MySpMM=-299.000000
(128,2) CuBlas=-363.500000 MySpMM=-299.500000
(128,3) CuBlas=-377.250000 MySpMM=-317.000000
(128,4) CuBlas=-342.250000 MySpMM=-297.250000
(128,5) CuBlas=-372.500000 MySpMM=-318.250000
(128,6) CuBlas=-333.000000 MySpMM=-288.250000
(128,7) CuBlas=-337.500000 MySpMM=-279.500000
(129,0) CuBlas=-330.750000 MySpMM=-271.000000
(129,1) CuBlas=-333.000000 MySpMM=-287.000000
******************************************Problem Size******************************************
M: 7168 N: 8 K: 7168 Pruning Rate: 50 SplitK: 7
CuBlas_SIMT -> Time/ms: 0.454 Performance/TFLOPs: 1.81 TotalError: 0.00
CuBlas_TC -> Time/ms: 0.224 Performance/TFLOPs: 3.67 TotalError: 0.00
FlashLLM_v1 -> Time/ms: 0.223 Performance/TFLOPs: 3.69 TotalError: 64917.88
FlashLLM_v2 -> Time/ms: 0.223 Performance/TFLOPs: 3.69 TotalError: 72974.00
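To put these totals in perspective, here is a rough sketch that converts each reported TotalError into a mean error per output element. It assumes that TotalError is the sum of absolute element-wise differences over the whole M x N output, which is an assumption and has not been checked against the benchmark source. The helper name `mean_abs_error` is hypothetical.

```python
# Assumption (not confirmed from the benchmark source): TotalError is the
# sum of |cublas - spmm| over all M x N output elements.
M, N = 7168, 8

# FlashLLM_v1 TotalError values copied from the log, keyed by pruning rate (%).
total_error_v1 = {90: 408.53, 70: 1099.75, 60: 1699.12, 50: 64917.88}


def mean_abs_error(total_error, m=M, n=N):
    """Average absolute error per output element under the assumption above."""
    return total_error / (m * n)


for rate, err in total_error_v1.items():
    print(f"pruning {rate}%: mean abs error per element = {mean_abs_error(err):.4f}")
```

Under that assumption, the step from 60% to 50% pruning raises the v1 total by roughly 38x (1699.12 to 64917.88), far more than the steps between the higher pruning rates.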