Open
The numbers I'm getting on my H100 are slightly lower than reported. Most noticeably, cuBLAS (kernel 0) only reaches 518 TFLOPS. Any idea what the issue could be? My guess is the CUDA toolkit version: I'm currently on 12.4.
$ make matmul && out/matmul
mkdir -p out
nvcc -std=c++17 -O3 -DNDEBUG -w --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -Xcompiler=-fPIE -Xcompiler=-Wno-psabi -Xcompiler=-fno-strict-aliasing -arch=sm_90a -lcublas -lcuda -lineinfo matmul.cu -o out/matmul
KERNEL 0
Average elapsed time: (0.002120) s, performance: ( 518.6) TFLOPS. size: (8192).
KERNEL 1
Average elapsed time: (0.037701) s, performance: ( 29.2) TFLOPS. size: (8192).
KERNEL 2
Average elapsed time: (0.005004) s, performance: ( 219.7) TFLOPS. size: (8192).
KERNEL 3
Load: 1100.182255, Compute: 727.985903, Store: 4681.573975, Datapoints: 4096
Average elapsed time: (0.002935) s, performance: ( 374.7) TFLOPS. size: (8192).
KERNEL 4
Average elapsed time: (0.003239) s, performance: ( 339.4) TFLOPS. size: (8192).
KERNEL 5
Average elapsed time: (0.002563) s, performance: ( 429.0) TFLOPS. size: (8192).
KERNEL 6
Average elapsed time: (0.002254) s, performance: ( 487.8) TFLOPS. size: (8192).
KERNEL 7
Average elapsed time: (0.002152) s, performance: ( 511.0) TFLOPS. size: (8192).
KERNEL 8
Average elapsed time: (0.002141) s, performance: ( 513.5) TFLOPS. size: (8192).
KERNEL 9
Average elapsed time: (0.002108) s, performance: ( 521.7) TFLOPS. size: (8192).
KERNEL 10
Average elapsed time: (0.002061) s, performance: ( 533.4) TFLOPS. size: (8192).
KERNEL 11
Average elapsed time: (0.002080) s, performance: ( 528.7) TFLOPS. size: (8192).
KERNEL 12
Average elapsed time: (0.002071) s, performance: ( 530.8) TFLOPS. size: (8192).