Inconsistent gdr read latency #243

Open
anaanimous opened this issue Jan 18, 2023 · 3 comments

@anaanimous

I have a memory allocator based on gdr which initially allocates a large chunk of gdr-mapped memory (e.g. 16 MB) and then hands out pieces of this chunk to subsequent memory requests. During performance benchmarking, I noticed that the read latency for the same memory size fluctuates quite significantly, and I can't understand why. For example, if I allocate 3 KB, read it 100 times, and repeat the same procedure over and over, the average read time alternates between 4.5 us and 70 us (i.e. 4.5 -> 70 -> 4.5 -> 70 ...), even though the same piece of memory is allocated for every batch of 100 reads.
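A minimal sketch of this allocation pattern (using gdrcopy's gdr_pin_buffer/gdr_map API; my actual allocator is more involved, and error handling is omitted):

```cpp
#include <cuda.h>
#include <gdrapi.h>

// Sketch only: pin and map one large chunk once, then hand out offsets into
// it for later allocation requests. Error checking omitted for brevity.
#define CHUNK_SIZE (16UL * 1024 * 1024)   // 16 MB chunk, as described above

int main()
{
    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    CUdeviceptr d_chunk;
    cuMemAlloc(&d_chunk, CHUNK_SIZE);

    gdr_t g = gdr_open();
    gdr_mh_t mh;
    gdr_pin_buffer(g, d_chunk, CHUNK_SIZE, 0, 0, &mh);

    void *bar_ptr = nullptr;
    gdr_map(g, mh, &bar_ptr, CHUNK_SIZE);

    // A sub-allocation is just an offset into the mapped chunk, e.g. a 3 KB
    // piece handed out by the allocator.
    size_t offset = 0;                          // chosen by the allocator
    void *sub_map = (char *)bar_ptr + offset;   // CPU-visible BAR pointer
    CUdeviceptr sub_dev = d_chunk + offset;     // device address of the same piece
    (void)sub_map; (void)sub_dev;

    gdr_unmap(g, mh, bar_ptr, CHUNK_SIZE);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cuMemFree(d_chunk);
    cuCtxDestroy(ctx);
    return 0;
}
```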

Here are some details regarding my settings:

  • I have set CU_POINTER_ATTRIBUTE_SYNC_MEMOPS to 1 for the entire chunk using cuPointerSetAttribute (see the timing sketch after this list).
  • The API reports "using SSE4_1 implementation of gdr_copy_from_bar" for the read operation.
  • The data is read into page-locked host memory allocated using cudaMallocHost.
  • Both the gdr-mapped source pointer and the destination pointer are 128-bit aligned.
  • The write latency is quite consistent.
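
The per-allocation timing loop is essentially the following sketch (names follow the sketch above; the SYNC_MEMOPS setting from the list is shown only as a comment because it is applied once to the whole chunk):

```cpp
#include <cuda_runtime.h>
#include <gdrapi.h>
#include <chrono>
#include <cstdio>

// Sketch only: read the same sub-allocation 100 times from the BAR mapping
// into page-locked host memory and report the average latency.
void time_reads(gdr_mh_t mh, const void *sub_map, size_t size /* e.g. 3 * 1024 */)
{
    // SYNC_MEMOPS was set once for the entire chunk, e.g.:
    //   unsigned int one = 1;
    //   cuPointerSetAttribute(&one, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, d_chunk);

    void *h_dst = nullptr;
    cudaMallocHost(&h_dst, size);                 // page-locked destination

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 100; ++i)
        gdr_copy_from_mapping(mh, h_dst, sub_map, size);
    auto t1 = std::chrono::steady_clock::now();

    double avg_us =
        std::chrono::duration<double, std::micro>(t1 - t0).count() / 100.0;
    std::printf("avg read latency for %zu bytes: %.2f us\n", size, avg_us);

    cudaFreeHost(h_dst);
}
```
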
@pakmarkthub
Collaborator

Hi @anaanimous,

CPU and GPU clocks are usually (but not always) the main cause of performance fluctuation. Can you try the items below and rerun your test?

  1. Fix the CPU clock, or at least set the CPU frequency governor to "performance": `sudo cpupower frequency-set -g performance`.
  2. Please also set the GPU clocks to max.
# To view the max clock values of GPU 0
$ nvidia-smi -i 0 -q
...
    Max Clocks
        Graphics                          : 1410 MHz
        SM                                : 1410 MHz
        Memory                            : 1593 MHz
        Video                             : 1290 MHz
...

# To set the clocks of GPU 0 to max
$ sudo nvidia-smi -i 0 -ac 1593,1410

@anaanimous
Author

Thank you for the quick response.

I have been running the benchmark on an AWS instance. However, when I ran the same program on a local server today, the performance was stable. I don't know what's causing the fluctuation on AWS. It may be due to frequency scaling, but I doubt it. Here is why:

I have a benchmark program where I measure the read latency for different sizes (1, 2, 4, ..., 1 MB), similar to the copylat program, except that I use my allocator and its APIs to allocate memory and perform the reads. If I run this program on a local server, the performance nicely matches that of copylat. But on AWS, the read latency for 512 bytes and above suddenly increases significantly (e.g. the latency of reading 512 bytes goes from 1.5 us to 12 us). Strangely enough, if I skip only the one-byte read (i.e. if I perform the test for 2, 4, ..., 1 MB instead of 1, 2, ..., 1 MB), the numbers match the copylat output.
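
A simplified sketch of the size sweep (my real code goes through my allocator's APIs; it assumes an already gdr-mapped region `bar_ptr` and a pinned host buffer `h_dst`, each at least 1 MB):

```cpp
#include <gdrapi.h>
#include <chrono>
#include <cstdio>

// Sketch only: time reads of 1 B, 2 B, 4 B, ..., 1 MB from the BAR mapping
// into a page-locked host buffer and print the average per-copy latency.
void sweep(gdr_mh_t mh, const void *bar_ptr, void *h_dst, int iters = 100)
{
    for (size_t sz = 1; sz <= (1UL << 20); sz *= 2) {
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i)
            gdr_copy_from_mapping(mh, h_dst, bar_ptr, sz);
        auto t1 = std::chrono::steady_clock::now();

        double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
        std::printf("%8zu B : %7.2f us\n", sz, us);
    }
}
```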

@pakmarkthub
Collaborator

Let's split this into two topics. The first is the performance fluctuation, which seems to be resolved now. Depending on how your instance is allocated, you might be sharing the host with other instances. I cannot say much about performance predictability if you are not in full control of the entire system; there are many external factors that can affect performance.

The second topic is the read latency jumping to 12 us when reading 512 bytes. Can you share the code? I will try to reproduce this behavior on our system.
