Describe the Bug
I noticed that some collectives overlap in the linked trace because the timestamp of the GPU operation is overwritten with the timestamp of the runtime operation. The line in question is in trace_linker.py, in the find_parent_cpu_op method:
kineto_gpu_op.timestamp = kineto_runtime_op.timestamp
Is this the intended behavior?
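To make the concern concrete, here is a minimal sketch of what such an assignment does to a kernel's interval. `Op` is a hypothetical, simplified stand-in and not Chakra's actual operator class, and `runtime_launch` is a made-up name. The numbers are the microsecond parts of the third all-reduce in the dumps below, assuming the start reported in the linked trace (28.759648 s) is the runtime op's timestamp.

```python
from dataclasses import dataclass


@dataclass
class Op:
    # Hypothetical, simplified operator record; times are in microseconds.
    name: str
    timestamp: float
    duration: float

    @property
    def end(self) -> float:
        return self.timestamp + self.duration


# Third all-reduce: the device trace reports its start at ...28.786286 s.
gpu_op = Op("ncclDevKernel_AllReduce_Sum_f32_RING_LL", 786286.0, 22644.126)
# Assumed runtime op timestamp (...28.759648 s); its duration is invented.
runtime_op = Op("runtime_launch", 759648.0, 10.0)

device_start = gpu_op.timestamp
gpu_op.timestamp = runtime_op.timestamp  # the assignment in question
shift = device_start - gpu_op.timestamp  # how far the kernel moved back in time
```

Under these assumptions the kernel keeps its duration but moves earlier by `shift` microseconds, so it can land inside the interval of the kernel that ran before it on the device.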
I dumped the relevant info from the device trace (Kineto) and from the resulting linked trace. Besides the overlap, the start and end timestamps in the linked trace no longer match the device trace (durations are in microseconds):
Kineto:
start:23:25:28.743436 duration:8932.93 end:23:25:28.752369 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.755013 duration:31272.55 end:23:25:28.786285 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.786286 duration:22644.126 end:23:25:28.808930 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.809750 duration:42481.079 end:23:25:28.852232 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.871565 duration:44283.024 end:23:25:28.915848 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
Linked:
start:23:25:28.743424 duration:8932.93 end:23:25:28.752357 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.754414 duration:31272.55 end:23:25:28.785686 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.759648 duration:22644.126 end:23:25:28.782292 overlapped:True ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.807910 duration:42481.079 end:23:25:28.850391 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.858260 duration:44283.024 end:23:25:28.902543 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
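For anyone who wants to reproduce the `overlapped` flag, the following is a rough sketch of one way to compute it from the start and duration columns. It is not the script used to produce the dumps above; `to_us` and `find_overlaps` are hypothetical helpers, and the check assumes the kernels are already sorted by start time.

```python
def to_us(clock: str) -> int:
    """Convert 'HH:MM:SS.ffffff' to integer microseconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    whole, _, frac = seconds.partition(".")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    return total_seconds * 1_000_000 + int(frac.ljust(6, "0")[:6])


def find_overlaps(kernels):
    """Return indices of kernels that start before an earlier one has ended.

    `kernels` is a list of (start, duration_us) pairs sorted by start time.
    """
    overlapped = []
    latest_end = float("-inf")
    for index, (start, duration_us) in enumerate(kernels):
        begin = to_us(start)
        if begin < latest_end:
            overlapped.append(index)
        latest_end = max(latest_end, begin + duration_us)
    return overlapped


# Start times and durations copied from the two dumps above.
kineto = [
    ("23:25:28.743436", 8932.93),
    ("23:25:28.755013", 31272.55),
    ("23:25:28.786286", 22644.126),
    ("23:25:28.809750", 42481.079),
    ("23:25:28.871565", 44283.024),
]
linked = [
    ("23:25:28.743424", 8932.93),
    ("23:25:28.754414", 31272.55),
    ("23:25:28.759648", 22644.126),
    ("23:25:28.807910", 42481.079),
    ("23:25:28.858260", 44283.024),
]

kineto_overlaps = find_overlaps(kineto)
linked_overlaps = find_overlaps(linked)
```

On these inputs the Kineto list yields no overlaps, while the linked list flags only the third kernel (index 2), which starts at 28.759648 while the second kernel is still running.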
Steps to Reproduce
Create a Chakra trace and jsonize it for easier inspection.
v0.0.4 - main branch
Expected Behavior
After fixing the suspected issue on our side, we processed the same host and device traces as above. No overlaps occurred and the timestamps match:
Kineto:
start:23:25:28.675025 duration:47.807 end:23:25:28.675073 overlapped:False ncclDevKernel_Broadcast_RING_LL
start:23:25:28.676516 duration:5.024 end:23:25:28.676521 overlapped:False ncclDevKernel_Broadcast_RING_LL
start:23:25:28.743436 duration:8932.93 end:23:25:28.752369 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.755013 duration:31272.55 end:23:25:28.786285 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.786286 duration:22644.126 end:23:25:28.808930 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.809750 duration:42481.079 end:23:25:28.852232 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.871565 duration:44283.024 end:23:25:28.915848 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
Linked:
start:23:25:28.743436 duration:8932.93 end:23:25:28.752369 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.755013 duration:31272.55 end:23:25:28.786285 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.786286 duration:22644.126 end:23:25:28.808930 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.809750 duration:42481.079 end:23:25:28.852232 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
start:23:25:28.871565 duration:44283.024 end:23:25:28.915848 overlapped:False ncclDevKernel_AllReduce_Sum_f32_RING_LL
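The expected property can also be stated as a check: each linked kernel should start at the same time as the matching Kineto kernel. Below is a small sketch of such a check using the all-reduce start times from this report. `to_us` and `start_drift_us` are hypothetical helpers, and pairing kernels by position is an assumption. The two Broadcast kernels are left out because the linked dump does not list them.

```python
def to_us(clock: str) -> int:
    """Convert 'HH:MM:SS.ffffff' to integer microseconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    whole, _, frac = seconds.partition(".")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    return total_seconds * 1_000_000 + int(frac.ljust(6, "0")[:6])


def start_drift_us(kineto_starts, linked_starts):
    """Per-kernel difference (Kineto start minus linked start) in microseconds."""
    if len(kineto_starts) != len(linked_starts):
        raise ValueError("expected one linked kernel per Kineto kernel")
    return [to_us(k) - to_us(l) for k, l in zip(kineto_starts, linked_starts)]


# All-reduce start times from the device trace.
kineto_starts = [
    "23:25:28.743436", "23:25:28.755013", "23:25:28.786286",
    "23:25:28.809750", "23:25:28.871565",
]
# Linked start times from the first dump (current behavior).
linked_before_fix = [
    "23:25:28.743424", "23:25:28.754414", "23:25:28.759648",
    "23:25:28.807910", "23:25:28.858260",
]
# Linked start times from the dump taken after the local fix.
linked_after_fix = list(kineto_starts)

drift_before = start_drift_us(kineto_starts, linked_before_fix)
drift_after = start_drift_us(kineto_starts, linked_after_fix)
```

With the current behavior the drift ranges from 12 to 26638 microseconds across the five kernels; after the local fix it is zero for all of them.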
Screenshots
N/A