Some unit tests in sanity test fail when running in GPU Exclusive mode #231

Open
pakmarkthub opened this issue Jul 22, 2022 · 0 comments
@pakmarkthub
Collaborator

$ sanity -v
Running suite(s): Sanity
&&&& RUNNING basic_cumemalloc
buffer size: 327680
&&&& PASSED basic_cumemalloc
&&&& RUNNING basic_with_tokens
buffer size: 327680
&&&& PASSED basic_with_tokens
&&&& RUNNING basic_unaligned_mapping
First allocation: d_fa=0x147935a00000, size=4
Second allocation: d_A=0x147935a20200, size=65540, GPU-page-boundary 0x147935a20000
d_A is unaligned
Try mapping d_A as is.
Mapping d_A failed as expected.
Align d_A and try mapping it again.
Pin and map aligned address: d_aligned_A=0x147935a30000, offset=65024, size=516
&&&& PASSED basic_unaligned_mapping
&&&& RUNNING basic_child_thread_pins_buffer_cumemalloc
spawning single child thread
pinning
mapping
unmapping
unpinning
spawning two children threads, splitting setup and teardown
pinning
mapping
unmapping
unpinning
spawning two children threads, concurrently pinning and mapping the same buffer
pinning
mapping
unmapping
unpinning
pinning
mapping
unmapping
unpinning
spawning cleanup child thread
&&&& PASSED basic_child_thread_pins_buffer_cumemalloc
&&&& RUNNING basic_vmmalloc
buffer size: 327680
&&&& PASSED basic_vmmalloc
&&&& RUNNING basic_child_thread_pins_buffer_vmmalloc
spawning single child thread
pinning
mapping
unmapping
unpinning
spawning two children threads, splitting setup and teardown
pinning
mapping
unmapping
unpinning
spawning two children threads, concurrently pinning and mapping the same buffer
pinning
mapping
unmapping
unpinning
pinning
mapping
unmapping
unpinning
spawning cleanup child thread
&&&& PASSED basic_child_thread_pins_buffer_vmmalloc
&&&& RUNNING data_validation_cumemalloc
buffer size: 327680
off: 0
check 1: MMIO CPU initialization + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 5 dwords offset
check 5: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 11 bytes offset
warning: buffer size 327669 is not dword aligned, ignoring trailing bytes
unmapping
unpinning
&&&& PASSED data_validation_cumemalloc
&&&& RUNNING data_validation_vmmalloc
buffer size: 327680
off: 0
check 1: MMIO CPU initialization + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 5 dwords offset
check 5: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + 11 bytes offset
warning: buffer size 327669 is not dword aligned, ignoring trailing bytes
unmapping
unpinning
&&&& PASSED data_validation_vmmalloc
&&&& RUNNING invalidation_access_after_gdr_close_cumemalloc
Mapping bar1
Writing 284 into buf_ptr[0]
Calling gdr_close
Trying to read buf_ptr[0] after gdr_close
Get signal 7 as expected
&&&& PASSED invalidation_access_after_gdr_close_cumemalloc
&&&& RUNNING invalidation_access_after_free_cumemalloc
Mapping bar1
Writing 284 into buf_ptr[0]
Calling gpuMemFree
Trying to read buf_ptr[0] after gpuMemFree
Get signal 7 as expected
&&&& PASSED invalidation_access_after_free_cumemalloc
&&&& RUNNING invalidation_two_mappings_cumemalloc
Mapping bar1
Writing data to both mappings 596 and 597 respectively
Validating that we can read the data back
gpuMemFree and thus destroying the first mapping
Trying to read and validate the data from the second mapping after the first mapping has been destroyed
&&&& PASSED invalidation_two_mappings_cumemalloc
&&&& RUNNING invalidation_fork_access_after_free_cumemalloc
parent: Start
child: Start
child: waiting for cont signal from parent
parent: writing buf_ptr[0] with 596
parent: read buf_ptr[0] before gpuMemFree get 596
parent: calling gpuMemFree
child: receive cont signal 1 from parent
parent: waiting for child write signal
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
Assertion "(read(read_fd, &child_data, sizeof(int))) == (sizeof(int))" failed at sanity.cpp:946
&&&& FAILED invalidation_fork_access_after_free_cumemalloc
&&&& RUNNING invalidation_fork_after_gdr_map_cumemalloc
parent: Start
parent: writing buf_ptr[0] with 596
parent: trying to read buf_ptr[0]
parent: read buf_ptr[0] get 596
parent: signaling child
parent: waiting for child to exit
child: Start
child: waiting for cont signal from parent
child: receive cont signal 1 from parent
child: trying to read buf_ptr[0]
Get signal 11 as expected
parent: trying to read buf_ptr[0] after child exits
parent: read buf_ptr[0] after child exits get 596
&&&& PASSED invalidation_fork_after_gdr_map_cumemalloc
&&&& RUNNING invalidation_fork_child_gdr_map_parent_cumemalloc
parent: Start
child: Start
child: attempting to gdr_map parent's pinned GPU memory
child: cannot do gdr_map as expected
&&&& PASSED invalidation_fork_child_gdr_map_parent_cumemalloc
&&&& RUNNING invalidation_fork_map_and_free_cumemalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: writing buf_ptr[0] with 596
parent: waiting for signal from child
Assertion "(read(read_fd, &cont, sizeof(int))) == (sizeof(int))" failed at sanity.cpp:1346
&&&& FAILED invalidation_fork_map_and_free_cumemalloc
&&&& RUNNING invalidation_unix_sock_shared_fd_gdr_pin_buffer_cumemalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: Calling gdr_open
parent: Extracted fd from gdr_t got fd 4
parent: Sending fd to child via unix socket
sendmsg failed with Connection refusedAssertion "sendfd(pair[1], fd) >= 0" failed at sanity.cpp:1463
&&&& FAILED invalidation_unix_sock_shared_fd_gdr_pin_buffer_cumemalloc
&&&& RUNNING invalidation_unix_sock_shared_fd_gdr_map_cumemalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: Calling gdr_open
parent: Calling gdr_pin_buffer
parent: Extracted fd from gdr_t got fd 8
parent: Sending fd to child via unix socket
sendmsg failed with Connection refusedAssertion "sendfd(pair[1], fd) >= 0" failed at sanity.cpp:1608
&&&& FAILED invalidation_unix_sock_shared_fd_gdr_map_cumemalloc
&&&& RUNNING invalidation_fork_child_gdr_pin_parent_with_tokens
parent: Start
child: Start
parent: CUDA generated tokens.p2pToken 0, tokens.vaSpaceToken 65024
child: Received from parent tokens.p2pToken 0, tokens.vaSpaceToken 65024
&&&& PASSED invalidation_fork_child_gdr_pin_parent_with_tokens
&&&& RUNNING invalidation_access_after_gdr_close_vmmalloc
Mapping bar1
Writing 678 into buf_ptr[0]
Calling gdr_close
Trying to read buf_ptr[0] after gdr_close
Get signal 7 as expected
&&&& PASSED invalidation_access_after_gdr_close_vmmalloc
&&&& RUNNING invalidation_access_after_free_vmmalloc
Mapping bar1
Writing 678 into buf_ptr[0]
Calling gpuMemFree
Trying to read buf_ptr[0] after gpuMemFree
Get signal 7 as expected
&&&& PASSED invalidation_access_after_free_vmmalloc
&&&& RUNNING invalidation_two_mappings_vmmalloc
Mapping bar1
Writing data to both mappings 678 and 679 respectively
Validating that we can read the data back
gpuMemFree and thus destroying the first mapping
Trying to read and validate the data from the second mapping after the first mapping has been destroyed
&&&& PASSED invalidation_two_mappings_vmmalloc
&&&& RUNNING invalidation_fork_access_after_free_vmmalloc
parent: Start
child: Start
child: waiting for cont signal from parent
parent: writing buf_ptr[0] with 678
parent: read buf_ptr[0] before gpuMemFree get 678
parent: calling gpuMemFree
child: receive cont signal 1 from parent
parent: waiting for child write signal
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
Assertion "(read(read_fd, &child_data, sizeof(int))) == (sizeof(int))" failed at sanity.cpp:946
&&&& FAILED invalidation_fork_access_after_free_vmmalloc
&&&& RUNNING invalidation_fork_after_gdr_map_vmmalloc
parent: Start
parent: writing buf_ptr[0] with 678
parent: trying to read buf_ptr[0]
parent: read buf_ptr[0] get 678
parent: signaling child
parent: waiting for child to exit
child: Start
child: waiting for cont signal from parent
child: receive cont signal 1 from parent
child: trying to read buf_ptr[0]
Get signal 11 as expected
parent: trying to read buf_ptr[0] after child exits
parent: read buf_ptr[0] after child exits get 678
&&&& PASSED invalidation_fork_after_gdr_map_vmmalloc
&&&& RUNNING invalidation_fork_child_gdr_map_parent_vmmalloc
parent: Start
child: Start
child: attempting to gdr_map parent's pinned GPU memory
child: cannot do gdr_map as expected
&&&& PASSED invalidation_fork_child_gdr_map_parent_vmmalloc
&&&& RUNNING invalidation_fork_map_and_free_vmmalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: writing buf_ptr[0] with 662
parent: waiting for signal from child
Assertion "(read(read_fd, &cont, sizeof(int))) == (sizeof(int))" failed at sanity.cpp:1346
&&&& FAILED invalidation_fork_map_and_free_vmmalloc
&&&& RUNNING invalidation_unix_sock_shared_fd_gdr_pin_buffer_vmmalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: Calling gdr_open
parent: Extracted fd from gdr_t got fd 4
parent: Sending fd to child via unix socket
sendmsg failed with Connection refusedAssertion "sendfd(pair[1], fd) >= 0" failed at sanity.cpp:1463
&&&& FAILED invalidation_unix_sock_shared_fd_gdr_pin_buffer_vmmalloc
&&&& RUNNING invalidation_unix_sock_shared_fd_gdr_map_vmmalloc
parent: Start
child: Start
CUDA error: CUDA_ERROR_INVALID_DEVICE
Assertion "CUDA_SUCCESS == result" failed at sanity.cpp:68
parent: Calling gdr_open
parent: Calling gdr_pin_buffer
parent: Extracted fd from gdr_t got fd 8
parent: Sending fd to child via unix socket
sendmsg failed with Connection refusedAssertion "sendfd(pair[1], fd) >= 0" failed at sanity.cpp:1608
&&&& FAILED invalidation_unix_sock_shared_fd_gdr_map_vmmalloc
70%: Checks: 27, Failures: 8, Errors: 0
sanity.cpp:975:F:Invalidation:invalidation_fork_access_after_free_cumemalloc:0: Failed
sanity.cpp:1371:F:Invalidation:invalidation_fork_map_and_free_cumemalloc:0: Failed
sanity.cpp:1478:F:Invalidation:invalidation_unix_sock_shared_fd_gdr_pin_buffer_cumemalloc:0: Failed
sanity.cpp:1629:F:Invalidation:invalidation_unix_sock_shared_fd_gdr_map_cumemalloc:0: Failed
sanity.cpp:981:F:Invalidation:invalidation_fork_access_after_free_vmmalloc:0: Failed
sanity.cpp:1377:F:Invalidation:invalidation_fork_map_and_free_vmmalloc:0: Failed
sanity.cpp:1484:F:Invalidation:invalidation_unix_sock_shared_fd_gdr_pin_buffer_vmmalloc:0: Failed
sanity.cpp:1635:F:Invalidation:invalidation_unix_sock_shared_fd_gdr_map_vmmalloc:0: Failed
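
In each failing fork test above, the child prints CUDA_ERROR_INVALID_DEVICE and then the parent's `read(read_fd, ...) == sizeof(int)` assertion fails. One plausible reading, given the issue title, is that the child cannot set up CUDA while the parent already holds the GPU in exclusive mode, so it exits before writing to the parent, and the parent's `read()` then sees end-of-file. The sketch below reproduces only that second half with a plain pipe and no CUDA. It assumes the tests communicate over pipes (suggested by the `read_fd` name but not confirmed here); `parent_read_after_child` and `child_fails_early` are hypothetical names for illustration, not code from sanity.cpp.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: forks a child and returns how many bytes the
// parent's read() obtained from it. When child_fails_early is true, the
// child exits before writing, standing in for a child whose CUDA setup
// failed (an assumption about what happens in the log above).
ssize_t parent_read_after_child(bool child_fails_early) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        // Child: only writes, so close the read end.
        close(fds[0]);
        if (child_fails_early) {
            _exit(1);  // nothing is ever written to the pipe
        }
        int value = 596;
        ssize_t w = write(fds[1], &value, sizeof(value));
        _exit(w == static_cast<ssize_t>(sizeof(value)) ? 0 : 1);
    }

    // Parent: close the write end so read() reports EOF once the child
    // is gone instead of blocking forever.
    close(fds[1]);
    int data = 0;
    ssize_t n = read(fds[0], &data, sizeof(data));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n;
}
```

Calling `parent_read_after_child(true)` returns 0 (end-of-file), which is the condition under which an assertion of the form `read(...) == sizeof(int)` fails. Calling it with `false` returns `sizeof(int)`. If this reading is right, the parent-side assertion failures are a consequence of the child's CUDA error rather than separate bugs.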
@drossetti drossetti added the bug label Dec 9, 2022