Add stream context #5
python/tvm_ffi/stream.py
Outdated
Examples
--------
.. code-block:: python
s = torch.cuda.Stream()
This needs to be indented one level, with a blank line between the `.. code-block::` directive and the code.
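A minimal sketch of the docstring layout this comment asks for, assuming numpydoc-style sections. The function name `stream_docstring_demo` is hypothetical, and the example body is only the single line visible in the diff.

```python
import inspect


def stream_docstring_demo():
    """Hypothetical function used only to show the docstring layout.

    Examples
    --------
    .. code-block:: python

        s = torch.cuda.Stream()
    """


# The docstring is never executed, so torch does not need to be installed.
doc_lines = inspect.cleandoc(stream_docstring_demo.__doc__).splitlines()
directive = doc_lines.index(".. code-block:: python")
```

The directive is followed by one blank line, and the code sits one indentation level deeper than the directive.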
python/tvm_ffi/cython/base.pxi
Outdated
return <uint64_t>prev_stream

class StreamContext:
Given that this is not Cython dependent, move it out to stream.py.
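To illustrate why such a class need not live in Cython, here is a rough pure-Python sketch of a stream context that only relies on a "set current stream and return the previous one" function. Everything in it is an assumption made for illustration: `env_set_current_stream` is a dictionary-backed stand-in, not the project's actual function, and the constructor signature is hypothetical.

```python
# Stand-in for the FFI environment: maps (device_type, device_id) -> stream handle.
_current_streams = {}


def env_set_current_stream(device_type, device_id, stream):
    """Hypothetical stand-in: set the stream and return the previous one."""
    key = (device_type, device_id)
    prev_stream = _current_streams.get(key, 0)
    _current_streams[key] = stream
    return prev_stream


class StreamContext:
    """Set a stream on entry and restore the previous one on exit."""

    def __init__(self, device_type, device_id, stream):
        self.device_type = device_type
        self.device_id = device_id
        self.stream = stream
        self.prev_stream = None

    def __enter__(self):
        # Remember whatever stream was active so it can be restored later.
        self.prev_stream = env_set_current_stream(
            self.device_type, self.device_id, self.stream
        )
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the previous stream even if the body raised.
        env_set_current_stream(self.device_type, self.device_id, self.prev_stream)
        return False


# Arbitrary device type, device id, and stream handle for demonstration.
with StreamContext(2, 0, 0x1234):
    inside = _current_streams[(2, 0)]
after = _current_streams[(2, 0)]
```

The class only calls a plain Python-callable setter, which is the property that lets it sit in an ordinary `.py` module.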
python/tvm_ffi/cython/base.pxi
Outdated
TVMFFIStreamHandle stream,
TVMFFIStreamHandle* opt_out_original_stream) nogil

cdef _env_set_current_stream(int device_type, int device_id, uint64_t stream):
We can expose this as `def` so that it can be called from the Python side.
python/tvm_ffi/cython/base.pxi
Outdated
DLTensor* TVMFFITensorGetDLTensorPtr(TVMFFIObjectHandle obj) nogil
DLDevice TVMFFIDLDeviceFromIntPair(int32_t device_type, int32_t device_id) nogil

new line not needed?
This PR addresses a memory corruption issue in Cython. The fix involves
ensuring that the `ByteArrayArg` object, which holds the type key, is
properly destructed after being passed to the `TVMFFITypeKeyToIndex`
function. This prevents a potential read-after-free scenario, as
reported by ASan.
## ASan Report
```
READ of size 9 at 0x604000420a30 thread T0
...
#5 0x7fdb57299506 in __pyx_f_4core__type_info_create_from_type_key /home/dolores/Projects/tvm-ffi/build/core.cpp:17732
...
0x604000420a30 is located 32 bytes inside of 42-byte region [0x604000420a10,0x604000420a3a)
freed by thread T0 here:
...
#4 0x7fdb572994e2 in __pyx_f_4core__type_info_create_from_type_key /home/dolores/Projects/tvm-ffi/build/core.cpp:17731
...
previously allocated by thread T0 here:
...
#8 0x7fdb57299366 in __pyx_f_4core__type_info_create_from_type_key /home/dolores/Projects/tvm-ffi/build/core.cpp:17718
```
<img width="1444" height="904" alt="image"
src="https://github.com/user-attachments/assets/7a80d33d-dedf-41ca-ac77-108e63b8e57b"
/>
## Recommended ASan Options
To work properly with CPython, `libasan` must be preloaded, along with
`libstdc++` so that `__cxa_throw` is intercepted correctly. The paths to
those two files can be found using:
```
ASAN="$(gcc -print-file-name=libasan.so)"
STDCXX="$(g++ -print-file-name=libstdc++.so.6)"
LD_PRELOAD="$ASAN $STDCXX"
```
Additionally, it might be helpful to tweak
```
PYTHONMALLOC=malloc
```
and run with ASan options
```
ASAN_OPTIONS="detect_leaks=0:abort_on_error=1:symbolize=1:fast_unwind_on_malloc=0"
```
Notably, turning on `detect_leaks=1` produces a lot of irrelevant, noisy
reports, so it is better to leave it off.
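The settings above can be collected into a single environment for a child process. The helper below is a minimal sketch: `build_asan_env` is a hypothetical name, and the library paths passed to it are placeholders that would normally come from the `gcc`/`g++ -print-file-name` commands shown earlier.

```python
import os


def build_asan_env(asan_path, stdcxx_path, base_env=None):
    """Return a copy of the environment with the ASan settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    # Preload libasan first, then libstdc++, separated by a space.
    env["LD_PRELOAD"] = f"{asan_path} {stdcxx_path}"
    # Route Python allocations through malloc.
    env["PYTHONMALLOC"] = "malloc"
    env["ASAN_OPTIONS"] = ":".join(
        [
            "detect_leaks=0",
            "abort_on_error=1",
            "symbolize=1",
            "fast_unwind_on_malloc=0",
        ]
    )
    return env


# Placeholder paths; substitute the output of the -print-file-name commands.
env = build_asan_env("/path/to/libasan.so", "/path/to/libstdc++.so.6", base_env={})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the Python process under test.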
This PR exposes the get env stream method as a follow-up to #5.
This PR adds a stream context to ffi so that the ffi env stream can be updated. `tvm_ffi.use_torch_stream` wraps a torch stream/graph context, and the lower-level `tvm_ffi.use_raw_stream` creates a context from an existing stream handle.
Example for `tvm_ffi.use_torch_stream`:
- case with torch stream
- case with torch cuda graph
- case with current stream by default
Example for `tvm_ffi.use_raw_stream`: