-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyByteArray::to_vec
is unsound
#4736
Comments
Yes, I agree that to_vec is unsound here.
…On Wed, Nov 27, 2024 at 10:41 AM Robsdedude ***@***.***> wrote:
https://github.com/PyO3/pyo3/blob/v0.23.2/src/types/bytearray.rs#L297-L299
The implementation just calls the unsafe as_bytes function while not
doing anything to uphold the safety invariant that no other thread modifies
the bytearray. I think this is fine with the GIL, but without it, other
thread might be ruining the show.
Aside: for the as_bytes API to be any useful with free threaded python
people will want to know about critical sections
<https://docs.rs/pyo3/latest/pyo3/sync/fn.with_critical_section.html> (at
least I think that's the right tool). So maybe a cross reference in the
docs would go a long way.
—
Reply to this email directly, view it on GitHub
<#4736>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAGBA7RWFRWJMOPMNDNDT2CXR3JAVCNFSM6AAAAABSTEZV22VHI2DSMVQWIX3LMV43ASLTON2WKOZSGY4TQOJZGYYTENY>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
All that is necessary for evil to succeed is for good people to do nothing.
|
I'm not sure if a critical section is sufficient here if other threads can write to the array via the buffer protocol. Does PyO3 expose read-write access via the buffer protocol? |
https://docs.rs/pyo3/latest/pyo3/buffer/struct.PyBuffer.html
But more importantly, it doesn't quite matter what pyo3 exposes:
https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/
…On Wed, Nov 27, 2024 at 12:33 PM Nathan Goldbaum ***@***.***> wrote:
I'm not sure if a critical section is sufficient here if other threads can write to the array via the buffer protocol. Does PyO3 expose read-write access via the buffer protocol?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
--
All that is necessary for evil to succeed is for good people to do nothing.
|
So isn't it unsafe even in the GIL-enabled build? Just less likely to be an issue. |
I think so, with the caveat that in a no-GIL world, this can be cause
UB with pure Python code, whereas in a GIL world it required another
extension module.
…On Wed, Nov 27, 2024 at 12:38 PM Nathan Goldbaum ***@***.***> wrote:
So isn't it unsafe even in the GIL-enabled build? Just less likely to be an issue.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***>
--
All that is necessary for evil to succeed is for good people to do nothing.
|
PyByteArrayMethods::to_vec
might not be sound in free threaded pythonPyByteArrayMethods::to_vec
is unsound
slightly off-topic aside: I'm also very interested in this from a completely different angle: the buffer protocol isn't extensible, so if I want to add support for datetimes (just one example) I'd need to write a PEP. We should fix that. Adding a borrow-checker to NumPy might be interesting, although it will be tough to instrument every single place NumPy exposes views into array data (not to mention we'd need to support the legacy buffer protocol probably). |
Is there value in using a critical section as a first step to get at least to the same (lower) level of unsoundness as is present with the GIL, where at least another native extension is needed to cause UB? I'm happy to open a PR in such case. |
I think that's worth doing. Whether or not to mark the function as a whole unsafe I'll leave to other more experienced PyO3 devs. |
In free-threaded Python, to_vec needs to make sure to run inside a critical section so that no other Python thread is mutating the bytearray causing UB. See also PyO3#4736
PyByteArrayMethods::to_vec
is unsoundPyByteArray::to_vec
is unsound
This is one of those really frustrating potentially-unsafe cases where the reader likely can't know whether a writer might exist. So while marking it as unsafe is potentially a useful signal for readers, it also creates an invariant which I think they likely just have to ignore and hope for the best. Either way, the status quo seems like a silent trap, and I would be open to adding The long run, we have to hope for a reworked buffer protocol, I guess. 🤔 |
Reading the blog post my first thought was: this Python C API is just fundamentally broken. Or am I misreading this? To me it seems impossible to write a correct (i.e., race-free) native extension regardless of the language used. So with that in mind, trying to write a sound extension with PyO3 makes it impossible to use |
That's indeed my take, too. As I see it, the buffer protocol cannot protect against data races without a higher-level agreement on synchronization (whether that be the GIL or some other API contract, like the arrow capsule interface). For an isolated extension accepting an arbitrary buffer as input, there is likely no higher-level agreement. Hence why |
Somewhere in the Python discourse there was some discussion of my post,
including a draft PEP iirc, but it hasn't gone anywhere yet
…On Fri, Nov 29, 2024, 8:58 AM David Hewitt ***@***.***> wrote:
That's indeed my take, too. As I see it, the buffer protocol cannot
protect against data races without a higher-level agreement on
synchronization (whether that be the GIL or some other API contract, like
the arrow capsule interface
<https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>).
For an isolated extension accepting an arbitrary buffer as input, there is
likely no higher-level agreement. Hence why PyBuffer is such a clumsy API
to use 😢
—
Reply to this email directly, view it on GitHub
<#4736 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAGBCBOMQ4CEPP3MJW5N32DBXJJAVCNFSM6AAAAABSTEZV22VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMBXHA3TINBYGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
https://github.com/PyO3/pyo3/blob/v0.23.2/src/types/bytearray.rs#L297-L299
The implementation just calls the unsafe
as_bytes
function while not doing anything to uphold the safety invariant that no other thread modifies the bytearray. I think this is fine with the GIL, but without it, another thread might be ruining the show.Aside: for the
as_bytes
API to be any useful with free threaded python people will want to know about critical sections (at least I think that's the right tool). So maybe a cross reference in the docs would go a long way.The text was updated successfully, but these errors were encountered: