Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: Make the Rust vtab interface safe? #414

Open
sourcefrog opened this issue Dec 19, 2024 · 0 comments
Open

rfc: Make the Rust vtab interface safe? #414

sourcefrog opened this issue Dec 19, 2024 · 0 comments

Comments

@sourcefrog
Copy link
Contributor

The VTab trait

https://github.com/duckdb/duckdb-rs/blob/main/crates/duckdb/src/vtab/mod.rs

has

pub trait VTab: Sized {
    /// The data type of the bind data
    type InitData: Sized + Free;
    /// The data type of the init data
    type BindData: Sized + Free;

    unsafe fn bind(bind: &BindInfo, data: *mut Self::BindData) -> Result<(), Box<dyn std::error::Error>>;
 
    // ....
}

This interface is a bit easy to misuse in Rust. The user's implementation of bind is called with data pointing to an allocated block that is large enough to hold a Self::BindData but uninitialized.

As a result, if you convert the pointer to a &mut ref you get undefined behavior. https://github.com/duckdb/duckdb-rs/blob/main/crates/duckdb/examples/hello-ext/main.rs suggests doing this: it may be safe in the specific case of that code, I haven't checked, but if a more realistic implementation has more complex data then it can cause memory corruption.

So, at least, it would be nice to document this more.

But, since this is meant to be the higher-level interface, I think we could do better and just make the interface entirely safe?

I think there are a couple of potentially feasible approaches. Which one would actually work would take some experimentation:

  1. The Rust vtab::bind could just create and return the binddata that it wants, which could then be put in a box and passed to the C API?
  2. Or, if VTab::BindData: Default then it could be initialized to the default value and passed in as a &mut.

I think these would also probably get rid of the need for the Free trait there?

This would be an API break but IMHO moving towards a safe interface would be worth it. Although, I suppose, we could probably keep the existing unsafe interfaces present but deprecated.

I can try to send you a PR for this, but I would just like to ask first

  1. Is this still maintained (What's the current status of this crate? #412)?
  2. Would this be OK in principle?

cc @joewalnes

sourcefrog added a commit to sourcefrog/duckdb-rs that referenced this issue Dec 21, 2024
Rather than passing a pointer to a block of uninitialized memory, which can easily lead to UB, these functions now just return Rust objects.

This improves duckdb#414 by reducing the amount of unsafe code needed from extensions.
Maxxen added a commit that referenced this issue Feb 10, 2025
* Use ptr::write when writing uninitialized memory

* Use smaller unsafe blocks

* Rust BindData/InitData can just use Rust types

* Use unsafe blocks inside unsafe fns generated by macro

This will prevent the macro generating a warning in edition 2024

* Fix unused import

* Better safety docs for vtab methods

Contrary to the previous docs, the instances passed to these functions are *not* initialized by the caller. Rather, the called function is responsible for writing into uninitialized memory.

* Similar safety fixes in vtab tests

* VTab::bind and init are now safe

Rather than passing a pointer to a block of uninitialized memory, which can easily lead to UB, these functions now just return Rust objects.

This improves #414 by reducing the amount of unsafe code needed from extensions.

* vtab::Free is no longer needed

BindInfo and InitInfo will be dropped in the usual way when freed by duckdb core. Any necessary destructors can be in Drop impls.

* BindData and InitData should be Send+Sync

It's not completely clear but it looks like the engine could run the table fn from multiple threads, so requiring this seems safer

* Add a safe & typed interface to get bind_data

* Also safely retrieve the init_data

* Add unsafe blocks, rm unnecessary cast

* clippy

---------

Co-authored-by: Martin Pool <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant