After spending some time in a multithreaded program using duckdb, I ran into a segmentation fault. I can be a bit more careful in my Rust code, but I was wondering if I could fix it for you. To do so, I need some guidance on how the underlying API works. The failing test can be viewed here. Its code:
#[test]
fn test_clone_after_close() {
    // Additional querying test to make sure our connections are still
    // usable. The original crash would happen without doing any queries.
    fn assert_can_query(conn: &Connection) {
        conn.execute("INSERT INTO test (c1) VALUES (1)", []).expect("insert");
    }

    // 1. Open owned connection
    let owned = checked_memory_handle();
    owned
        .execute_batch("create table test (c1 bigint)")
        .expect("create table");
    assert_can_query(&owned);

    // 2. Create a first clone from owned
    let clone1 = owned.try_clone().expect("first clone");
    assert_can_query(&owned);

    // 3. Close owned connection
    drop(owned);
    assert_can_query(&clone1);

    // 4. Create a second clone from the first clone. Crashes on the inner
    // `duckdb_connect` with a segmentation fault.
    let clone2 = clone1.try_clone().expect("second clone");
    assert_can_query(&clone1);
    assert_can_query(&clone2);

    // 5. Small additional test
    drop(clone1);
    assert_can_query(&clone2);
}
The problem is calling InnerConnection::new(...) when the ffi::duckdb_database is set to a null pointer after closing.
I would like to fix it, but there is some overhead involved, and I need some guidance on the internals of the C interface to duckdb. We could put the ffi::duckdb_database behind some reference counting (like an Arc), and only call ffi::duckdb_close(...) when that database object is dropped. This would introduce an extra pointer indirection through the Arc and the overhead of atomic operations.
But to then make the entire implementation a bit more sound, we would need to change these calls:
- Connection::open_from_raw: this would now require an Arc-wrapped object.
- InnerConnection::close(...): this must be changed to consume self, which causes a problem in Connection::close(...) with Connection::db, which is a RefCell<InnerConnection>, as we would have to consume that inner connection as well. One way to solve this (and to reflect the fact that one cannot share a Connection across threads anyway, only move it between threads) is to make the methods on Connection take a &mut self parameter.
Here is where I stopped checking out the code, as I would like to hear if you would even want to see such far-reaching changes to the API.
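To make the reference-counting proposal a bit more concrete, here is a minimal sketch of what I have in mind. It does not touch the real FFI: DatabaseHandle, open, database_is_open and watch_database are hypothetical names, a plain integer stands in for ffi::duckdb_database, and the comments mark where I assume the calls to ffi::duckdb_connect and ffi::duckdb_close would go.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical owner of the raw database handle. In the real crate this
/// would wrap `ffi::duckdb_database`; here a plain integer stands in for it.
struct DatabaseHandle {
    raw: usize,
}

impl Drop for DatabaseHandle {
    fn drop(&mut self) {
        // Assumption: the real code would call `ffi::duckdb_close` here.
        // This only runs once the last `Arc` pointing at the handle is gone.
        self.raw = 0;
    }
}

/// Simplified stand-in for `InnerConnection`.
struct InnerConnection {
    db: Arc<DatabaseHandle>,
}

impl InnerConnection {
    /// Hypothetical constructor that opens a fresh database.
    fn open() -> Self {
        InnerConnection {
            db: Arc::new(DatabaseHandle { raw: 1 }),
        }
    }

    /// Every clone shares the same reference-counted handle, so it does not
    /// matter which connection gets closed first.
    fn try_clone(&self) -> Self {
        // Assumption: the real code would call `ffi::duckdb_connect` with the
        // shared handle here.
        InnerConnection {
            db: Arc::clone(&self.db),
        }
    }

    /// Consumes the connection; the database itself stays open as long as
    /// another connection still holds a reference.
    fn close(self) {
        // `self` is dropped at the end of this function.
    }

    /// Hypothetical helper: true while the shared handle has not been closed.
    fn database_is_open(&self) -> bool {
        self.db.raw != 0
    }

    /// Hypothetical helper for observing when the handle is finally dropped.
    fn watch_database(&self) -> Weak<DatabaseHandle> {
        Arc::downgrade(&self.db)
    }
}
```

With this shape, closing the first connection and then cloning from a clone (the sequence in my failing test) only decrements and increments the reference count; the stand-in for duckdb_close runs once, when the last connection goes away. The cost is the extra indirection and the atomic operations mentioned above.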
Perhaps there is another (simpler) solution. Some options, with varying degrees of ugliness:
- Have an OwnedConnection that actually owns the InnerConnection. Only that OwnedConnection supports try_clone. This can be implemented with typestate as Connection<Owned>.
- Perhaps just return an Err from try_clone when we detect that the "owned" InnerConnection has been closed, but this would require some thread-safe sharing of a flag, with the implied overhead.
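For the typestate option, a rough sketch of the shape I am thinking of is below. The marker types Owned and Cloned are hypothetical names, and the method bodies are placeholders rather than the real implementation; the only point is that try_clone exists solely on Connection<Owned>.

```rust
use std::marker::PhantomData;

/// Marker types for the typestate; both names are hypothetical.
struct Owned;
struct Cloned;

/// Simplified stand-in for `Connection`. The real type would hold the
/// `InnerConnection`; here only the state marker is kept.
struct Connection<State> {
    _state: PhantomData<State>,
}

impl Connection<Owned> {
    /// Placeholder for opening the owning connection.
    fn open_in_memory() -> Self {
        Connection { _state: PhantomData }
    }

    /// Only the owning connection can hand out clones, so cloning from a
    /// clone is rejected at compile time instead of crashing at run time.
    fn try_clone(&self) -> Result<Connection<Cloned>, String> {
        Ok(Connection { _state: PhantomData })
    }
}

impl<State> Connection<State> {
    /// Querying is available in both states. Placeholder body: it pretends
    /// that one row was affected.
    fn execute(&self, _sql: &str) -> Result<usize, String> {
        Ok(1)
    }
}
```

Step 4 of my failing test (clone1.try_clone()) would then no longer compile, because Connection<Cloned> has no try_clone method. It does not by itself keep the database alive, so it is more of a guard rail than a fix.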
I'd love to hear from you. For now I'll just keep the "owned" InnerConnection in a very special place in my code :).