WIP: Bulk loads #163
Missing: when using strings, we must have an API to set the collation. If it's not set, we just get an error.
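For illustration, the missing API could look something like this minimal sketch; `TypeInfo::nvarchar`, `with_collation`, and `Collation` are all hypothetical names, since this is exactly the part that doesn't exist yet:

```rust
// Hypothetical sketch only: none of these collation APIs exist in the PR.
let mut meta = BulkLoadMetadata::new();

// String columns need their collation spelled out explicitly; without
// it the server rejects the column metadata with an error.
meta.add_column(
    "name",
    TypeInfo::nvarchar(255).with_collation(Collation::default()),
);
```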
Ok, so I had a bit more time to do research here. I know how to do the actual feature, but making an API that doesn't randomly give you errors that are really nasty to solve is not that easy. One example: nullable columns. First, a bulk load that works with what we have in the pull request:

```rust
let mut client = Client::connect(config, tcp.compat_write()).await?;

client
    .execute(
        "CREATE TABLE ##bulk_test1 (id INT IDENTITY PRIMARY KEY, content INT NOT NULL)",
        &[],
    )
    .await?;

// The metadata must describe every column we are about to send.
let mut meta = BulkLoadMetadata::new();
meta.add_column("content", TypeInfo::int());

let mut req = client.bulk_insert("##bulk_test1", meta).await?;
let count = 2000i32;

for i in 0..count {
    let mut row = TokenRow::new();
    row.push(i.into_sql());
    req.send(row).await?;
}

dbg!(req.finalize().await?);
```

Change the column to allow NULLs, though, and the whole thing breaks catastrophically with a cryptic error. So, then we can imagine the API to be something like:

```rust
meta.add_column("content", TypeInfo::int(), true);
```

or

```rust
meta.add_column("content", TypeInfo::int(), ColumnFlag::Nullable.into());
```

... but I can already sense how many issues will get opened for mystical errors when you forget to set the correct flags for your metadata.

In the end, what I would love to see with this API is something that the dear rustc can catch at the type level. E.g. start a bulk request with a trait-backed struct that tells the types, nullability, collation info and what not. Then we could imagine something like:

```rust
#[tiberius]
pub struct MyData {
    value: i32,
    nullable_integer: Option<i32>,
    text_value: Option<String>,
}
```

This would then be used in the client when starting a new bulk request:

```rust
let req: BulkLoadRequest<MyData> = client.bulk_insert("my_data_table").await?;
req.send(MyData { value: 3, nullable_integer: Some(32), text_value: None });
```

Behind the scenes this would then handle the correct order of columns and the correct typing and nullability info. First, though, there is a lot of work just detecting what types work and what do not. The question of collations is also something I'd like to find an answer to. Can we just use the …
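As a sketch of what such a trait-backed design could look like (nothing here exists in tiberius; the trait name and methods are assumptions, reusing the `BulkLoadMetadata` and `TokenRow` types from this PR):

```rust
// Hypothetical sketch: a trait the #[tiberius] derive macro could generate
// an implementation of. Nothing here is part of the actual PR.
pub trait IntoBulkRow {
    /// Column names, types, and nullability, in field declaration order.
    fn metadata() -> BulkLoadMetadata;

    /// Turn one value into a wire row matching `metadata()`.
    fn into_token_row(self) -> TokenRow;
}
```

With a bound like `BulkLoadRequest<T: IntoBulkRow>`, a forgotten nullability flag would become a compile error instead of a cryptic server response.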
I think that what you have here so far is a great starting point! (Though I won't be using it for a real project in the near future.)
Note that this error is about a mismatch in structure between a database table and the bulk load request's metadata. Relying on Rust's type system won't be able to help with that if one needs to load into a table that already exists. The …
I'm thinking more of something that prevents us from sending data to the bulk load that contradicts the description given in the headers, and that can be detected at compile time. Right now the PR has an API that requires a lot of tweaking in a few places every time you want to change something in the data types. What would for sure be possible is, for example, having two connections, the other one giving back a … One API that might also be worth a try would be something that queries a table and returns the metadata to be used in a bulk load. In the end the feature is not that hard to implement, but making a robust API that's worth the upcoming Tiberius 1.0 is something that I'd like to get done before merging anything...
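As a rough illustration of that "query the table, get the metadata back" idea (the catalog query and the metadata mapping are my assumptions; only `Client::query` and `into_first_result` are existing tiberius calls):

```rust
// Sketch: read column names, types, and nullability from the catalog so
// the bulk-load metadata can be derived from the actual target table.
let rows = client
    .query(
        "SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE \
         FROM INFORMATION_SCHEMA.COLUMNS \
         WHERE TABLE_NAME = @P1 \
         ORDER BY ORDINAL_POSITION",
        &[&"my_data_table"],
    )
    .await?
    .into_first_result()
    .await?;

let mut meta = BulkLoadMetadata::new();

for row in &rows {
    let name: &str = row.get("COLUMN_NAME").unwrap();
    let nullable = row.get::<&str, _>("IS_NULLABLE") == Some("YES");
    // Mapping DATA_TYPE to a TypeInfo (and carrying `nullable` into the
    // metadata) is exactly the API question this thread is debating.
    let _ = (name, nullable);
}
```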
Yep, apparently that's how the MSSQL ODBC driver works ("When a data file is specified, bcp_init examines the structure of the database source or target table, not the data file."; here's how FreeTDS does it). I started implementing bulk load from CSV via arrow2 in nickolay@2c0e5e5#diff-705aa08e944f5e9851ee12e3ad6b491a763150b42e9cf3ecde25fc84a7e32a9eR36, but got stuck with lifetime issues (should …
@pimeys I think your implementation has most of the pieces. To your concern: in my scenario I will just get the column info from the SQL database, i.e., assume the table exists before the bulk upload, and trigger a SQL query to fill in the column metadata. Whether the column should be nullable or not is already decided by the table. I will take this PR and add some adaptor to enable loading the schema. You ok with that?
Regarding data types: why not use the existing Column/Row structures, rather than defining a new set of structures, MetaDatacolumn and TokenRow?
A bit more research on this: it seems retrieving the column metadata from the table first is a common practice (the MS C# lib follows a similar pattern). I will go ahead with the following:

…

@pimeys, please let me know if you are ok with it.
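Under that plan, usage could end up looking something like the sketch below, where `schema_for_table` is an imagined adaptor that runs the catalog query and builds the metadata; only the bulk-insert calls come from this PR:

```rust
// Hypothetical flow: derive the bulk-load metadata from the existing
// table, so nullability and types always match the target.
// `schema_for_table` does not exist; it stands in for the adaptor.
let meta = schema_for_table(&mut client, "##bulk_test1").await?;
let mut req = client.bulk_insert("##bulk_test1", meta).await?;

for i in 0..2000i32 {
    let mut row = TokenRow::new();
    row.push(i.into_sql());
    req.send(row).await?;
}

req.finalize().await?;
```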
So, here's my attempt at continuing the work of @nickolay, implementing an interface for bulk loads.
The interface here allows you to efficiently store rows without loading them all into memory.
Would people who need this feature be willing to test it and extend it? I see a need for tests and for fixing possible issues we might face. Also, I'm not completely sure about the interface, so it would be great to have some opinions on that.
Closes: #104