Add bulk loading #104
Hi @mcamou, there is another alternative as well, which should offer decent performance:
There are some examples of this in a project I'm working on:
@esheppa nice hint. A safe bet is 4500 rows per batch. This won't make the insert faster, but it won't block other queries.
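The batching arithmetic behind numbers like this can be sketched in plain Rust. A parameterized multi-row `INSERT` is capped by SQL Server at 2100 parameters per request and 1000 row constructors per `VALUES` list, so the usable batch size depends on the column count. The function name and structure here are illustrative, not part of Tiberius:

```rust
// Pick a batch size that stays under SQL Server's documented limits:
// at most 2100 parameters per request and at most 1000 row
// constructors in a single VALUES list.
fn max_rows_per_batch(columns: usize) -> usize {
    const MAX_PARAMS: usize = 2100;
    const MAX_VALUES_ROWS: usize = 1000;
    (MAX_PARAMS / columns).min(MAX_VALUES_ROWS)
}

fn main() {
    // A 2-column table: 2100 / 2 = 1050, so the 1000-row VALUES cap binds.
    assert_eq!(max_rows_per_batch(2), 1000);
    // A 7-column table: 2100 / 7 = 300 rows per batch.
    assert_eq!(max_rows_per_batch(7), 300);
    println!("ok");
}
```

With a bulk load (as opposed to parameterized inserts), these limits do not apply, which is one reason a much larger per-batch row count is workable.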
Regarding the idea, I'd like to see something even better than what Tedious does: streaming inserts. Provide an async file stream with a mapping, save memory, and execute a fast bulk insert. I'm fascinated by the idea, but I'm not sure whether it's possible to make it work. I'm also extremely busy for the next few months, and the next thing I have planned for Tiberius is to get TLS working on macOS. If somebody has time before spring and would like to try something out, I'd be interested to see a pull request!
It's definitely possible. I've used it in Java JDBC using the SQLBulkCopy class.
I don't think it's a huge amount of work; we already have good implementations in JDBC and in Tedious. It's a good idea to read the TDS manual, but it might be easier to just see what Tedious and the JDBC driver do. I hope the facilities in Tiberius are good enough, and I can provide help on our Discord server if you want to try your luck implementing it. We should discuss beforehand what the implementation should look like. Actually, I'm quite excited to write this myself. I'm just quite busy now, and I know the next thing I have allocated for Tiberius at work is the macOS TLS fix, so I doubt I can start this very soon...
I was curious about this, and it turned out not to be very hard to get the basics working (for a 1x2 table with hard-coded structure and content, for now). See https://github.com/nickolay/tiberius/commits/wip/bulkload. According to the TDS specification (https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-TDS/%5BMS-TDS%5D.pdf), the bulk-load exchange can be expressed in terms of the structs already existing in Tiberius:
So the bulk of the changes needed to make this work is simply adding […]. The new bits are: a new […]. I leave the interesting part of designing the streaming API, and the tedious work of implementing and testing bulk uploads of all the supported types, to someone else.
Can we even do a streaming API for bulk loads? It would be nice to read data from e.g. a huge CSV file in batches to memory, then write to the database and load another batch. Does TDS actually allow this, or do you actually have to write all the data to the token before you can write it to the wire? I'm not going to have any need for this feature, so I'm probably the wrong person to answer on how people would like to use this... |
I'm not sure I understand what you're saying. Note that each row is a separate token in TDS, so there shouldn't be any protocol-level problems; the question is designing what the Rust API should look like.
Mm, true. If there's anybody who actually needs this feature, could you write down in this ticket how you'd like the API to look? We don't need to go fully async on the first try, but at least a good first PR would be nice for commenting...
Could people needing bulk inserts test the WIP API and see if it works for you? I don't have that much time for it anymore, so if you want it finished faster, help with writing tests and fixing the issues would be nice...
Hi Julius, thanks for the effort. I'll test it next week. |
What's the latest on this topic? |
The PR is there. The annoying thing is that you have to define the columns just right beforehand, or nothing happens. I think the API sucks, and we have no need for it ourselves, so I get no time to work on the subject. Check the PR, and if you need the feature, some advice on how the API should look would be nice.
I will spend some time on this if no one else is looking into it. |
Could someone take a look at #227? This is the interface: […]
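For readers landing here later, a usage sketch along the lines of the interface that eventually shipped in Tiberius (`bulk_insert`, `TokenRow`, `send`, `finish`). The table name and connection setup are placeholders, and the exact signatures should be checked against the PR and the current docs, so treat this as an illustration rather than the definitive interface:

```rust
use tiberius::{Client, IntoSql, TokenRow};
use tokio::net::TcpStream;
use tokio_util::compat::Compat;

// Sketch only: `client` is assumed to be an already-connected
// `Client<Compat<TcpStream>>`, and `bulk_test` is a placeholder
// table with a single INT column.
async fn load(client: &mut Client<Compat<TcpStream>>) -> Result<(), tiberius::error::Error> {
    // Start a bulk load against an existing table.
    let mut req = client.bulk_insert("bulk_test").await?;

    for i in 0..10_000i32 {
        // Each row becomes a separate TDS Row token, so rows can be
        // streamed as they are produced instead of buffered up front.
        let mut row = TokenRow::new();
        row.push(i.into_sql());
        req.send(row).await?;
    }

    // Finish the request; the server reports how many rows were loaded.
    let res = req.finish().await?;
    println!("{} rows", res.total());
    Ok(())
}
```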
This was merged already a while ago. |
Running

```sql
INSERT INTO table VALUES (...)
```

one row at a time is really slow, especially when inserting against an Azure server. You can batch inserts with

```sql
INSERT INTO table VALUES ({},{}),({},{})
```

but that is limited to 1000 rows per statement. You can also use

```sql
INSERT INTO table SELECT a,b FROM (VALUES ({},{}),({},{})) sub(a,b)
```

as mentioned in steffengy#93, but then you are limited to 2100 parameters in the query.

A solution to this would be to implement something similar to what Tedious (a NodeJS library) implements: https://tediousjs.github.io/tedious/bulk-load.html
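Until a real bulk load exists, the placeholder list for such a batched INSERT can be generated mechanically. A minimal sketch in Rust (the helper name is illustrative, not a Tiberius API; `@PN` is SQL Server's positional parameter syntax):

```rust
// Build the placeholder list for a batched
//   INSERT INTO table VALUES (@P1,@P2),(@P3,@P4),...
// statement, numbering parameters sequentially across rows.
fn values_placeholders(rows: usize, cols: usize) -> String {
    let mut n = 0;
    let groups: Vec<String> = (0..rows)
        .map(|_| {
            let params: Vec<String> = (0..cols)
                .map(|_| {
                    n += 1;
                    format!("@P{}", n)
                })
                .collect();
            format!("({})", params.join(","))
        })
        .collect();
    groups.join(",")
}

fn main() {
    // Two rows of two columns each.
    assert_eq!(values_placeholders(2, 2), "(@P1,@P2),(@P3,@P4)");
    // One row of three columns.
    assert_eq!(values_placeholders(1, 3), "(@P1,@P2,@P3)");
    println!("ok");
}
```

The row and column counts would be capped by the 1000-row and 2100-parameter limits described above.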