I have recently upgraded Sling CLI to the latest version, but I've noticed some strange behaviour when trying to transfer data from an Azure MSSQL DB to an Azure Storage Account. The previous version I used (1.3.2) worked a treat for my use case: I was able to specify `file_max_rows`.
However, now that Sling is using DuckDB, I'm seeing that the replication task no longer supports this option. What I've noticed is that the first step is for Sling to seemingly ingest the entire table into DuckDB before creating the files and pushing them to the storage account. As you can probably imagine, this pretty quickly leads to an OOMKilled error. For context, I ran Sling CLI against my dataset and within 5 minutes it was already using 12GB of memory, having read less than 25% of the table I was working on (in this case the table was only 89GB).

Is this the intended behaviour? If so, how do users run Sling against large tables (in my case 300GB) without being OOMKilled? Or is there a way to stop DuckDB from ingesting the entire table and instead process smaller batches, deleting each batch from DuckDB before proceeding to the next? I know I can set …
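For reference, the replication I was running on 1.3.2 looked roughly like this; connection names, paths and row counts below are placeholders rather than my exact config:

```yaml
# replication.yaml -- sketch of the MSSQL -> Azure Storage replication (placeholder names)
source: AZURE_MSSQL          # Azure SQL Server connection defined in env.yaml
target: AZURE_STORAGE        # Azure Storage connection defined in env.yaml

defaults:
  object: 'exports/{stream_schema}/{stream_table}.parquet'
  target_options:
    format: parquet
    file_max_rows: 500000    # on 1.3.2 this kept each output file (and memory use) bounded

streams:
  dbo.my_large_table:
```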
-
@EValge-IT thanks for raising this. This was a concern for me as well.
Can you try with the latest dev build: 1.4.3.dev?
`file_max_rows` is added again, working with DuckDB.
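For a quick ad-hoc test on the dev build, something along these lines should work (if I'm remembering the `sling run` flag names right; the connection names and object path below are placeholders):

```bash
# one-off run on 1.4.3.dev -- placeholder connection names and target path
sling run \
  --src-conn AZURE_MSSQL \
  --src-stream 'dbo.my_large_table' \
  --tgt-conn AZURE_STORAGE \
  --tgt-object 'exports/dbo/my_large_table.parquet' \
  --tgt-options '{ "format": "parquet", "file_max_rows": 500000 }'
```

The same `file_max_rows` key also goes under `target_options` in a replication YAML.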