Not long ago, I attempted to update the connector to use the gosnowflake driver's recently added support for PUTs to Snowflake stages. I ran into a bug that has since been fixed, and we should try again.
While we're at it, we should switch to streaming PUTs to the internal stage as we consume from the Store iterator (instead of staging to a local temporary file, and then starting to upload only after the Store iterator is consumed). In my own profiling, it seems like this would materially reduce data stalls as we execute transactions, as these PUTs typically take seconds to complete for larger files.
As further context, our philosophy on connector errors and retries has shifted, and we're planning to implement a watchdog in the control plane which looks for failed shards and restarts them with a backoff policy. That means the connector doesn't need to worry about spurious errors and retries while executing the PUT to Snowflake -- it can implement the more efficient, direct strategy of simply streaming to the stage and hoping for the best.
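A minimal sketch of what the streaming approach could look like, assuming gosnowflake's WithFileStream context option; the rows channel, stage name, and staged file name here are hypothetical stand-ins for the Store iterator and our real stage layout:

```go
package main

import (
	"compress/gzip"
	"context"
	"database/sql"
	"io"

	sf "github.com/snowflakedb/gosnowflake"
)

// streamPut uploads rows to an internal stage as they're consumed,
// rather than staging to a local temporary file first. The rows
// channel is a placeholder for consuming from the Store iterator.
func streamPut(ctx context.Context, db *sql.DB, rows <-chan []byte) error {
	pr, pw := io.Pipe()

	// Producer: drain the iterator into the pipe, gzipping as we go,
	// so the PUT can begin as soon as the first rows arrive.
	go func() {
		gz := gzip.NewWriter(pw)
		for row := range rows {
			if _, err := gz.Write(row); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		if err := gz.Close(); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.Close()
	}()

	// Consumer: the PUT reads from the pipe via WithFileStream. The
	// file path in the SQL is only a placeholder naming the staged
	// file; @my_stage is a hypothetical stage. Per our retry policy,
	// any error simply propagates and the watchdog restarts the shard.
	_, err := db.ExecContext(
		sf.WithFileStream(ctx, pr),
		"PUT 'file:///tmp/rows.csv.gz' @my_stage AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=GZIP",
	)
	return err
}
```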
I attempted to implement streaming PUTs, and while they do work now, the memory usage is not practical: the gosnowflake driver reads the entire stream into memory (see snowflakedb/gosnowflake#536). Once that issue is resolved, it should be straightforward to switch to streaming PUTs.