SNOW-535791: PUT operations read full input data into memory #536

asonawalla · 2022-01-26T17:48:13Z

We use the streaming PUT feature of gosnowflake to upload data to Snowflake internal stages. Recently, we started throwing GB-sized files at it and saw our memory usage explode. Through some experimentation with an isolated production workload, we saw that for a 1GB input file, the driver used approximately 8GB of memory when the PUT operation was configured as a stream, and 2-3GB when it was configured as a file on disk.

Looking at the code, the problem seems to be that alleged "streams" are often passed around as bytes.Buffers that read the entire input data into memory multiple times over, the exact amount depending on the command and options (e.g. here, here, here, everywhere this is invoked, etc.). Some crude experimentation with a fork I've created suggests there are likely more places.

Beyond the obvious issue that the documentation on streaming PUTs is misleading, I would go so far as to say the driver should never have to read the entire input contents into memory. The major operations it's responsible for (compression, encryption, calculating digests, etc.) are all possible with modest buffers that don't need to scale with the input size.

I'd be happy to contribute here, but wanted to start a discussion since the required changes seem to be nontrivial.


sfc-gh-dszmolka · 2023-03-28T11:32:02Z

Thank you for submitting this issue! I read through the linked PR #538, which references another PR, #527, that has been merged, so I guess this should be resolved for now.
If I got it wrong, please feel free to reopen or comment.

williamhbaker · 2023-04-12T22:49:11Z

I can confirm that this is still an issue. There are at least two instances where the stream is still read fully into memory, here and here (for the S3 uploader).

The concept of having the entire contents of the stream available seems pretty baked-in, for example here, where the size of the stream is needed. I can imagine it will take some effort to fully refactor this, but it would be useful to handle streaming PUTs in a memory-efficient way.

sfc-gh-dszmolka · 2023-04-13T06:20:48Z

Thank you for reviewing and commenting, and especially for providing useful details! Reopening this issue and marking it as an enhancement for the product team to consider.

github-actions bot changed the title ~~PUT operations read full input data into memory~~ SNOW-535791: PUT operations read full input data into memory Jan 26, 2022

asonawalla mentioned this issue Jan 27, 2022

Memory improvements for PUT operations #538

Closed

6 tasks

github-actions bot closed this as completed Jul 1, 2022

sfc-gh-jfan reopened this Jul 1, 2022

github-actions bot closed this as completed Jul 2, 2022

sfc-gh-jfan reopened this Jul 6, 2022

sfc-gh-dszmolka closed this as completed Mar 28, 2023

williamhbaker mentioned this issue Apr 12, 2023

snowflake: streaming PUTs to internal stages estuary/connectors#50

Open

sfc-gh-dszmolka reopened this Apr 13, 2023

sfc-gh-dszmolka added the enhancement label (The issue is a request for improvement or a new feature) Apr 13, 2023

sfc-gh-dszmolka assigned sfc-gh-anugupta May 30, 2023

rompetroll mentioned this issue Aug 23, 2023

High Memory usage in gosnowflake mimiro-io/snowflake-datalayer#11

Closed

sfc-gh-dszmolka added the status-triage_done label (Initial triage done, will be further handled by the driver team) Mar 12, 2024

sfc-gh-dprzybysz assigned sfc-gh-snow-drivers-warsaw-dl and unassigned sfc-gh-anugupta Dec 16, 2024
