
SNOW-1747415 SNOW-1215393: Out of Memory Issue in memory-limited environment while streaming data #790

Open
kartikgupta2607 opened this issue Mar 6, 2024 · 21 comments
Assignees
Labels
backend changes needed Change must be implemented on the Snowflake service, and not in the client driver. bug Something isn't working status-blocked Progress cannot be made to this issue due to an outside blocking factor. status-triage_done Initial triage done, will be further handled by the driver team

Comments

@kartikgupta2607

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of NodeJS driver are you using? -> snowflake-sdk; tried with 1.6.20 and 1.9.3

  2. What operating system and processor architecture are you using? -> macOS 14.3.1, arm64

  3. What version of NodeJS are you using? -> v18.12.1

  4. What are the component versions in the environment (npm list)? -> NA

  5. Server version: -> 8.9.1

  6. What did you do?
    Tried running this script (snowflake_OOM.txt) to export records from the SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER table, with max-old-space-size=150 set while running it. With a plain SELECT * FROM CUSTOMER; the node process exited after exporting some rows (200,000) with FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory. However, after modifying the same query with a LIMIT clause (limit of 1,000,000), it was able to export all 1M records with the same old-space limit. I tried the workarounds mentioned in SNOW-750472 Out of memory issue #43 and the linked issues, but none worked: I tried the streamResult option both in the connection config and while executing the query, and tried downgrading to 1.6.20.
    Following is the metadata of the source table

[screenshot: table metadata] As it doesn't have any VARIANT-type column, #536 and the issues related to it shouldn't matter in our case.
  7. What did you expect to see?

    It should be able to export the same number of records before any OOM occurs, whether or not LIMIT is present. It also seems a larger buffer is fetched when the query has no LIMIT, and GC appears to trigger later in that case.

  8. Can you set logging to DEBUG and collect the logs? -> Can't upload logs due to company security policies.

var snowflake = require('snowflake-sdk');
snowflake.configure({
  logLevel: 'trace'
});
  9. What is your Snowflake account identifier, if any? (Optional)
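Since the original script can't be attached here, the kind of reproduction described in question 6 might be sketched roughly as follows (the credential values are placeholders and makeCounter is an illustrative helper, not part of the SDK); running it with node --max-old-space-size=150 should reproduce the heap limit:

```javascript
// Pure helper: returns a row callback that counts rows and calls `report`
// every `logEvery` rows, without retaining the rows themselves.
function makeCounter(logEvery, report) {
  let count = 0;
  return () => {
    count += 1;
    if (count % logEvery === 0) report(count);
    return count;
  };
}

function runRepro() {
  // Lazy require so the helper above stays self-contained.
  const snowflake = require('snowflake-sdk');
  const connection = snowflake.createConnection({
    account: '<account>', username: '<user>', password: '<password>',
  });
  connection.connect((err, conn) => {
    if (err) throw err;
    const onRow = makeCounter(100000, (n) => console.log(`${n} rows streamed`));
    conn.execute({
      sqlText: 'SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER',
      streamResult: true, // stream rows instead of materializing the result set
      complete: (execErr, stmt) => {
        if (execErr) throw execErr;
        stmt.streamRows()
          .on('data', onRow)
          .on('error', console.error)
          .on('end', () => console.log('done'));
      },
    });
  });
}
```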
@kartikgupta2607 kartikgupta2607 added the bug Something isn't working label Mar 6, 2024
@github-actions github-actions bot changed the title Out of Memory Issue while streaming data SNOW-1215393: Out of Memory Issue while streaming data Mar 6, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Mar 7, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage Issue is under initial triage and removed bug Something isn't working labels Mar 7, 2024
@sfc-gh-dszmolka
Collaborator

hi - thanks for raising this issue with us, i'll take a look and see how we can proceed

@shelbylsmith

I have the same issue with node 20.9.0 and snowflake-sdk 1.9.0.

@sfc-gh-dszmolka
Collaborator

thank you again for the detailed reproduction; the issue could be reproduced even with the TPCH_SF100 version of the CUSTOMER table, which 'only' has 15,000,000 rows.

Of course, if one leaves the memory limit flag off, the query runs to completion; with this table, memory usage can be observed going up into the ~1G range, then GC kicks in, and this repeats until the query completes.
Additionally, if a smaller table is created, e.g. with 1M rows from the original table, and the smaller table is queried instead with SELECT * FROM SMALLER_TABLE, then the query also runs to completion even with the memory limit.

I'm also not sure whether this is the expected behaviour of the 'streaming rows' functionality, and comparing heap snapshots from the 'bad' and 'good' scenarios raises additional questions, so I've now involved the driver team to take a look at this.
Will keep this thread posted with the progress.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Mar 8, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the bug Something isn't working label Mar 8, 2024
@MC7771

MC7771 commented Mar 13, 2024

You can try to fetch the stream in a specific range like this:
const stream = stmt.streamRows({ start, end })
This way you can limit the number of records fetched from Snowflake, but you need to write your own loop and calculate the start and end indexes.
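The loop described above might be sketched as follows, assuming the statement's total row count is available via getNumRows(); BATCH, makeRanges, and streamInBatches are illustrative names, not SDK API:

```javascript
const BATCH = 100000; // rows per window; tune to the memory budget

// Pure helper: split [0, total) into inclusive { start, end } windows.
function makeRanges(total, batch) {
  const ranges = [];
  for (let start = 0; start < total; start += batch) {
    ranges.push({ start, end: Math.min(start + batch, total) - 1 });
  }
  return ranges;
}

function streamInBatches(connection, sqlText, processRow, done) {
  connection.execute({
    sqlText,
    streamResult: true,
    complete: (err, stmt) => {
      if (err) return done(err);
      const ranges = makeRanges(stmt.getNumRows(), BATCH);
      // Consume one window at a time so only BATCH rows are in flight.
      const next = (i) => {
        if (i >= ranges.length) return done(null);
        stmt.streamRows(ranges[i])
          .on('data', processRow)
          .on('error', done)
          .on('end', () => next(i + 1));
      };
      next(0);
    },
  });
}
```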

@sfc-gh-dszmolka
Collaborator

based on the same idea, the LIMIT ... OFFSET ... construct could perhaps also serve as a workaround: 'partition' the big query's result and iterate over the partitions. But it's still just a workaround until we find a solution.
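That partitioning loop might be sketched like this, assuming the base query carries a stable ORDER BY so pages don't overlap; PAGE, pageSql, and fetchAllPages are illustrative names:

```javascript
const PAGE = 1000000; // rows per page; tune to the memory budget

// Pure helper: append a page window to a base query.
function pageSql(baseSql, limit, offset) {
  return `${baseSql} LIMIT ${limit} OFFSET ${offset}`;
}

function fetchAllPages(connection, baseSql, processRow, done) {
  const fetchPage = (offset) => {
    connection.execute({
      sqlText: pageSql(baseSql, PAGE, offset),
      streamResult: true,
      complete: (err, stmt) => {
        if (err) return done(err);
        let rowsInPage = 0;
        stmt.streamRows()
          .on('data', (row) => { rowsInPage += 1; processRow(row); })
          .on('error', done)
          .on('end', () => {
            // A short page means we've reached the end of the result.
            if (rowsInPage < PAGE) return done(null);
            fetchPage(offset + PAGE);
          });
      },
    });
  };
  fetchPage(0);
}
```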

@sfc-gh-dszmolka
Collaborator

a very quick update: this seems to be deeply intertwined with how Snowflake, as the backend, currently creates the query result chunks on the server side, and might not be easily influenced solely on the client (driver) side, besides the workarounds already discussed. Multiple teams are working on this issue and I'll keep this thread posted.

@kartikgupta2607
Author

Hello @sfc-gh-dszmolka, do you have any updates regarding the previous request? I attempted the workaround mentioned earlier by passing {start, end} to streamRows, but the issue persists. Below is a sample script for your reference; please let me know if there are any adjustments I should make. Additionally, I noticed that the heap size increases significantly even when I'm not storing the data in the script. Are the rows referenced somewhere internally?
snowflake_OOM_batch.txt

@sfc-gh-pmotacki
Collaborator

Hi @kartikgupta2607, we are having an internal discussion about this issue and will keep you informed.

@sfc-gh-dszmolka
Collaborator

sfc-gh-dszmolka commented Apr 15, 2024

hey folks, a quick update. As my colleague mentioned above, we're discussing all the possibilities internally. Sadly the fix is not necessarily trivial because, as it seems, it requires server-side (driver-independent) changes, which might take a while.

Until it's implemented, please refer to the mitigations mentioned above:

  • fetch a range of the stream and iterate over the ranges (solution from MC7771)
  • use Snowflake's built-in LIMIT ... OFFSET ... clause to define the slice of results you get from Snowflake, and loop over these slices for very big results

thank you so much for bearing with us while this is discussed - i'll keep this thread posted

@kartikgupta2607
Author

thanks for the update @sfc-gh-dszmolka!
the workaround suggested by @MC7771 doesn't seem to work for me; I've attached a sample script for reference here

@sfc-gh-dszmolka
Collaborator

Understood, thanks for confirming. Can you please try the other method as well?

In the meantime, I have a further progress update, which is not that good. As mentioned above, to actually fix the issue we need server-side improvements (in the Snowflake architecture, it's the Snowflake engine itself that decides the number of query result chunks, their size, etc.; the client driver cannot do anything about it, and the issue is connected to how these chunks of the query result are generated).
As a possible proper solution, we plan to enhance the backend to give the client side more control over query result chunk creation, so clients can adapt to these memory-restricted situations.

It became clear very recently that due to other higher priority issues, the server-side improvement surely cannot be implemented in the next upcoming months, so the earliest it can be possibly addressed is the second half of this year.

I'm very sorry to bear such bad news, but wanted to set the expectations about the timeline.

Which also means the following things:

  1. people affected will need to run on some sort of workaround like LIMIT .. OFFSET .., which can 'partition' the very large query result sets (or create temporary/transient tables with partial query results from the big one and query those, or similar).

  2. Alternatively, run the workloads that must work on really big query results in an environment that is not so strictly memory-restricted.
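The temporary/transient-table variant of workaround 1 could be sketched like this (sliceTableSql and streamSlice are illustrative names; the slice table name and window sizes are up to the caller):

```javascript
// Pure helper: build the statement that materializes one slice of the big
// result into a transient table.
function sliceTableSql(sliceName, baseSql, limit, offset) {
  return `CREATE OR REPLACE TRANSIENT TABLE ${sliceName} AS ${baseSql} LIMIT ${limit} OFFSET ${offset}`;
}

// Materialize one slice, then stream the (much smaller) slice table.
function streamSlice(connection, sliceName, baseSql, limit, offset, processRow, done) {
  connection.execute({
    sqlText: sliceTableSql(sliceName, baseSql, limit, offset),
    complete: (err) => {
      if (err) return done(err);
      connection.execute({
        sqlText: `SELECT * FROM ${sliceName}`,
        streamResult: true,
        complete: (err2, stmt) => {
          if (err2) return done(err2);
          stmt.streamRows()
            .on('data', processRow)
            .on('error', done)
            .on('end', () => done(null));
        },
      });
    },
  });
}
```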

But most importantly: if you're already a Snowflake customer and affected by this issue, please do reach out to your Account Team and emphasize how important implementing this server-side improvement would be to your use case. This could bring some traction and possibly re-prioritize the backend change.

Again sorry to bring such news and the inconvenience the current behaviour causes - and thank you for bearing with us while the server-side change is implemented. Will keep this thread posted with the progress, if any.

@sfc-gh-dszmolka sfc-gh-dszmolka changed the title SNOW-1215393: Out of Memory Issue while streaming data SNOW-1215393: Out of Memory Issue in memory-limited environment while streaming data Apr 17, 2024
@kartikgupta2607
Author

Sure @sfc-gh-dszmolka , will try the other method. Please keep us posted if it gets re-prioritised.

@RobLuzmo

@kartikgupta2607 did partitioning with LIMIT .. OFFSET .. work?
We're hitting the same OOM issue even with 4 GB of memory :/
This workaround also isn't straightforward, as one needs to apply absolute ordering in the query and hope that no updates happen to the table while querying it...

@sfc-gh-dszmolka
Collaborator

sfc-gh-dszmolka commented Apr 30, 2024

We're hitting the same OOM issue even with 4 GB of memory :/

That's very strange (though of course possible, I guess, if your table is wide enough).
Just to be on the safe side, can you please double-check that you're actually using streaming results, rather than storing the whole result set in memory (the default behaviour)?

@RobLuzmo

Yeah we're doing the following:

const statement = conn.execute({ sqlText: query, streamResult: true, rowMode: 'array' });
const stream = statement.streamRows();

Note that this only happens when trying to retrieve an entire dataset containing almost 400 million rows.

@owlas

owlas commented Jun 22, 2024

@sfc-gh-dszmolka is there a way to serialize the statement object to enable async pagination of results using streamRows({ start, end }) ?

For example:

const statement = conn.execute({ ... })

// pseudo-code
// store statement, start=0, end=pageLength in database
const queryId = storeStatement(statement, start, end)

// load statement
const { statement, start, end } = loadStatement(queryId)

// stream page
const stream = statement.streamRows({ start, end })

This would be really useful for downloading results in parallel, or paginating in a load balancing scenario where multiple servers are handling query execution.

This seems possible with the SQL REST API: https://docs.snowflake.com/en/developer-guide/sql-api/handling-responses#retrieving-additional-partitions

But I guess using the node sdk will give higher performance? If that's not true, we could fall back to the REST API.

@sfc-gh-dszmolka
Collaborator

at this moment this does not seem to work, at least not this way, @owlas. In this very thread, folks tried using streamRows({ start, end }) and it did not lead to results that could handle this scenario appropriately. If you're able to launch multiple separate queries with LIMIT .. OFFSET .. clauses, all in parallel (maybe even async with the asyncExec flag), perhaps that can achieve what you're seeking?

Please also note that governing the size of the query result chunks is not possible with the SQL REST API either; the number and size of the partitions are unilaterally determined by the Snowflake engine itself, with no customer-exposed method of overriding them.
That's what is in discussion with the engine team, independently from this particular driver, since the issue affects all of them (a quick look in the PythonConnector, JDBC driver, etc. repos will yield similar issues over there, which makes sense: the behaviour is governed on the server side, not the client side, so results are similar in memory-limited environments).

@sfc-gh-dszmolka
Collaborator

I have an update. (Very) recently the server-side code has been changed to support smaller chunk sizes, which in turn hopefully makes OOM less likely. Once the server-side code has been released, setting CLIENT_RESULT_CHUNK_SIZE (values in MB; min: 16, max: 160, default: 160) should help control the (uncompressed) size of the query result chunks generated on the server side, which the client then needs to download.

I have yet to confirm the Snowflake (server) version with which this new change will be released.

Again, to be very clear: this is not a client library release we're waiting for, but a server-side one. Consequently, it is driver-independent and affects all of our driver libraries, not just this one.
The client-side parameter (CLIENT_RESULT_CHUNK_SIZE) has been there for years; it just did not work entirely as expected.

Will keep this thread posted.
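Once that server-side release is live, lowering the chunk size per session before running the large query might look like this (clampChunkSizeMb and setChunkSize are illustrative helpers; the 16-160 MB range is taken from the comment above):

```javascript
// Pure helper: clamp a requested size to the documented 16..160 MB range.
function clampChunkSizeMb(mb) {
  return Math.min(160, Math.max(16, mb));
}

// Set the session parameter before executing the large streaming query.
function setChunkSize(connection, mb, done) {
  connection.execute({
    sqlText: `ALTER SESSION SET CLIENT_RESULT_CHUNK_SIZE = ${clampChunkSizeMb(mb)}`,
    complete: (err) => done(err || null),
  });
}
```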

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. label Oct 2, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-blocked Progress cannot be made to this issue due to an outside blocking factor. label Oct 2, 2024
@sfc-gh-dszmolka
Collaborator

update: server-side changes seem to be rolled out with Snowflake version 8.39, which is scheduled for next week. Will update this thread when the appropriate Snowflake (server) version is live in production.

@LalowiczB

@sfc-gh-dszmolka Hi, we are also experiencing the same issue (we have an internal support ticket opened with Snowflake support). I can see that the engine version has been 8.39.2 since yesterday.

The issue isn't resolved, at least for Node.js. I created a blank Node.js project where I stream 400k rows and simply log them to the console, to make sure no other logic is involved. Memory spikes up to 1GB, and changing the session setting for chunk size does not seem to have any effect.

Could you let me know if the release contained the announced fix?

@sfc-gh-dszmolka
Collaborator

hey @LalowiczB - 8.39 indeed contained the announced fix, and I can confirm that CLIENT_RESULT_CHUNK_SIZE can now be set to as low as 16MB (earlier: 48MB).
Unfortunately, I can also confirm that this change did not singlehandedly resolve the issue. Reproductions show results similar to what you saw, so we need to look further into what else needs to be done for the driver to seamlessly stream a large number of rows even in a memory-limited environment.

@sfc-gh-dszmolka sfc-gh-dszmolka removed status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. status-blocked Progress cannot be made to this issue due to an outside blocking factor. labels Oct 17, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka changed the title SNOW-1215393: Out of Memory Issue in memory-limited environment while streaming data SNOW-1747415 SNOW-1215393: Out of Memory Issue in memory-limited environment while streaming data Oct 17, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added status-blocked Progress cannot be made to this issue due to an outside blocking factor. backend changes needed Change must be implemented on the Snowflake service, and not in the client driver. labels Dec 11, 2024