SNOW-1747415 SNOW-1215393: Out of Memory Issue in memory-limited environment while streaming data #790
Comments
hi - thanks for raising this issue with us, i'll take a look and see how we can proceed
I have the same issue with node 20.9.0 and snowflake-sdk 1.9.0.
thank you again for the detailed reproduction; the issue could be reproduced on my side as well. Of course, if one leaves the memory limit flag off, the query runs to completion, and we can observe that memory usage goes up into the ~1 GB range with the table described in this comment; then GC kicks in, and this continues until the query completes. I'm also not sure whether this is the expected behaviour with the 'streaming rows' functionality, and comparing some heap snapshots in the 'bad' and 'good' scenarios raises additional questions, so I have now involved the driver team to take a look at this.
You can try to fetch the stream in a specific range, like this:
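For illustration, a minimal sketch of passing a row range to streamRows, assuming an already-configured snowflake-sdk connection; the query and page size are placeholders:

const snowflake = require('snowflake-sdk');

const connection = snowflake.createConnection({ /* account, username, password, ... */ });

connection.connect((err) => {
  if (err) throw err;
  connection.execute({
    sqlText: 'SELECT * FROM CUSTOMER',
    streamResult: true,
    complete: (err, statement) => {
      if (err) throw err;
      const PAGE_SIZE = 100000; // illustrative page size
      // stream only rows [0, PAGE_SIZE) instead of the whole result set
      statement
        .streamRows({ start: 0, end: PAGE_SIZE - 1 })
        .on('data', (row) => { /* process a single row */ })
        .on('error', (e) => console.error(e))
        .on('end', () => console.log('page done'));
    },
  });
});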
based on the same idea, perhaps the LIMIT ... OFFSET ... construct could also be a workaround as part of the big query, and alleviate the problem by 'partitioning' the query result and iterating over it. But it's still just a workaround, until we find a solution.
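For illustration, a hedged sketch of this LIMIT ... OFFSET partitioning idea, assuming a connected snowflake-sdk connection; the chunk size, table, and ordering column are placeholders, and a deterministic ORDER BY is used to keep the paging stable across queries:

const CHUNK = 100000; // illustrative chunk size

function fetchChunk(connection, offset) {
  return new Promise((resolve, reject) => {
    connection.execute({
      // ORDER BY keeps the slices deterministic across the chunked queries
      sqlText: `SELECT * FROM CUSTOMER ORDER BY C_CUSTKEY LIMIT ${CHUNK} OFFSET ${offset}`,
      complete: (err, stmt, rows) => (err ? reject(err) : resolve(rows)),
    });
  });
}

async function exportAll(connection) {
  for (let offset = 0; ; offset += CHUNK) {
    const rows = await fetchChunk(connection, offset);
    // ...process this slice of the result...
    if (rows.length < CHUNK) break; // last (partial) chunk reached
  }
}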
a very quick update: this seems to be deeply intertwined with how Snowflake, as the backend, behaves today when creating the query result chunks on the server side, and might not be easily influenced solely on the client (driver) side, besides the workarounds already discussed. Multiple teams are working on this issue and I'll keep this thread posted.
Hello @sfc-gh-dszmolka, do you have any updates regarding the previous request? I attempted the workaround mentioned earlier by passing {start, end} in streamRows, but the issue persists. Below is a sample script attached for your reference. Please let me know if there are any adjustments I should make. Additionally, I noticed that even when I'm not storing the data in the script, the heap size increases significantly. Are the rows referenced somewhere internally?
Hi @kartikgupta2607, we are having an internal discussion about this issue and will keep you informed.
hey folks a quick update. As my colleague mentioned above, we're discussing all the possibilities internally. Sadly the fix is not necessarily trivial because, as it seems, it requires server-side (driver-independent) changes, which might take a while. Until it's implemented, please refer to the mitigations mentioned above.
thank you so much for bearing with us while this is discussed - i'll keep this thread posted
thanks for the update @sfc-gh-dszmolka!
understood, thanks for confirming. Can you please try the other method as well? In the meantime, I have a further progress update which is not that good. As mentioned above, to actually fix the issue we need server-side improvements implemented (in the Snowflake architecture, it's the Snowflake engine itself that decides the number of query result chunks, their size, etc. - the client driver cannot do anything about it; the issue is connected to how these chunks of the query result are generated). It became clear very recently that, due to other higher-priority issues, the server-side improvement surely cannot be implemented in the next upcoming months, so the earliest it can possibly be addressed is the second half of this year. I'm very sorry to bring such bad news, but wanted to set the expectations about the timeline. Which also means the following things:
But most importantly: if you're already a Snowflake customer and affected by this issue, please do reach out to your Account Team and emphasize how important implementing this server-side improvement would be to your use case. This could bring some traction and possibly re-prioritize the backend change. Again, sorry to bring such news and for the inconvenience the current behaviour causes - and thank you for bearing with us while the server-side change is implemented. Will keep this thread posted with the progress, if any.
Sure @sfc-gh-dszmolka, will try the other method. Please keep us posted if it gets re-prioritised.
@kartikgupta2607 did partitioning with the suggested workaround help?
That's very strange (but of course it can be possible, I guess, if your table is wide enough).
Yeah, we're doing the following:

const statement = conn.execute({ sqlText: query, streamResult: true, rowMode: 'array' });
const stream = statement.streamRows();

Note that this only happens when trying to retrieve an entire dataset containing almost 400 million rows.
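For context, a hedged sketch of how such a stream is typically consumed in recent Node.js versions; sink is a hypothetical async destination, and awaiting it is what keeps rows from piling up in memory faster than they are written out:

async function consume(statement) {
  // streamRows() returns a Node.js Readable in object mode, so for await works on it
  for await (const row of statement.streamRows()) {
    // row is an array of column values because rowMode: 'array' was set above
    await sink(row); // hypothetical async sink; awaiting it applies backpressure
  }
}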
@sfc-gh-dszmolka is there a way to serialize the statement object to enable async pagination of results using streamRows({ start, end })? For example:

const statement = conn.execute({ ... })

// pseudo-code
// store statement, start=0, end=pageLength in database
const queryId = storeStatement(statement, start, end)

// load statement
const { statement, start, end } = loadStatement(queryId)

// stream page
const stream = statement.streamRows({ start, end })

This would be really useful for downloading results in parallel, or paginating in a load-balancing scenario where multiple servers are handling query execution. This seems possible with the SQL REST API: https://docs.snowflake.com/en/developer-guide/sql-api/handling-responses#retrieving-additional-partitions But I guess using the Node SDK will be higher performance? If that's not true, we could fall back to the REST API.
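For reference, a hedged sketch of the partition-based retrieval the linked SQL REST API page describes, using Node 18+ global fetch; the account URL and token handling are placeholders, and each partition could in principle be fetched by a different worker:

const BASE = 'https://<account_identifier>.snowflakecomputing.com/api/v2/statements';
const HEADERS = {
  'Content-Type': 'application/json',
  Accept: 'application/json',
  Authorization: `Bearer ${process.env.SNOWFLAKE_JWT}`, // placeholder auth token
  'X-Snowflake-Authorization-Token-Type': 'KEYPAIR_JWT',
};

async function runAndPage(sql) {
  // submit the statement; the response carries the statement handle and partition metadata
  const submitted = await fetch(BASE, {
    method: 'POST',
    headers: HEADERS,
    body: JSON.stringify({ statement: sql }),
  });
  const first = await submitted.json();
  const partitionCount = first.resultSetMetaData.partitionInfo.length;
  for (let p = 1; p < partitionCount; p++) { // partition 0 arrives with the first response
    const res = await fetch(`${BASE}/${first.statementHandle}?partition=${p}`, { headers: HEADERS });
    const page = await res.json();
    // ...process page.data (the rows of this partition)...
  }
}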
at this moment this does not seem to work, at least not this way, @owlas. At least in this very thread, folks tried using streamRows({ start, end }) without success. Do note, please, that it (it == governing the size of the query result chunks) is also not possible with the SQL REST API; the number and size of the partitions are unilaterally determined by the Snowflake engine itself, with no customer-exposed method seemingly available to override it.
I have some update. (Very) recently the server-side code has been changed to support smaller chunk sizes, which in turn hopefully makes an OOM less likely. After the server-side code has been released, setting the chunk-size session parameter should take effect. I have yet to confirm the Snowflake (server) version with which this new change will be released. Again, very specifically: this is not a client library release we're waiting for, but a server-side one. Consequently, it is driver-independent and affects all of our driver libraries, not just this one. Will keep this thread posted.
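If it helps once the server-side change is live, a hedged sketch of adjusting the chunk size at the session level via the driver; CLIENT_RESULT_CHUNK_SIZE is Snowflake's documented chunk-size session parameter, though whether it is the exact knob this fix targets is an assumption here, and 16 (MB) is an illustrative value:

// hedged sketch: lower the result chunk size for this session before running the big query
connection.execute({
  sqlText: 'ALTER SESSION SET CLIENT_RESULT_CHUNK_SIZE = 16', // value in MB; illustrative
  complete: (err) => {
    if (err) throw err;
    // ...then execute the large query with streamResult: true as usual...
  },
});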
update: server-side changes seem to be rolled out with Snowflake version 8.39, which is scheduled for next week. Will update this thread when the appropriate Snowflake (server) version is live in production.
@sfc-gh-dszmolka Hi, we are also experiencing the same issue (we have an internal support ticket opened with Snowflake support). I can see that the engine version has been 8.39.2 since yesterday. The issue isn't resolved, at least for Node.js. I created a blank Node.js project where I stream 400k rows and simply log them to the console, to make sure that no other logic is involved. The memory spikes up to 1 GB, and changing the session setting for chunk size does not seem to have any effect. Could you let me know if the release contained the announced fix?
hey @LalowiczB - 8.39 indeed contained the announced fix, and i can confirm that now
Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!
What version of NodeJS driver are you using? -> snowflake-sdk; tried with 1.6.20 and 1.9.3
What operating system and processor architecture are you using? -> macOS 14.3.1, arm64
What version of NodeJS are you using? -> v18.12.1
What are the component versions in the environment (npm list)? -> NA
Server version? -> 8.9.1
What did you do?
Tried running this (snowflake_OOM.txt) script to export records from the SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER table, after limiting max-old-space-size=150 while running the script. A simple SELECT * FROM CUSTOMER; was used, but the node process exited after exporting some rows (200,000) with a FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory error. However, on modifying the same query with a LIMIT clause (with a limit of 1,000,000), it was able to export 1M records with the same old-space limit. I tried the workarounds mentioned in SNOW-750472 Out of memory issue #43 and the linked issues, but none worked. Tried using the streamResult option in the connection config and while executing the query, and tried downgrading to 1.6.20. Following is the metadata of the source table.
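For illustration, a hedged sketch of the kind of reproduction described, assuming valid connection details; it streams the table, only counts rows without retaining them, and is launched with the same heap cap:

// run with: node --max-old-space-size=150 repro.js
const snowflake = require('snowflake-sdk');

const connection = snowflake.createConnection({ /* account, username, password, ... */ });

connection.connect((err) => {
  if (err) throw err;
  connection.execute({
    sqlText: 'SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER',
    streamResult: true,
    complete: (err, statement) => {
      if (err) throw err;
      let exported = 0;
      statement.streamRows()
        .on('data', () => { exported += 1; }) // rows are not retained, only counted
        .on('error', (e) => console.error(e))
        .on('end', () => console.log(`exported ${exported} rows`));
    },
  });
});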
What did you expect to see?
It should be able to export the same number of records before the OOM issue occurs. Also, it seems a large buffer is fetched when the LIMIT isn't present in the query, as GC appears to trigger a bit late for the query without LIMIT.
Can you set logging to DEBUG and collect the logs? -> Can't upload logs due to company security policies.