SNOW-972142: SNOW-1747415 The CLIENT_RESULT_CHUNK_SIZE parameter is not taken into account in different ways of executing the request #1563
Labels:
- backend changes needed: change must be implemented on the Snowflake service, not in the client driver.
- enhancement: the issue is a request for an improvement or a new feature.
- status-blocked: progress cannot be made on this issue due to an outside blocking factor.
- status-triage_done: initial triage done; will be further handled by the driver team.
Driver: net.snowflake:snowflake-jdbc:3.14.2
First, we need to prepare a table with 10,000,000 rows. We can do this with the following SQL:

```sql
create or replace table long_table as
select row_number() over (order by null) as id
from table(generator(rowcount => 10000000));
```

After that we can implement a test that tries to download a large amount of simulated data with minimal memory usage:
```java
@Test
public void bigDataTest() throws Exception {
    int colSize = 1024;
    int colCount = 12;
    // ...
```
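The test body above is truncated in the report. A minimal self-contained sketch of what such a test might look like follows; the connection URL, credentials, and the `rpad` padding expression are assumptions on my part, not the reporter's original code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;
import org.junit.Test;

public class BigDataTest {

    // Hypothetical connection details; replace with a real account, user, and password.
    private static final String URL = "jdbc:snowflake://<account>.snowflakecomputing.com/";

    private static Connection connect() throws Exception {
        Properties props = new Properties();
        props.put("user", "<user>");
        props.put("password", "<password>");
        return DriverManager.getConnection(URL, props);
    }

    @Test
    public void bigDataTest() throws Exception {
        int colSize = 1024;
        int colCount = 12;

        // Build a SELECT that pads each of colCount columns to colSize characters,
        // yielding roughly 12 KB per row across the 10M rows of long_table.
        StringBuilder sql = new StringBuilder("select");
        for (int i = 0; i < colCount; i++) {
            sql.append(i == 0 ? " " : ", ")
               .append("rpad(to_varchar(id), ").append(colSize).append(", 'x') as c").append(i);
        }
        sql.append(" from long_table");

        try (Connection conn = connect();
             PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            ps.setMaxRows(Integer.MAX_VALUE); // variant under test; the report also tries -1 and explicit limits
            long rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                // Iterate without retaining rows, so the only significant memory
                // consumer is the driver's own chunk buffering.
                while (rs.next()) {
                    rows++;
                }
            }
            System.out.println("rows = " + rows);
        }
    }
}
```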
The number of chunks to be downloaded can be observed in the result variable of SFStatement#executeQueryInternal. The value of this variable is a JSON document containing the list of chunk URLs under the "data.chunks" path. The size of this list is what interests us: the longer the list, the smaller each individual chunk and the less memory is used while retrieving the data. The following data was obtained with different values of ps.setMaxRows():
At the same time, we see that with ps.setMaxRows(Integer.MAX_VALUE) and ps.setMaxRows(-1), the value of the CLIENT_RESULT_CHUNK_SIZE configuration parameter is not taken into account at all when calculating the number of chunks.
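For context, CLIENT_RESULT_CHUNK_SIZE is a Snowflake session parameter whose value is expressed in MB. A sketch of lowering it before running the query, assuming a session-level ALTER SESSION is acceptable for the test:

```java
// Lower the maximum chunk size for this session (value in MB) so that, when the
// parameter is honored, the result set splits into more, smaller chunks.
try (java.sql.Statement stmt = conn.createStatement()) {
    stmt.execute("alter session set CLIENT_RESULT_CHUNK_SIZE = 48");
}
```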
The following graphs illustrate the actual memory consumption in two fundamentally different cases (data.chunks.size = 2423 and data.chunks.size = 469).
In the second case we occasionally get an OOM due to the process of re-downloading chunks. This is a separate problem: while an incorrectly received chunk is being re-requested, the original chunk is still held in memory, so actual consumption reaches CLIENT_RESULT_CHUNK_SIZE * 3 instead of the CLIENT_RESULT_CHUNK_SIZE * 2 implied by the stated parameter values.
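Until this is fixed, two documented Snowflake session parameters may help bound the client-side footprint; whether they fully compensate for the retry behavior described above is an assumption on my part:

```java
try (java.sql.Statement stmt = conn.createStatement()) {
    // Cap the memory (in MB) the driver may use for buffering result chunks.
    stmt.execute("alter session set CLIENT_MEMORY_LIMIT = 512");
    // Fewer prefetch threads means fewer chunks held in memory concurrently.
    stmt.execute("alter session set CLIENT_PREFETCH_THREADS = 1");
}
```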