
SNOW-980272: Direct memory not freed / deallocated when using ARROW result format and streaming #1573

Open
koszta5 opened this issue Nov 29, 2023 · 1 comment
Labels
status-information_needed (Additional information is required from the reporter), status-triage (Issue is under initial triage)



koszta5 commented Nov 29, 2023

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of JDBC driver are you using?
    3.14.3

  2. What operating system and processor architecture are you using?
    Red Hat Linux 7.9, x86_64

  3. What version of Java are you using?
    JDK 11 - Temurin-11.0.19+7

  4. What did you do?
    Used the JDBC driver to execute SELECT * FROM MY_TABLE

  • the table has more than 20 million rows
  • we are using Spring's JdbcTemplate in a streaming fashion (<T> List<T> query(String sql, RowMapper<T> rowMapper, @Nullable Object... args) throws DataAccessException)
  5. What did you expect to see?
    As we are streaming the data (stream a batch from Snowflake, insert it into the target, clear the batch), there should be no limit on the total amount of data processed (the GC cleans up objects that are no longer needed).

    What should have happened and what happened instead?

  • internally, the JDBC driver uses Apache Arrow with direct (off-heap) memory allocation, and that memory is only freed after execution of the whole statement has finished
  • this is a poor approach, as we consume the data on the fly and stream it to the target
  • no amount of direct memory would be enough to hold such a large dataset in one go
  • we always run out of direct memory
  6. Can you set logging to DEBUG and collect the logs?
  • yes, but this is a design issue
  • why is direct memory not freed as rs.next() is called? Freeing it there would be a simple fix
  7. Workaround
  • The workaround is to force the JSON query result format before executing the statement (ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT='JSON')
  • This does not allocate direct memory, so rows that have already been consumed can be reclaimed by the GC as rs.next() advances
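
For illustration, the workaround might look roughly like this in plain JDBC. This is a sketch, assuming an already-open Snowflake Connection; the class JsonResultFormatWorkaround, its RowHandler interface and its method are hypothetical names, and only the ALTER SESSION statement comes from the report. Because a session parameter applies to the session of one connection, the ALTER SESSION has to run on the same connection as the large query. With a pooled DataSource behind JdbcTemplate, that usually means doing both inside a single connection callback (for example JdbcTemplate.execute(ConnectionCallback)).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper (not part of the Snowflake driver) applying the JSON workaround.
final class JsonResultFormatWorkaround {

    // Session parameter override quoted in the issue; it affects only this connection's session.
    static final String FORCE_JSON = "ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT='JSON'";

    // Per-row callback that, unlike java.util.function.Consumer, may throw SQLException.
    @FunctionalInterface
    interface RowHandler {
        void handle(ResultSet rs) throws SQLException;
    }

    private JsonResultFormatWorkaround() {}

    // Switches the session to JSON results, then runs `sql` on the same connection,
    // passing each row to `onRow` and returning the number of rows read.
    static long queryWithJsonFormat(Connection conn, String sql, int fetchSize,
                                    RowHandler onRow) throws SQLException {
        try (Statement alter = conn.createStatement()) {
            alter.execute(FORCE_JSON);
        }
        long rows = 0;
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // only a hint; the driver may ignore it
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    onRow.handle(rs); // process the row now; do not keep a reference to it
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

Passing a per-row callback, rather than collecting rows into a List, keeps only the current row reachable from application code, which is what lets the GC reclaim consumed rows in the JSON case.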

The current behaviour is inconsistent with common JDBC drivers for other database systems.
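
To make the difference concrete, here is a deliberately simplified model (not the driver's code) of a result set that arrives in fixed-size chunks, standing in for Arrow record batches held in direct memory. It compares the peak memory held when chunks are released only once the statement finishes, as reported above, with releasing each chunk once the cursor has moved past it, as the reporter suggests. The class name, rows per chunk and bytes per chunk are assumptions for illustration. A real driver may also prefetch several chunks ahead, so the bounded case would be a small multiple of one chunk rather than exactly one.

```java
// Deliberately simplified model, not the driver's implementation: a result set whose rows
// arrive in fixed-size chunks (standing in for Arrow record batches held off-heap).
final class ChunkRetentionModel {

    private ChunkRetentionModel() {}

    // Returns the peak number of bytes held while a cursor walks through `totalRows` rows
    // delivered in chunks of `rowsPerChunk` rows, each chunk occupying `bytesPerChunk` bytes.
    // releaseOnAdvance == false: chunks are kept until the statement finishes (reported behaviour).
    // releaseOnAdvance == true: the previous chunk is freed when the cursor enters the next one.
    static long peakBytes(long totalRows, long rowsPerChunk, long bytesPerChunk,
                          boolean releaseOnAdvance) {
        long chunks = (totalRows + rowsPerChunk - 1) / rowsPerChunk; // ceiling division
        long held = 0;
        long peak = 0;
        for (long i = 0; i < chunks; i++) {
            if (releaseOnAdvance && i > 0) {
                held -= bytesPerChunk; // free the chunk the cursor has just left
            }
            held += bytesPerChunk;     // load chunk i
            peak = Math.max(peak, held);
        }
        return peak; // anything still held is released when the statement is closed
    }
}
```

With the assumed figures of 20,000,000 rows, 10,000 rows per chunk and 4 MiB per chunk (4L << 20 bytes), peakBytes returns 8,388,608,000 bytes (about 7.8 GiB) when chunks are released only at the end, versus 4,194,304 bytes (4 MiB) when each chunk is released on advance. The first figure grows linearly with the table size; the second does not.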

koszta5 added the bug label Nov 29, 2023
github-actions bot changed the title from "Direct memory not freed / deallocated when using ARROW result format and streaming" to "SNOW-980272: Direct memory not freed / deallocated when using ARROW result format and streaming" Nov 29, 2023
sfc-gh-dszmolka added the status-triage_needed label Apr 27, 2024
sfc-gh-dszmolka self-assigned this Dec 11, 2024
sfc-gh-dszmolka added the status-triage and status-information_needed labels and removed the bug and status-triage_needed labels Dec 11, 2024
@sfc-gh-dszmolka
Contributor

Hi, apologies that we weren't able to get to this issue sooner.

Could you please test with a recent driver version (e.g. 3.20.0 or later) and check whether the issue still persists? If it does, please let us know; we would highly appreciate a runnable reproduction program, or at least representative code snippets. Thank you!
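
A reproduction along the lines requested might be structured as below. This is only a sketch: the class ArrowDirectMemoryRepro and its methods are hypothetical, the connection URL and credentials must be supplied by the caller, and it assumes the Snowflake JDBC driver is on the classpath so that DriverManager can find it. It iterates the ResultSet directly and keeps no rows, because a List-returning query (such as JdbcTemplate's query with a RowMapper) holds every mapped row on the heap. Running it with a deliberately small -XX:MaxDirectMemorySize may make the failure appear sooner, assuming the driver's Arrow allocator honours that limit.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import java.util.function.LongConsumer;

// Hypothetical reproduction skeleton; assumes the Snowflake JDBC driver is on the classpath.
final class ArrowDirectMemoryRepro {

    private ArrowDirectMemoryRepro() {}

    // Reads every row without retaining it and calls `onProgress` every `reportEvery` rows.
    static long drain(ResultSet rs, long reportEvery, LongConsumer onProgress) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            rs.getObject(1); // touch a column so the row is actually decoded, then drop it
            if (++rows % reportEvery == 0) {
                onProgress.accept(rows);
            }
        }
        return rows;
    }

    // Runs `sql` against the given JDBC URL and prints row counts and heap usage as it goes.
    // Note: the heap figure does not include direct (off-heap) memory.
    static long run(String url, Properties props, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            Runtime rt = Runtime.getRuntime();
            long total = drain(rs, 1_000_000, rows -> System.out.printf(
                    "rows=%d heapUsedMiB=%d%n", rows, (rt.totalMemory() - rt.freeMemory()) >> 20));
            System.out.println("total rows: " + total);
            return total;
        }
    }
}
```

A main method could read the URL and credentials (for example from environment variables, passing "user" and "password" in the Properties) and call run(url, props, "SELECT * FROM MY_TABLE"). Running it once with the default result format and once after the ALTER SESSION workaround would show whether the two behave differently.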
