SNOW-873466 Allow reading Arrow record batch streams from result set #1422

Open
BryanCutler opened this issue Jun 15, 2023 · 3 comments
Labels
backend changes needed: Change must be implemented on the Snowflake service, and not in the client driver.
feature
status-blocked: Progress cannot be made to this issue due to an outside blocking factor.
status-triage_done: Initial triage done, will be further handled by the driver team

Comments

@BryanCutler

I would like to read the result set of a query as streams of Arrow record batches. QueryResultFormat.ARROW serializes the data in Arrow stream format, but I don't see a way to read those streams directly. Could this be exposed for direct reading, similar to the ArrowStreamLoader in gosnowflake? See https://github.com/snowflakedb/gosnowflake/blob/master/connection.go#L577

What is the current behavior?

The Arrow format is used to serialize result set data, but it doesn't seem to be exposed for direct reading.

What is the desired behavior?

Read the serialized Arrow streams directly.
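
As an illustration only, the kind of interface being asked for might look roughly like the sketch below: the result hands out one raw `InputStream` per chunk and leaves decoding to the caller's Arrow library. Every name here (`ArrowStreamSketch`, `ArrowStreamLoader`, `ArrowStreamBatch`, `inMemory`, `drainAll`) is hypothetical and does not exist in snowflake-jdbc, and the in-memory byte arrays merely stand in for chunks the driver would download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in snowflake-jdbc.
class ArrowStreamSketch {

    /** One result chunk whose payload is a serialized Arrow stream. */
    interface ArrowStreamBatch {
        long rowCount();                              // rows in this chunk
        InputStream openStream() throws IOException;  // raw, undecoded bytes
    }

    /** What a result could expose alongside the usual row-by-row getters. */
    interface ArrowStreamLoader {
        List<ArrowStreamBatch> batches();
    }

    /** In-memory stand-in for chunks the driver would normally download. */
    static ArrowStreamLoader inMemory(List<byte[]> chunks, List<Long> rowCounts) {
        List<ArrowStreamBatch> out = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            final byte[] payload = chunks.get(i);
            final long rows = rowCounts.get(i);
            out.add(new ArrowStreamBatch() {
                @Override public long rowCount() { return rows; }
                @Override public InputStream openStream() {
                    return new ByteArrayInputStream(payload);
                }
            });
        }
        return () -> out;
    }

    /** Sums the row counts reported by each chunk without decoding anything. */
    static long totalRows(ArrowStreamLoader loader) {
        long rows = 0;
        for (ArrowStreamBatch batch : loader.batches()) {
            rows += batch.rowCount();
        }
        return rows;
    }

    /** Caller side: open each chunk's stream and count the bytes read from it. */
    static long drainAll(ArrowStreamLoader loader) {
        long bytes = 0;
        byte[] buf = new byte[8192];
        for (ArrowStreamBatch batch : loader.batches()) {
            try (InputStream in = batch.openStream()) {
                // A real consumer would pass `in` to an Arrow stream reader
                // (for example Arrow Java's ArrowStreamReader) instead of counting bytes.
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        ArrowStreamLoader loader = inMemory(
            Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5}),
            Arrays.asList(10L, 20L));
        System.out.println("rows=" + totalRows(loader) + " bytes=" + drainAll(loader));
    }
}
```

The point of this shape is that the driver never materializes individual values: each chunk stays a single stream, so the consumer can decode whole record batches at once.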

How would this improve snowflake-jdbc?

Consuming Arrow data directly would let a client read entire batches at a time instead of one scalar value at a time, which should improve performance.

References, Other Background

A similar interface is provided in gosnowflake: https://github.com/snowflakedb/gosnowflake/blob/master/connection.go#L577

The Arrow ADBC driver currently makes use of it and shows good performance gains: https://github.com/apache/arrow-adbc/blob/main/go/adbc/driver/snowflake/record_reader.go#L242

In snowflake-jdbc, the Arrow stream is currently read here: https://github.com/snowflakedb/snowflake-jdbc/blob/master/src/main/java/net/snowflake/client/jdbc/SnowflakeChunkDownloader.java#L892

What is your Snowflake account identifier, if any?

@sfc-gh-spanaite
Contributor

Thanks for raising this feature request with us. We'll review internally (no estimated timeline for a response due to other priorities).

@aiguofer

We're interested in this as well. We're currently converting the JDBC results back to Arrow for a variety of JDBC connectors, but we'd love to keep the data in Arrow format the entire time if possible.

It seems others have been looking to do the same: https://stackoverflow.com/questions/65997340/how-can-i-retrieve-data-in-arrow-format-when-querying-snowflake-in-java

@sfc-gh-dszmolka added the status-triage_done label on Apr 26, 2024
@sfc-gh-dszmolka changed the title from "Allow reading Arrow record batch streams from result set" to "SNOW-840018 Allow reading Arrow record batch streams from result set" on Apr 26, 2024

@carlossc

At Denodo, we are also interested in this. We would like to improve our existing Snowflake connector to take advantage of it.

@sfc-gh-dszmolka added the status-blocked and backend changes needed labels on Dec 11, 2024
@sfc-gh-dprzybysz changed the title from "SNOW-840018 Allow reading Arrow record batch streams from result set" to "SNOW-873466 Allow reading Arrow record batch streams from result set" on Dec 16, 2024
7 participants