Currently, it appears that the way for a client using Snowpipe Streaming (via the Ingest SDK) to monitor what's actually been committed to a table (by which I mean "is queryable in Snowflake") is to use the `SnowflakeStreamingIngestClient.getLatestCommittedOffsetTokens` method (and its counterpart on the channel class).
Question: How often is it reasonable to poll this method? E.g. is every 10 seconds, with a list of a hundred or more channel names, too much?
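For context, the polling we have in mind looks roughly like the sketch below. Note that `OffsetPoller` is a hypothetical helper, not part of the SDK; the lookup function stands in for the real `getLatestCommittedOffsetTokens` call so the snippet is self-contained:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Illustrative polling helper. The fetch function stands in for the real
// SnowflakeStreamingIngestClient.getLatestCommittedOffsetTokens call so the
// sketch is self-contained; OffsetPoller itself is hypothetical, not SDK API.
class OffsetPoller {
    private final Function<List<String>, Map<String, String>> fetchOffsets;

    OffsetPoller(Function<List<String>, Map<String, String>> fetchOffsets) {
        this.fetchOffsets = fetchOffsets;
    }

    /** Returns the channels whose committed offset token changed since the previous poll. */
    Map<String, String> changedSince(List<String> channels, Map<String, String> previous) {
        Map<String, String> latest = fetchOffsets.apply(channels);
        Map<String, String> changed = new HashMap<>();
        latest.forEach((channel, token) -> {
            if (!Objects.equals(token, previous.get(channel))) {
                changed.put(channel, token);
            }
        });
        return changed;
    }
}
```

A scheduler would invoke `changedSince` every N seconds across all channels and fan out notifications, which is exactly the extra lag and load the feature request below is meant to avoid.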
Feature request: Provide a way for the SDK to give the client application a callback once a blob is registered and queryable in the table in Snowflake. The callback would contain the offset token committed alongside that blob.
This would enable the client application to implement some pretty interesting features without the additional lag (and additional load on Snowflake) imposed by the polling approach. Some use cases we have in mind to build on top of this in our application are:
Real-time monitoring of ingestion progress.
Wait for data to be available in one table before ingesting to another table. E.g. ensure that a parent table has a row queryable before inserting rows into a child table which would reference that parent table. Or to ensure that all rows pertaining to some operation are queryable before inserting a row stating that that operation is "complete".
When a consumer reads from a Snowflake Stream defined on the table being ingested to, better align when that consumer executes with when new data becomes available to that stream.
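The parent/child ordering use case above can be approximated today with polling. A minimal sketch, assuming the application encodes its offset tokens as monotonically increasing longs (the token format is application-defined, and `CommitBarrier` is a hypothetical helper, not SDK API):

```java
import java.util.function.LongSupplier;

// Illustrative barrier for the parent/child ordering use case. The supplier
// stands in for "parse the channel's latest committed offset token as a long";
// CommitBarrier is hypothetical, not part of the SDK.
class CommitBarrier {
    private final LongSupplier committedOffset;

    CommitBarrier(LongSupplier committedOffset) {
        this.committedOffset = committedOffset;
    }

    /** Blocks until the committed offset reaches target, i.e. the parent row is queryable. */
    void awaitCommitted(long target, long pollMillis) throws InterruptedException {
        while (committedOffset.getAsLong() < target) {
            Thread.sleep(pollMillis);
        }
    }
}
```

With a committed-offset callback from the SDK, the busy-wait loop could be replaced by completing a future from the callback instead.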
In terms of the API for this in the SDK, some options include:
Add a `setCommittedOffsetTokenCallback(Consumer<String> callback, Executor executor)` method to `OpenChannelRequestBuilder`.
If this SDK targets Java 9 or later, add a `Flow.Publisher<String> getCommittedOffsetTokensPublisher()` method to `SnowflakeStreamingIngestChannel` (see [1], [2], [3]).
One nice property about the linear nature of ingest channels is that only the latest offset token is interesting, so the callback publisher is free to drop offsets if the callback consumer is still processing a previous one, as long as it doesn't drop the latest one.
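To make the "drop intermediate offsets, never the latest" property concrete, here is an illustrative conflating dispatcher (not SDK code): if a new token arrives while the consumer is still busy, the pending token is overwritten, and an invariant guarantees the most recent token is always eventually delivered.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Illustrative "conflating" dispatcher for a linear channel: intermediate
// offset tokens may be dropped, but the latest one is always delivered.
// LatestOffsetDispatcher is hypothetical, not part of the SDK.
class LatestOffsetDispatcher {
    private final AtomicReference<String> pending = new AtomicReference<>();
    private final Consumer<String> consumer;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    LatestOffsetDispatcher(Consumer<String> consumer) {
        this.consumer = consumer;
    }

    void publish(String offsetToken) {
        // Overwrite any pending token (dropping it); schedule a drain task only
        // when there was no token pending, so at most one drain is outstanding.
        if (pending.getAndSet(offsetToken) == null) {
            executor.execute(() -> {
                String latest = pending.getAndSet(null);
                if (latest != null) {
                    consumer.accept(latest);
                }
            });
        }
    }

    /** Drains any pending token, then stops the worker. */
    void close(long timeoutMillis) throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

Publishing "1", "2", "3" in quick succession may deliver only a subset to a slow consumer, but "3" is always the final delivery, which is all a linear channel needs.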
I'm admittedly assuming that the SDK actually knows once data is queryable in Snowflake as part of the blob registration service.
Hi @wesleyhillyext, thank you for the feature idea. Regarding your question, feel free to poll for the channel status whenever your business logic demands it. 10 seconds for 100 channels is perfectly fine.
[1] https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/Flow.Publisher.html

[2] The `SubmissionPublisher<T>` implementation of that: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/SubmissionPublisher.html

[3] If targeting Java 8, could use the `org.reactivestreams` library for `Publisher` instead.