
Do you have plans to support streaming in the near future? Interested in the readStream use case: spark.readStream.format("bigquery") #259

Open
nmusku opened this issue Oct 28, 2020 · 11 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@nmusku
nmusku commented Oct 28, 2020

If not, how can I do it with the current connector? Any thoughts?

@davidrabinowitz
Member

Streaming is on our roadmap. Could you please elaborate on your use case? Feel free to contact us directly.

@nmusku
Author

nmusku commented Oct 28, 2020

Hi, we have data flowing directly into BigQuery (via fluentd) in real time.
My use case is to query/filter and transform that raw data into meaningful events using this Spark connector. The ingested data is keyed by timestamp, so if there are delays in ingestion I would like to go back in time (say, a threshold of 15 minutes) and read the delayed data as well. I'm not sure how to achieve this via batch jobs. For example:
spark.read.format("bigquery").option("filter", "start_time > current-5 minutes").option("filter", "end_time > current")
Might not work ^^

Note: the reads will be from a view.
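One way to approximate this today is a scheduled batch job with a lookback window, built as a single pushed-down filter. A minimal sketch, assuming the table/view has a TIMESTAMP column named `event_ts` (an illustrative name, not from the thread); note that the connector's `filter` option takes one string, so both bounds must be combined with AND rather than passed as two separate options (a second `.option("filter", ...)` overwrites the first):

```python
from datetime import datetime, timedelta, timezone

def lookback_filter(column: str, now: datetime, lookback_minutes: int) -> str:
    """Build a single filter string covering [now - lookback, now).

    Both bounds go into ONE string because DataFrameReader options are a
    key-value map: setting "filter" twice keeps only the last value.
    """
    start = now - timedelta(minutes=lookback_minutes)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (f"{column} >= TIMESTAMP '{start.strftime(fmt)}' "
            f"AND {column} < TIMESTAMP '{now.strftime(fmt)}'")

# Usage with the connector (not executed here; requires a Spark session and
# the spark-bigquery-connector on the classpath):
#
# df = (spark.read.format("bigquery")
#       .option("viewsEnabled", "true")   # required when loading from a view
#       .option("filter", lookback_filter("event_ts",
#                                         datetime.now(timezone.utc), 15))
#       .load("project.dataset.view_name"))
```

Running this every few minutes with a 15-minute lookback re-reads recent rows, which tolerates late ingestion but means downstream consumers must deduplicate overlapping windows.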

@nmusku
Author

nmusku commented Oct 29, 2020

@davidrabinowitz any thoughts? Is it possible to use any timestamp or any offset?

@davidrabinowitz
Member

@nmusku Yes, for the time being you can implement it with a query like you've suggested. BTW, you can also merge it:

spark.read.format("bigquery").option("filter", "start_time > current-5 minutes AND end_time > current")
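The `current-5 minutes` shorthand above is pseudocode rather than valid BigQuery SQL; a hedged sketch of the same merged filter spelled out in standard BigQuery SQL (the `start_time`/`end_time` column names come from the thread, the table path is assumed):

```python
# The merged filter from the suggestion above, written in BigQuery
# standard SQL so it can be pushed down as-is.
merged_filter = (
    "start_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE) "
    "AND end_time > CURRENT_TIMESTAMP()"
)

# Not executed here; requires a Spark session with the connector installed:
#
# df = (spark.read.format("bigquery")
#       .option("filter", merged_filter)
#       .load("project.dataset.table"))
```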

@nmusku
Author

nmusku commented Oct 30, 2020

OK, one more question: are the events in BigQuery ordered?

@rwagenmaker

Is there any news on this? Now with GA4 it would be cool to get streaming integration in Spark.

@Magicbeanbuyer

Hi @davidrabinowitz ,

I am also interested in a readStream feature.

We have an ETL pipeline extracting campaign data from BigQuery and loading it into our Delta Lake.

The struggle we face is doing incremental ETL without loading duplicate data into our Delta Lake. With readStream and checkpointing, hopefully this would be solved.

Could you maybe share more information on the timeline for readStream feature?

@benney-au-le

We are also interested in this use case.
We land data in BigQuery in real time from sources such as Fivetran, fluentd, etc.
We would like to build Spark streaming applications starting from spark.readStream.format("bigquery") and trigger new micro-batches when new data arrives.

@kaiseu

kaiseu commented Apr 28, 2023

@davidrabinowitz any update on this topic? we're also interested in this.

@davidrabinowitz
Member

Can you please elaborate on the use case, especially how you want to read?

@davidrabinowitz davidrabinowitz added the question Further information is requested label Jun 7, 2023
@kaiseu

kaiseu commented Jul 18, 2023

@davidrabinowitz our use case is streaming reads of incremental data from BigQuery tables, something like spark.readStream.format("bigquery").option("inc_col", "create_time"), where we can configure the incremental column so that each run only reads newly added data. Is this supported now? Any suggestions?
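Since the connector does not expose a readStream source in this thread's timeframe, the incremental-column behavior can be emulated by persisting a watermark between batch runs. A hypothetical sketch (the `inc_col`/`create_time` names come from the comment above; the watermark storage and table path are assumptions):

```python
from typing import Optional

def next_filter(inc_col: str, last_seen: Optional[str]) -> str:
    """Filter selecting only rows newer than the stored watermark.

    On the first run there is no watermark yet, so read everything.
    """
    if last_seen is None:
        return "TRUE"
    return f"{inc_col} > TIMESTAMP '{last_seen}'"

# Each scheduled run (cron, Airflow, ...) would then do roughly:
#
# last_seen = load_watermark()            # hypothetical helper, e.g. GCS file
# df = (spark.read.format("bigquery")
#       .option("filter", next_filter("create_time", last_seen))
#       .load("project.dataset.table"))
# new_max = df.agg({"create_time": "max"}).collect()[0][0]
# if new_max is not None:
#     save_watermark(str(new_max))        # hypothetical helper
```

This gives at-least-once semantics: a run that fails after writing output but before saving the watermark will re-read the same rows, so downstream sinks should deduplicate or merge on a key.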

@isha97 isha97 added the enhancement New feature or request label May 13, 2024