[Proposal] Allow to use Spark Session in DataExporter #363

rafaelcanto · 2024-11-25T00:51:01Z

Is your feature request related to a problem? Please describe.
We need to scale our data ingestion, and it would be nice to have Spark support when writing the outputs from DataExporter. When using pyarrow, we also noticed that the number of Parquet files grows, so it might also be good to allow controlling this through a parameter.

Describe the solution you'd like
We would like to optionally pass a Spark session instance to the FocusConverter class, which would pass it down to DataExporter.
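To make the idea concrete, here is a rough sketch of how the session could be threaded through. The class bodies, the `spark_session` and `num_output_files` parameter names, and the `backend` property are all hypothetical stand-ins for illustration, not the project's actual API. Only the PySpark calls (`createDataFrame`, `coalesce`, `write.mode(...).parquet(...)`) are real.

```python
from __future__ import annotations

from typing import Any, Optional


class DataExporter:
    """Hypothetical stand-in for the project's exporter, not its real API."""

    def __init__(
        self,
        export_path: str,
        spark_session: Optional[Any] = None,     # assumed parameter name
        num_output_files: Optional[int] = None,  # assumed parameter name
    ) -> None:
        self.export_path = export_path
        self.spark_session = spark_session
        self.num_output_files = num_output_files

    @property
    def backend(self) -> str:
        # Spark is used only when a session was explicitly provided.
        return "spark" if self.spark_session is not None else "pyarrow"

    def write(self, rows: list[dict]) -> None:
        if self.spark_session is None:
            # The existing pyarrow code path would stay here, unchanged.
            raise NotImplementedError("pyarrow path omitted in this sketch")
        df = self.spark_session.createDataFrame(rows)
        if self.num_output_files is not None:
            # coalesce() caps the number of partitions, and therefore
            # the number of Parquet part files Spark writes.
            df = df.coalesce(self.num_output_files)
        df.write.mode("overwrite").parquet(self.export_path)


class FocusConverter:
    """Hypothetical stand-in showing only how the session is passed down."""

    def __init__(
        self,
        export_path: str,
        spark_session: Optional[Any] = None,
        num_output_files: Optional[int] = None,
    ) -> None:
        self.exporter = DataExporter(
            export_path,
            spark_session=spark_session,
            num_output_files=num_output_files,
        )
```

With this shape, existing callers that pass no session would keep the current pyarrow behavior, and Spark would remain an optional dependency.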

Describe alternatives you've considered
We're considering forking the repository, but others may need this solution too. We didn't find other options to speed up the conversion. Any ideas are welcome!

rafaelcanto · 2024-11-25T00:52:26Z

If you don't mind, I can develop the proposal and send it as a PR. Let me know if this makes sense.

rafaelcanto · 2024-11-29T00:44:58Z

Hi! Anybody here?

varunmittal91 · 2024-11-29T02:18:43Z

Hi @rafaelcanto, that sounds great. Please let me know if there is anything I can help with.

rafaelcanto · 2024-11-29T03:28:42Z

Good! Thanks. I'll start development this week and keep you posted.
