[Proposal] Allow to use Spark Session in DataExporter #363

Open
rafaelcanto opened this issue Nov 25, 2024 · 4 comments
Comments

@rafaelcanto

Is your feature request related to a problem? Please describe.
We need to scale our data ingestion, and it would be nice to have Spark support when writing the outputs from DataExporter. When using pyarrow, we also noticed that the number of Parquet files keeps growing, so it would be good to be able to control this with a parameter.

Describe the solution you'd like
We would like to be able to optionally pass a Spark session instance to the FocusConverter class, which would then pass it down to DataExporter.
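
To make the idea concrete, here is a rough sketch of what the optional session could look like. `FocusConverterSketch`, `DataExporterSketch`, the `num_files` parameter, and the `write` signature are all hypothetical names used for illustration; they are not the project's current API. The Spark calls (`createDataFrame`, `coalesce`, `write.mode(...).parquet(...)`) are standard PySpark, and the fallback assumes pyarrow is installed.

```python
from typing import Any, Optional, Sequence


class DataExporterSketch:
    """Hypothetical stand-in for DataExporter, not the project's real class."""

    def __init__(self, spark_session: Optional[Any] = None, num_files: int = 1):
        # spark_session is expected to be a pyspark.sql.SparkSession when given.
        self.spark_session = spark_session
        # Assumed parameter: upper bound on the number of Parquet files Spark writes.
        self.num_files = num_files

    @property
    def backend(self) -> str:
        return "spark" if self.spark_session is not None else "pyarrow"

    def write(self, rows: Sequence[tuple], columns: Sequence[str], path: str) -> None:
        if self.spark_session is not None:
            # Spark path: `path` is an output directory; coalesce caps the file count.
            df = self.spark_session.createDataFrame(list(rows), schema=list(columns))
            df.coalesce(self.num_files).write.mode("overwrite").parquet(path)
            return
        # Fallback path: `path` is a single Parquet file written with pyarrow.
        import pyarrow as pa
        import pyarrow.parquet as pq

        table = pa.table(
            {name: [row[i] for row in rows] for i, name in enumerate(columns)}
        )
        pq.write_table(table, path)


class FocusConverterSketch:
    """Hypothetical stand-in for FocusConverter that forwards the session."""

    def __init__(self, spark_session: Optional[Any] = None, num_files: int = 1):
        self.exporter = DataExporterSketch(
            spark_session=spark_session, num_files=num_files
        )
```

With this shape, existing callers that pass no session would keep the current pyarrow behaviour, and only users who opt in would need PySpark installed.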

Describe alternatives you've considered
We're considering forking the repository, but others may need this solution too. We didn't find any other options to speed up the conversion. Any ideas are welcome!

@rafaelcanto
Author

If you don't mind, I can develop the proposal and send it as a PR. Let me know if this makes sense.

@rafaelcanto
Author

Hi! Anybody here?

@varunmittal91
Collaborator

Hi @rafaelcanto, that sounds great. Please let me know if there is anything I can help with.

@rafaelcanto
Author

Good, thanks! I'll start development this week and keep you posted.
