[FEATURE][SanityTest] Support parquet format data #736

Open
LantaoJin opened this issue Oct 3, 2024 · 0 comments
Labels
enhancement New feature or request Lang:PPL Pipe Processing Language support untriaged

Comments

@LantaoJin
Member

Is your feature request related to a problem?
In the sanity testing, we only test JSON-format data, and each query scans 1045 JSON files. That is the primary cause of slowness during Spark execution (~90% of the time is spent on file scan). Can the Parquet file format be used instead? It could be much faster. We have not tested any other data formats.
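To illustrate the idea, here is a minimal PySpark sketch of how the JSON test data might be rewritten as Parquet once and then exposed as a table. It is only a sketch under assumptions: the function names, the table name, and the paths are hypothetical and not part of the sanity test tooling. It assumes the caller passes in an existing `SparkSession`.

```python
def convert_json_to_parquet(spark, json_dir: str, parquet_dir: str) -> None:
    """Read every JSON file under json_dir and rewrite the data as Parquet.

    `spark` is assumed to be an existing SparkSession; both paths are
    hypothetical placeholders for the sanity test data locations.
    """
    df = spark.read.json(json_dir)                   # schema is inferred from the JSON files
    df.write.mode("overwrite").parquet(parquet_dir)  # columnar output, replaces any old copy


def create_table_sql(table: str, location: str, fmt: str = "parquet") -> str:
    """Build a Spark SQL statement that registers existing files as a table."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING {fmt} LOCATION '{location}'"
```

A possible usage would be to call `convert_json_to_parquet(spark, json_dir, parquet_dir)` once, then run `spark.sql(create_table_sql("sanity_parquet", parquet_dir))` and point the test queries at that table. Whether PPL queries work unchanged against such a table is exactly the open question of this issue.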

What solution would you like?
I am not sure whether this is supported. Please close this issue if it already is.

What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

@LantaoJin LantaoJin added enhancement New feature or request untriaged labels Oct 3, 2024
@LantaoJin LantaoJin changed the title [FEATURE] Support parquet format data [FEATURE][SanityTest] Support parquet format data Oct 3, 2024
@YANG-DB YANG-DB added the Lang:PPL Pipe Processing Language support label Oct 3, 2024

2 participants