
Analysis and Planning Stages Are Extremely Slow When Tables Have Excessive Columns #25018

Open
dc-orz opened this issue Feb 14, 2025 · 0 comments


dc-orz commented Feb 14, 2025

Trino Version: 464.

I have developed my own connector, and in certain scenarios, my tables contain over 10,000 columns (up to hundreds of thousands).

SPI methods such as getTableMetadata compel me to return metadata for every column. I understand that the analyzer and planner require this information, but in practice the overhead during the analysis and planning stages is substantial, involving numerous collection traversals and immutable-collection copies.
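
For context, here is a minimal sketch of what such a connector method looks like, using Trino SPI types; the `WideTableHandle` record and the all-VARCHAR typing are hypothetical simplifications, not my actual connector code:

```java
import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.ConnectorTableHandle;
import io.trino.spi.connector.ConnectorTableMetadata;
import io.trino.spi.connector.SchemaTableName;

import java.util.List;

import static io.trino.spi.type.VarcharType.VARCHAR;

public class WideTableMetadataSketch
{
    // Hypothetical handle carrying the table name and its full column list.
    record WideTableHandle(String tableName, List<String> columnNames)
            implements ConnectorTableHandle {}

    // Shaped like ConnectorMetadata.getTableMetadata: the contract returns
    // metadata for every column, so a 30,000-column table materializes a
    // 30,000-element list on each call, which the SPI then copies again
    // when constructing ConnectorTableMetadata.
    public ConnectorTableMetadata getTableMetadata(ConnectorSession session, ConnectorTableHandle table)
    {
        WideTableHandle handle = (WideTableHandle) table;
        List<ColumnMetadata> columns = handle.columnNames().stream()
                .map(name -> new ColumnMetadata(name, VARCHAR))
                .toList();
        return new ConnectorTableMetadata(new SchemaTableName("demo", handle.tableName()), columns);
    }
}
```

The engine calls this (and related methods such as getColumnHandles) during analysis, so the full-width list is rebuilt and traversed repeatedly even when the query projects only a handful of columns.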

For example, for a query over a table with approximately 30,000 columns, the total query response time is around 3 seconds, with the analysis and planning stages consuming over 2 seconds (the optimizer in particular).

So far I have tried adjusting the order of the PlanOptimizers, running columnPruningOptimizer earlier in the pipeline, but the improvement was not significant. I then resorted to inspecting the AST before analysis and gathering field references, so that interfaces like getTableMetadata return only the columns the query actually touches (see the sketch below). This approach works but is not elegant. Does the community have any plans to optimize the planning process for such ultra-wide tables?
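
A rough sketch of that workaround: parse the query up front and collect every identifier that appears, then have the connector restrict the columns it reports to that set. This assumes Trino's internal parser module (io.trino.sql.parser.SqlParser and the io.trino.sql.tree visitors), which is not a stable API; it also over-approximates, since aliases and function names surface as identifiers too, but extra surviving columns are harmless here:

```java
import io.trino.sql.parser.SqlParser;
import io.trino.sql.tree.DefaultTraversalVisitor;
import io.trino.sql.tree.Identifier;
import io.trino.sql.tree.Statement;

import java.util.HashSet;
import java.util.Set;

public class ReferencedColumnCollector
{
    // Collects every identifier mentioned anywhere in the statement; the
    // connector can then intersect this set with the table's column list
    // before building ConnectorTableMetadata.
    public static Set<String> referencedIdentifiers(String sql)
    {
        Statement statement = new SqlParser().createStatement(sql);
        Set<String> names = new HashSet<>();
        new DefaultTraversalVisitor<Void>()
        {
            @Override
            protected Void visitIdentifier(Identifier node, Void context)
            {
                names.add(node.getValue());
                return null;
            }
        }.process(statement, null);
        return names;
    }
}
```

With `referencedIdentifiers("SELECT col_1, col_7 FROM wide_table WHERE col_42 > 0")`, the connector would only need to materialize metadata for three columns instead of 30,000.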
