
Analysis and Planning Stages Are Extremely Slow When Tables Have Excessive Columns #25018

Open
dc-orz opened this issue Feb 14, 2025 · 0 comments


dc-orz commented Feb 14, 2025

Trino Version: 464.

I have developed my own connector, and in certain scenarios, my tables contain over 10,000 columns (up to hundreds of thousands).

SPI methods such as getTableMetadata compel me to return metadata for every column. I understand that the analyzer and planner require this information, but in practice the overhead during the analysis and planning stages is substantial, involving numerous collection traversals and immutable-collection copies.
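
For context, here is a minimal sketch of what such a connector method looks like, using Trino SPI types; the `WideTableHandle` record and the all-VARCHAR typing are hypothetical simplifications, not my actual connector code:

```java
import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.ConnectorTableHandle;
import io.trino.spi.connector.ConnectorTableMetadata;
import io.trino.spi.connector.SchemaTableName;

import java.util.List;

import static io.trino.spi.type.VarcharType.VARCHAR;

public class WideTableMetadataSketch
{
    // Hypothetical handle carrying the table name and its full column list.
    record WideTableHandle(String tableName, List<String> columnNames)
            implements ConnectorTableHandle {}

    // Shaped like ConnectorMetadata.getTableMetadata: the contract returns
    // metadata for every column, so a 30,000-column table materializes a
    // 30,000-element list on each call, which the SPI then copies again
    // when constructing ConnectorTableMetadata.
    public ConnectorTableMetadata getTableMetadata(ConnectorSession session, ConnectorTableHandle table)
    {
        WideTableHandle handle = (WideTableHandle) table;
        List<ColumnMetadata> columns = handle.columnNames().stream()
                .map(name -> new ColumnMetadata(name, VARCHAR))
                .toList();
        return new ConnectorTableMetadata(new SchemaTableName("demo", handle.tableName()), columns);
    }
}
```

The engine calls this (and related methods such as getColumnHandles) during analysis, so the full-width list is rebuilt and traversed repeatedly even when the query projects only a handful of columns.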

For example, for a query over a table with approximately 30,000 columns, the total query response time is around 3 seconds, with the analysis and planning stages consuming over 2 seconds (the optimizer in particular).

So far I have tried adjusting the order of the PlanOptimizers, running columnPruningOptimizer earlier in the pipeline, but the improvement was not significant. I then resorted to inspecting the AST before analysis and gathering field references, so that interfaces like getTableMetadata return only the columns the query actually touches (see the sketch below). This approach works but is not elegant. Does the community have any plans to optimize the planning process for such ultra-wide tables?
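
A rough sketch of that workaround: parse the query up front and collect every identifier that appears, then have the connector restrict the columns it reports to that set. This assumes Trino's internal parser module (io.trino.sql.parser.SqlParser and the io.trino.sql.tree visitors), which is not a stable API; it also over-approximates, since aliases and function names surface as identifiers too, but extra surviving columns are harmless here:

```java
import io.trino.sql.parser.SqlParser;
import io.trino.sql.tree.DefaultTraversalVisitor;
import io.trino.sql.tree.Identifier;
import io.trino.sql.tree.Statement;

import java.util.HashSet;
import java.util.Set;

public class ReferencedColumnCollector
{
    // Collects every identifier mentioned anywhere in the statement; the
    // connector can then intersect this set with the table's column list
    // before building ConnectorTableMetadata.
    public static Set<String> referencedIdentifiers(String sql)
    {
        Statement statement = new SqlParser().createStatement(sql);
        Set<String> names = new HashSet<>();
        new DefaultTraversalVisitor<Void>()
        {
            @Override
            protected Void visitIdentifier(Identifier node, Void context)
            {
                names.add(node.getValue());
                return null;
            }
        }.process(statement, null);
        return names;
    }
}
```

With `referencedIdentifiers("SELECT col_1, col_7 FROM wide_table WHERE col_42 > 0")`, the connector would only need to materialize metadata for three columns instead of 30,000.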
