I have developed my own connector, and in some scenarios my tables contain more than 10,000 columns (in extreme cases, hundreds of thousands).
SPI methods such as getTableMetadata compel me to return metadata for every column. I understand that the analyzer and planner need this information, but in practice the overhead during the analyze and plan stages is substantial, involving numerous collection traversals and immutable-collection copies.
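For context, the code path in question looks roughly like the sketch below. Only the io.trino.spi types are real; MyConnectorMetadata, MyTableHandle, MyCatalog, and getAllColumns are placeholders for my connector's internals:

```java
import static com.google.common.collect.ImmutableList.toImmutableList;

import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorMetadata;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.ConnectorTableHandle;
import io.trino.spi.connector.ConnectorTableMetadata;

import java.util.List;

public class MyConnectorMetadata
        implements ConnectorMetadata
{
    private final MyCatalog catalog; // hypothetical backing store for the connector

    public MyConnectorMetadata(MyCatalog catalog)
    {
        this.catalog = catalog;
    }

    @Override
    public ConnectorTableMetadata getTableMetadata(ConnectorSession session, ConnectorTableHandle table)
    {
        MyTableHandle handle = (MyTableHandle) table;
        // The SPI contract offers no projection hint here, so every column
        // has to be materialized even when the query touches only a few.
        List<ColumnMetadata> columns = catalog.getAllColumns(handle.schemaTableName()).stream()
                .map(column -> new ColumnMetadata(column.name(), column.type()))
                .collect(toImmutableList());
        return new ConnectorTableMetadata(handle.schemaTableName(), columns);
    }
}
```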
For instance, a query against a table with approximately 30,000 columns has a total response time of around 3 seconds, with the analyze and plan stages (especially the optimizer) consuming more than 2 seconds.
I have tried adjusting the order of the PlanOptimizers, running columnPruningOptimizer earlier in the process, but the improvement was not significant. As a workaround, I now inspect the AST before analysis, collect the referenced fields, and use that set to limit the columns returned by getTableMetadata and similar methods (a rough sketch of this pass is below). This works, but it is not elegant. Does the community have any plans to optimize the planning process for such ultra-wide tables?
Trino Version: 464.
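For reference, the pre-analysis pass is roughly the following. It leans on the internal trino-parser module (SqlParser, DefaultTraversalVisitor) rather than the SPI, which is part of why it feels fragile; this is a minimal sketch, not a recommendation:

```java
import io.trino.sql.parser.SqlParser;
import io.trino.sql.tree.DefaultTraversalVisitor;
import io.trino.sql.tree.Identifier;
import io.trino.sql.tree.Statement;

import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Collect every identifier mentioned anywhere in the statement. This
// over-approximates the referenced columns (aliases and table names are
// picked up too), but a superset is safe for pruning what
// getTableMetadata returns: an unused column costs a little extra
// metadata, while a missing one would break the query.
public final class ReferencedNameCollector
        extends DefaultTraversalVisitor<Void>
{
    private final Set<String> names = new HashSet<>();

    public static Set<String> collect(String sql)
    {
        Statement statement = new SqlParser().createStatement(sql);
        ReferencedNameCollector collector = new ReferencedNameCollector();
        collector.process(statement, null);
        return collector.names;
    }

    @Override
    protected Void visitIdentifier(Identifier node, Void context)
    {
        names.add(node.getValue().toLowerCase(Locale.ROOT));
        return null;
    }
}
```

Queries with SELECT * (or dotted references whose field identifiers the traversal misses) still have to fall back to returning the full column list, and since the parser is not part of the SPI, this can break across Trino releases; hence the question about first-class support.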