Skip to content

Conversation

@raunaqmorarka
Copy link
Member

@raunaqmorarka raunaqmorarka commented Feb 24, 2025

Description

Changes JDBC connectors to produce pages rather a cursor.
This allows JDBC connectors to benefit from dynamic row filtering and columnar filter evaluation optimizations.
We're also able to use io.trino.plugin.jdbc.JdbcPageSource#isBlocked to yield when waiting for results from source DB.
RemoteQueryCancellation is removed because all jdbc queries are now executed
on a separate thread pool which allows remote query cancellation.

Additional context and related issues

I made a JMH benchmark at raunaqmorarka@6e015db
This assesses the performance of a highly selective filter (~1% selectivity) on a varchar column that is in front of another 7 varchar columns.

BenchmarkScanFilterAndProjectOperator.benchmarkColumnOriented
(columnCount)  (columnarFilterEvaluationEnabled)  (positionsPerPage)   (type)  (useRecordCursor)   Mode  Cnt    Score   Error  Units
            8                              false                1024  varchar               true  thrpt   20   90.872 ± 1.197  ops/s
            8                              false                1024  varchar              false  thrpt   20  113.378 ± 1.033  ops/s
            8                               true                1024  varchar              false  thrpt   20  140.283 ± 0.835  ops/s\

The cursor interface version is worse both with and without columnar filter evaluation enabled.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## ClickHouse, Druid, DuckDb, Exasol, Ignite, MariaDb, MySql, Oracle, Postgresql, Redshift, Singlestore, SnowFlake, SQL server, Vertica 
* Improve performance of selective joins for federated queries. ({issue}`25123`)

@cla-bot cla-bot bot added the cla-signed label Feb 24, 2025
@raunaqmorarka raunaqmorarka force-pushed the raunaq/jdbc-page branch 5 times, most recently from 5cc9309 to d35009f Compare February 25, 2025 08:35
@github-actions github-actions bot added the redshift Redshift connector label Feb 25, 2025
@raunaqmorarka
Copy link
Member Author

raunaqmorarka commented Mar 3, 2025

postgresql unpartitioned sf10.pdf
postgresql partitioned sf10.pdf

Screenshot 2025-04-24 at 12 38 48 AM Screenshot 2025-04-24 at 12 39 52 AM

TPCH is mostly the same, TPCDS has some gains

sqlException.setNextException(cause.getNextException());
}
throw sqlException;
resultSet = requireNonNull(getFutureValue(resultSetFuture), "resultSet is null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there is SQLException failure, there may be a little difference.

if (e.getCause() instanceof SQLException cause) {
    SQLException sqlException = new SQLException(cause.getMessage(), cause.getSQLState(), cause.getErrorCode(), e);
    if (cause.getNextException() != null) {
        sqlException.setNextException(cause.getNextException());
    }
    throw sqlException;
}

@raunaqmorarka raunaqmorarka force-pushed the raunaq/jdbc-page branch 4 times, most recently from 538888c to d4acf20 Compare April 14, 2025 18:22
@github-actions github-actions bot added the postgresql PostgreSQL connector label Apr 14, 2025
@raunaqmorarka raunaqmorarka force-pushed the raunaq/jdbc-page branch 5 times, most recently from 56b1234 to 7ca021e Compare April 21, 2025 13:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates JDBC connectors from a recordset-based implementation to the new page source API in order to enable dynamic row filtering and improved columnar filter evaluation optimizations. Key changes include:

  • Migrating connector and split manager classes to use ConnectorPageSourceProvider instead of ConnectorRecordSetProvider.
  • Updating tests to iterate over SourcePage rather than using a RecordCursor.
  • Removing RemoteQueryCancellation modules and related test files.

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
File Description
plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftUnloadConnector.java Updated constructor parameter to use jdbcPageSourceProvider.
plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftSplitManager.java Changed annotation from @ForRecordCursor to @ForJdbcClient for the executor.
plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftPageSourceProvider.java Replaced RecordPageSource with jdbcPageSourceProvider.createPageSource.
plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftClientModule.java Removed RemoteQueryCancellationModule installation.
plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClientModule.java Removed RemoteQueryCancellationModule installation.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSourceProvider.java Updated test methods to use page source and adjusted iteration accordingly.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSource.java Updated tests to use JdbcPageSource and Surface new API validated through SourcePage iteration.
Others (modules, configurations, removed files) Cleaned up RemoteQueryCancellation and deprecated recordset implementations.
Comments suppressed due to low confidence (2)

plugin/trino-redshift/src/main/java/io/trino/plugin/redshift/RedshiftPageSourceProvider.java:69

  • The error message still refers to 'recordSetProvider' instead of 'jdbcPageSourceProvider'. Please update the message accordingly.
this.jdbcPageSourceProvider = requireNonNull(jdbcPageSourceProvider, "recordSetProvider is null");

plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSourceProvider.java:110

  • The failure message should be updated to reference 'pageSource' instead of 'recordSet' to reflect the new API.
assertThat(pageSource).withFailMessage("recordSet is null").isNotNull();

This enables dynamic row filtering and columnar filter evaluation optimizations
on JDBC based sources.
We're also able to use io.trino.plugin.jdbc.JdbcPageSource#isBlocked to yield
when waiting for results from source DB.
RemoteQueryCancellation is removed because all jdbc queries are now executed
on a separate thread pool which allows remote query cancellation.
@wendigo
Copy link
Contributor

wendigo commented Apr 23, 2025

@raunaqmorarka Are benchmarks ok now?

@raunaqmorarka
Copy link
Member Author

@raunaqmorarka Are benchmarks ok now?

Yes, i've updated my initial results, there aren't significant regressions any more

@raunaqmorarka raunaqmorarka changed the title Migrate JDBC to page source API Migrate JDBC plugins to page source API Apr 23, 2025
@raunaqmorarka raunaqmorarka merged commit 182744d into master Apr 24, 2025
71 checks passed
@raunaqmorarka raunaqmorarka deleted the raunaq/jdbc-page branch April 24, 2025 02:19
@github-actions github-actions bot added this to the 476 milestone Apr 24, 2025
@sajjoseph
Copy link
Contributor

@sourcery-ai review

@sourcery-ai
Copy link

sourcery-ai bot commented Nov 2, 2025

Hi @sajjoseph! 👋

Only authors and team members can run @sourcery-ai commands on public repos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed phoenix5 postgresql PostgreSQL connector redshift Redshift connector

Development

Successfully merging this pull request may close these issues.

6 participants