GH-463: Improve TZ support for JDBC driver #464

Open
wants to merge 10 commits into
base: main

Contributor

@aiguofer aiguofer commented Dec 24, 2024

This PR adds support for natively fetching java.time.* objects through the JDBC driver.

DateVector

  • getObject(LocalDate.class)

DateTimeVector

  • getObject(OffsetDateTime.class)
  • getObject(LocalDateTime.class)
  • getObject(ZonedDateTime.class)
  • getObject(Instant.class)

TimeVector

  • getObject(LocalTime.class)

This PR also changes the behavior for vectors that include TZ info: they are now reported as TIMESTAMP_WITH_TIMEZONE.

The different ways to access a TimeStampVector behave as follows:

| Accessor | Vector with TZ | Vector W/O TZ |
|---|---|---|
| getTimestamp() | Get timestamp in the vector TZ | Get timestamp in UTC |
| getTimestamp(calendar) | Get timestamp by adjusting from the vector TZ to the desired calendar TZ | Get timestamp by adjusting from UTC to the desired calendar TZ (a bug, IMO) |
| getObject(LocalDateTime.class) | Get LocalDateTime by taking the timestamp at the vector TZ and taking the "wall-clock" time at that moment | Treat the epoch value in the vector as the "wall-clock" time |
| getObject(Instant.class) | Get Instant represented by the value in the vector TZ | Get Instant represented by the value in UTC |
| getObject(OffsetDateTime.class) | Get OffsetDateTime represented by the value in the vector TZ (will account for daylight adjustment) | Get OffsetDateTime represented by the value in UTC (will account for daylight adjustment) |
| getObject(ZonedDateTime.class) | Get ZonedDateTime represented by the value in the vector, in its TZ | Get ZonedDateTime represented by the value in the vector, in UTC |
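
To make the getObject rows above concrete, here is a minimal sketch using only java.time. It is not the driver's code: the class and method names are hypothetical, and it assumes the vector stores epoch milliseconds and that the vector TZ is a valid zone ID.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Hypothetical illustration of the table rows; not part of the Arrow JDBC driver. */
final class TimestampSemanticsSketch {

  // Vector with TZ: the epoch value identifies an instant, viewed in the vector's zone.
  static ZonedDateTime zoned(long epochMillis, String vectorTz) {
    return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(vectorTz));
  }

  // Same instant, carrying the zone's offset at that moment (so DST is accounted for).
  static OffsetDateTime offset(long epochMillis, String vectorTz) {
    return zoned(epochMillis, vectorTz).toOffsetDateTime();
  }

  // Vector with TZ: the "wall-clock" time in the vector's zone at that instant.
  static LocalDateTime wallClockWithTz(long epochMillis, String vectorTz) {
    return zoned(epochMillis, vectorTz).toLocalDateTime();
  }

  // Vector without TZ: the epoch value itself is read as the "wall-clock" time.
  static LocalDateTime wallClockWithoutTz(long epochMillis) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
  }
}
```

For the same stored value, wallClockWithTz with Asia/Shanghai and wallClockWithoutTz differ by eight hours, while the Instant behind both zoned views is identical.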

Closes #463

@aiguofer aiguofer marked this pull request as draft December 24, 2024 00:56
@lidavidm lidavidm requested a review from jduo January 13, 2025 02:02
@@ -120,7 +120,12 @@ public static int getSqlTypeIdFromArrowType(ArrowType arrowType) {
case Time:
return Types.TIME;
case Timestamp:
return Types.TIMESTAMP;
String tz = ((ArrowType.Timestamp) arrowType).getTimezone();
if (tz != null){
Member

This makes sense.
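
The hunk above is cut off at the null check, so the rest of the branch is not visible here. Going by the PR description, the intended mapping is probably along these lines. This is a standalone sketch with a hypothetical class and method name; it takes the timezone string directly instead of an ArrowType so it compiles without Arrow on the classpath.

```java
import java.sql.Types;

/** Hypothetical standalone version of the Timestamp branch; not the PR's actual code. */
final class TimestampSqlTypeSketch {

  // tz stands in for ArrowType.Timestamp#getTimezone(), which is null for vectors without TZ info.
  static int sqlTypeForTimestamp(String tz) {
    if (tz != null) {
      // Vector carries TZ info: report the timezone-aware JDBC type.
      return Types.TIMESTAMP_WITH_TIMEZONE;
    }
    // No TZ info: keep the previous mapping.
    return Types.TIMESTAMP;
  }
}
```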

Member

@linghengqian linghengqian left a comment

sdk install java 21.0.6-ms
sdk use java 21.0.6-ms

git clone git@github.com:aiguofer/arrow-java.git -b improved_tz_support
cd ./arrow-java/
mvn clean install -DskipTests -Dspotless.check.skip=true
cd ../

git clone git@github.com:linghengqian/influxdb-3-core-jdbc-test.git
cd ./influxdb-3-core-jdbc-test/
sdk use java 21.0.6-ms
./mvnw -T 1C -Dtest=TimeDifferenceTest clean test
Error log of the unit test:
$ ./mvnw -T 1C -Dtest=TimeDifferenceTest clean test
[INFO] Scanning for projects...
[INFO] 
[INFO] Using the MultiThreadedBuilder implementation with a thread count of 16
[INFO] 
[INFO] ----------< io.github.linghengqian:influxdb-3-core-jdbc-test >----------
[INFO] Building influxdb-3-core-jdbc-test 1.0-SNAPSHOT
[INFO]   from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- clean:3.2.0:clean (default-clean) @ influxdb-3-core-jdbc-test ---
[INFO] Deleting /home/linghengqian/TwinklingLiftWorks/git/public/influxdb-3-core-jdbc-test/target
[INFO] 
[INFO] --- resources:3.3.1:resources (default-resources) @ influxdb-3-core-jdbc-test ---
[INFO] skip non existing resourceDirectory /home/linghengqian/TwinklingLiftWorks/git/public/influxdb-3-core-jdbc-test/src/main/resources
[INFO] 
[INFO] --- compiler:3.13.0:compile (default-compile) @ influxdb-3-core-jdbc-test ---
[INFO] No sources to compile
[INFO] 
[INFO] --- resources:3.3.1:testResources (default-testResources) @ influxdb-3-core-jdbc-test ---
[INFO] skip non existing resourceDirectory /home/linghengqian/TwinklingLiftWorks/git/public/influxdb-3-core-jdbc-test/src/test/resources
[INFO] 
[INFO] --- compiler:3.13.0:testCompile (default-testCompile) @ influxdb-3-core-jdbc-test ---
[INFO] Recompiling the module because of changed source code.
[INFO] Compiling 7 source files with javac [debug target 21] to target/test-classes
[INFO] 
[INFO] --- surefire:3.5.2:test (default-test) @ influxdb-3-core-jdbc-test ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running io.github.linghengqian.TimeDifferenceTest
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
Feb 25, 2025 11:11:36 AM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.BaseAllocator <clinit>
INFO: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
Feb 25, 2025 11:11:36 AM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.DefaultAllocationManagerOption getDefaultAllocationManagerFactory
INFO: allocation manager type not specified, using netty as the default type
Feb 25, 2025 11:11:36 AM org.apache.arrow.driver.jdbc.shaded.org.apache.arrow.memory.CheckAllocator reportResult
INFO: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.419 s <<< FAILURE! -- in io.github.linghengqian.TimeDifferenceTest
[ERROR] io.github.linghengqian.TimeDifferenceTest.test -- Time elapsed: 3.354 s <<< FAILURE!
java.lang.AssertionError: 

Expected: is <2025-02-25T03:11:24.614425710Z>
     but: was <2025-02-24T19:11:24.614425710Z>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
        at io.github.linghengqian.TimeDifferenceTest.queryDataByJdbcDriver(TimeDifferenceTest.java:89)
        at io.github.linghengqian.TimeDifferenceTest.test(TimeDifferenceTest.java:50)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TimeDifferenceTest.test:50->queryDataByJdbcDriver:89 
Expected: is <2025-02-25T03:11:24.614425710Z>
     but: was <2025-02-24T19:11:24.614425710Z>
[INFO] 
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  4.894 s (Wall Clock)
[INFO] Finished at: 2025-02-25T11:11:38+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.5.2:test (default-test) on project influxdb-3-core-jdbc-test: There are test failures.
[ERROR] 
[ERROR] See /home/linghengqian/TwinklingLiftWorks/git/public/influxdb-3-core-jdbc-test/target/surefire-reports for the individual test results.
[ERROR] See dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
  • Another issue that the current PR does not resolve: the changed files are not formatted, so the build only passes with the command line argument -Dspotless.check.skip=true.

@aiguofer
Contributor Author

aiguofer commented Feb 25, 2025

@linghengqian I looked at your test but there are a few things that are not immediately clear.

  • Does Influx adjust the TZ for the Point based on its location? (that is, does it convert magicTime to London TZ?)
  • Which timestamp vector does it use for transport when querying time? Does it include TZ info or not?

If you're in the Shanghai TZ, it's possible the 8-hour difference you see is due to your local laptop timezone.
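
One way an 8-hour gap like this can arise, sketched with plain java.time: the same wall-clock reading is turned into an Instant under two different zones. This only illustrates the suspected mechanism, not the test's actual code path; the class name is hypothetical and the fractional seconds from the log are dropped.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Hypothetical illustration of a zone-dependent conversion. */
final class ZoneShiftSketch {

  // Interprets a zone-less wall-clock value as a moment in the given zone.
  static Instant interpret(LocalDateTime wallClock, String zoneId) {
    return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
  }

  public static void main(String[] args) {
    LocalDateTime wallClock = LocalDateTime.of(2025, 2, 25, 3, 11, 24);
    System.out.println(interpret(wallClock, "UTC"));           // 2025-02-25T03:11:24Z
    System.out.println(interpret(wallClock, "Asia/Shanghai")); // 2025-02-24T19:11:24Z
  }
}
```

The two printed values match the expected and actual values in the assertion failure, apart from the dropped fraction.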

@lidavidm
Member

Hopefully/presumably the Instant value should be correct, though? Although I see it's an Instant obtained from the java.sql.Timestamp which is already suspect (it's never been clear to me what the expectation is for the semantics of that class)

Member

@linghengqian linghengqian left a comment

> Does Influx adjust the TZ for the Point based on its location? (that is, does it convert magicTime to London TZ?)

FWIW influxdb3 assumes the written time is UTC, and stores as UTC.

> Which timestamp vector does it use for transport when querying time? does it include TZ info or not?

> Hopefully/presumably the Instant value should be correct, though? Although I see it's an Instant obtained from the java.sql.Timestamp which is already suspect (it's never been clear to me what the expectation is for the semantics of that class)

  • @lidavidm Sorry, I made a mistake earlier: I forgot that java.sql.Timestamp#toInstant() uses the device's time zone information. I have updated the handling at linghengqian/influxdb-3-core-jdbc-test@489f4d7 and verified that the current PR handles java.time.Instant correctly. My WSL environment uses the Asia/Shanghai time zone.

  • Overall, I think there are no issues with the current PR apart from the unformatted files, and it would be great to see it merged in the 19.0.0 milestone.

$ mvn clean install -DskipTests
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.30.0:check (spotless-check) on project flight-sql-jdbc-core: The following files had format violations:

@aiguofer
Contributor Author

Awesome! resultSet.getObject("time", Instant.class) is the right way to do it so I'm glad that's working. I'd like to also add some unit tests before finishing the PR but I'm pretty busy for the next month and a half.
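
For reference, the call pattern could be wrapped like this. It is a minimal sketch: the class and method names are made up for illustration, and the only API used is the standard JDBC 4.1 ResultSet#getObject(String, Class).

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.Instant;

/** Hypothetical helper showing the java.time-based access pattern. */
final class InstantColumnReader {

  // Asks the driver for an Instant directly instead of going through java.sql.Timestamp,
  // so the JVM default time zone is not involved in the conversion on the caller's side.
  static Instant readInstant(ResultSet resultSet, String column) throws SQLException {
    return resultSet.getObject(column, Instant.class);
  }
}
```

A caller would use it as InstantColumnReader.readInstant(resultSet, "time") after advancing the cursor with resultSet.next().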

I can easily make the formatting fixes for now and remove the draft status to get some more reviews. Depending on how others feel about unit tests, it would be great if someone can help contribute the tests.

@aiguofer aiguofer marked this pull request as ready for review February 26, 2025 20:26

@lidavidm lidavidm added the bug-fix (PRs that fix a bug.) label Feb 26, 2025
@aiguofer
Contributor Author

aiguofer commented Mar 2, 2025

Ok, I tried to fix the existing tests, but now I'm worried that the change to not modify the offset for Timestamp vectors w/o TZ info might behave inconsistently with the TimeVector accessors, the DateVector accessors, and the getTimestamp accessor for VarCharVector.

I also noticed that what we currently do for TimeStampVector is already inconsistent with what we do for TimeVector and DateVector accessors. For the latter, we use DateTimeUtils.applyCalendarOffset, which uses TimeZone.getDefault() to determine our current timezone. However, for TimeStampAccessor we assume UTC if not provided.
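
A rough sketch of why the choice of fallback zone matters. This is not the actual DateTimeUtils.applyCalendarOffset logic: the class, the method, and the sign convention are assumptions made for illustration, and both zones are passed explicitly so the result does not depend on the machine it runs on.

```java
import java.util.TimeZone;

/** Hypothetical illustration; the real accessors may compute the offset differently. */
final class DefaultZoneSketch {

  // Shifts a raw epoch-millis value from the zone it is assumed to be in
  // to the zone of the calendar requested by the caller.
  static long adjust(long epochMillis, TimeZone assumedSource, TimeZone calendarZone) {
    int delta = calendarZone.getOffset(epochMillis) - assumedSource.getOffset(epochMillis);
    return epochMillis - delta;
  }
}
```

With a Shanghai calendar, assuming UTC as the source shifts the value by eight hours, while assuming a Shanghai JVM default as the source leaves it unchanged. That is the kind of divergence between accessors described above.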

There are a lot of interrelated parts here and breaking changes could be troublesome.

I'm going to change behavior so that usage of the "legacy" JDBC date/time objects stays the same and we can simply recommend that users use getObject(<java.time.* class>) to get expected behaviors.

@aiguofer aiguofer force-pushed the improved_tz_support branch from ddd4bb3 to ee121e5 Compare March 2, 2025 23:17
@lidavidm
Member

lidavidm commented Mar 3, 2025

I'll take a look. That sounds reasonable as a first step but we should probably fix things longer term if we can.

@laurentgo not sure if you have ideas on how the native JDBC datetime types are supposed to behave, I found the documentation rather under-specified...

@aiguofer
Contributor Author

aiguofer commented Mar 3, 2025

Awesome thanks! I updated a few more things including a table in the PR description showing different behaviors for TimeStampVectors. We might want to cover all scenarios and add documentation.

I'm not sure how much more I'll be able to work on this over the next few weeks but I'll try to keep up with it!

Labels
bug-fix (PRs that fix a bug.)
Development

Successfully merging this pull request may close these issues.

[JDBC] Timezone improvements for the JDBC driver
4 participants