[SPARK-54208][CONNECT] Support TIME type in SparkConnectResultSet #53026
Conversation
dongjoon-hyun left a comment
What about the compatibility with Spark Thrift Server behavior, @vinodkc?
> When using `ResultSet.getTime()` to retrieve TIME values from Spark Connect JDBC, there is a precision loss from microseconds to milliseconds.

Hi @dongjoon-hyun, please see my reply. It seems STS doesn't support the `getTime()` method; I see this error from the STS client.

Got it~ cc @pan3793, @LuciferYang
pan3793 left a comment
LGTM
I think precision loss is expected behavior due to the limitation of the JDBC specification; the current implementation is fine. The Hive JDBC driver doesn't support it, so for the STS case we can't do an E2E test even though we support it on the server side. BTW, STS converts some types of data into a String on the server side before returning to the client, so it may have no precision loss in display.
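For illustration, a minimal sketch of the point above, assuming a `ResultSet` from the Spark Connect JDBC driver positioned on a TIME column. The helper name and column index are hypothetical, and whether `getString()` preserves microseconds depends on the server-side string rendering described here:

```scala
import java.sql.ResultSet

// Hypothetical helper: rs is assumed to be positioned on a row whose
// first column is a Spark TIME value such as time '12:34:56.123456'.
def readTimeBothWays(rs: ResultSet): Unit = {
  // java.sql.Time carries at most milliseconds-of-day, so the last
  // three (microsecond) digits are truncated here.
  val asTime = rs.getTime(1)

  // The textual form keeps whatever precision the server renders,
  // e.g. "12:34:56.123456" if microseconds survive serialization.
  val asString = rs.getString(1)

  println(s"getTime()   -> $asTime")
  println(s"getString() -> $asString")
}
```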
dongjoon-hyun left a comment
+1, LGTM. Thank you, @vinodkc, @pan3793, @LuciferYang.
### What changes were proposed in this pull request?

Add TIME type support to the Spark Connect JDBC client.

### Why are the changes needed?

TIME is a fundamental SQL type required for JDBC compliance and interoperability.

### Note

#### After this PR, there is precision loss on `SparkConnectResultSet.getTime()`

**Overview**

When using `ResultSet.getTime()` to retrieve TIME values from Spark Connect JDBC, there is a **precision loss from microseconds to milliseconds**.

**The Issue**

- **Spark SQL support**: TIME values with microsecond precision (6 decimal places), e.g. `time '12:34:56.123456'`
- **JDBC limitation**: the `java.sql.Time` class only supports millisecond precision (3 decimal places)
- **Result**: when calling `getTime()`, the last 3 digits (microseconds) are truncated

**Example**

```scala
// Input: time '12:34:56.123456' (microseconds: 123456)
val time = rs.getTime(1)
// Output: time.getTime() = 45296123 (only 123 milliseconds preserved)
// 12 hours         = 12 × 3,600,000 ms = 43,200,000 ms
// 34 minutes       = 34 ×    60,000 ms =  2,040,000 ms
// 56 seconds       = 56 ×     1,000 ms =     56,000 ms
// 123 milliseconds =                          123 ms
// Total            =                   45,296,123 ms
// Lost: 456 microseconds (the last 3 digits)
```

**Root Cause**

This is a fundamental limitation of the `java.sql.Time` class, which internally stores time as milliseconds rather than microseconds.

### Does this PR introduce _any_ user-facing change?

Yes, it's part of a new feature under Spark Connect JDBC support.

### How was this patch tested?

Added new UTs in `SparkConnectJdbcDataTypeSuite`. The tests verify that at least millisecond precision (3 decimal places) is correctly preserved, but users should be aware that any sub-millisecond precision in the original data will be lost when using `getTime()`.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #53026 from vinodkc/br_SPARK-54208.

Authored-by: vinodkc <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 2a6e6e5)
Signed-off-by: Dongjoon Hyun <[email protected]>
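To make the Root Cause section above concrete, here is a self-contained sketch (JDK classes only, no Spark or JDBC involved; the object name is made up for illustration) of why any mapping from a microsecond-precision TIME value to `java.sql.Time` must truncate:

```scala
import java.time.LocalTime

object TimeTruncationDemo extends App {
  // The value from the example above, parsed with microsecond precision.
  val lt = LocalTime.parse("12:34:56.123456")

  // java.sql.Time can represent at most milliseconds-of-day, so any
  // driver-side mapping has to divide the sub-second part down to millis.
  val millisOfDay = lt.toNanoOfDay / 1000000L

  // Prints 45296123 -- the trailing 456 microseconds are gone and cannot
  // be recovered from a java.sql.Time value.
  println(millisOfDay)
}
```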
Merged to master/4.1 for Apache Spark 4.1.0.