@vinodkc vinodkc commented Nov 13, 2025

### What changes were proposed in this pull request?

Add TIME type support to the Spark Connect JDBC client.

### Why are the changes needed?

TIME is a fundamental SQL type required for JDBC compliance and interoperability.

### Note

#### After this PR, there is precision loss on `SparkConnectResultSet.getTime()`

**Overview**
When using `ResultSet.getTime()` to retrieve TIME values from Spark Connect JDBC, there is a **precision loss from microseconds to milliseconds**.

**The Issue**
- **Spark SQL support**: TIME values have microsecond precision (6 decimal places)
  - Example: `time '12:34:56.123456'`
- **JDBC limitation**: the `java.sql.Time` class only supports millisecond precision (3 decimal places)
- **Result**: when calling `getTime()`, the last 3 digits (the microseconds) are truncated

**Example**

```scala
// Input: time '12:34:56.123456' (microseconds: 123456)
val time = rs.getTime(1)
// Output: time.getTime() = 45296123 (only 123 milliseconds preserved)
// 12 hours = 12 × 3,600,000 ms = 43,200,000 ms
// 34 minutes = 34 × 60,000 ms = 2,040,000 ms
// 56 seconds = 56 × 1,000 ms = 56,000 ms
// 123 milliseconds = 123 ms
// Total = 45,296,123 ms
// Lost: 456 microseconds (the last 3 digits)
```

**Root Cause**
This is a fundamental limitation of the `java.sql.Time` class, which internally stores time as milliseconds rather than microseconds.
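The truncation described above can be reproduced with plain arithmetic, without a live connection. As a hedged sketch (not code from this PR): since JDBC 4.2, drivers may also expose `getObject(col, classOf[java.time.LocalTime])`, which retains sub-millisecond precision where supported; whether the Spark Connect JDBC client offers that path is not stated here.

```scala
import java.time.LocalTime

// TIME '12:34:56.123456' expressed as microseconds-of-day:
// (12 h + 34 min + 56 s) in seconds, scaled to micros, plus the fraction
val micros: Long = (12L * 3600 + 34 * 60 + 56) * 1000000L + 123456L

// java.sql.Time stores milliseconds, so integer division drops the last 3 digits
val asTimeMillis: Long = micros / 1000 // 45296123, matching time.getTime() above
val lostMicros: Long = micros % 1000   // the 456 microseconds that are lost

// java.time.LocalTime keeps nanosecond precision, so nothing is lost
val full: LocalTime = LocalTime.ofNanoOfDay(micros * 1000)
println(asTimeMillis) // 45296123
println(lostMicros)   // 456
println(full)         // 12:34:56.123456
```

This also shows why the loss is unavoidable through `getTime()` alone: the millisecond value round-trips exactly, but the microsecond remainder has no slot to live in.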

### Does this PR introduce _any_ user-facing change?

Yes, it is part of the new Spark Connect JDBC support feature.

### How was this patch tested?

Added new UTs in `SparkConnectJdbcDataTypeSuite`.
The tests verify that at least millisecond precision (3 decimal places) is correctly preserved; users should be aware that any sub-millisecond precision in the original data is lost when using `getTime()`.

### Was this patch authored or co-authored using generative AI tooling?

No

@vinodkc vinodkc changed the title [SPARK-54208][CONNECT] Support TIME type data in SparkConnectResultSet [SPARK-54208][CONNECT] Support TIME type in SparkConnectResultSet Nov 13, 2025

@dongjoon-hyun dongjoon-hyun left a comment


What about the compatibility with Spark Thrift Server behavior, @vinodkc?

> When using `ResultSet.getTime()` to retrieve TIME values from Spark Connect JDBC, there is a precision loss from microseconds to milliseconds.


vinodkc commented Nov 13, 2025

Hi @dongjoon-hyun, please see my reply.

> What about the compatibility with Spark Thrift Server behavior?

It seems STS doesn't support the `getTime()` method; I see this error from the STS client:

```
java.sql.SQLFeatureNotSupportedException: Method not supported
  at org.apache.hive.jdbc.HiveBaseResultSet.getTime(HiveBaseResultSet.java:552)
```

@dongjoon-hyun

Got it~

cc @pan3793 , @LuciferYang


@pan3793 pan3793 left a comment


LGTM


pan3793 commented Nov 13, 2025

I think the precision loss is expected behavior due to the limitation of the JDBC specification; the current implementation is fine.

The Hive JDBC driver doesn't support the `getTime()` method as of the latest 4.2.0 RC0:

https://github.com/apache/hive/blob/rel/release-4.2.0-rc0/jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java#L679

So for STS cases, we can't do an E2E test even though we support it on the server side. BTW, STS converts some types of data into a String on the server side before returning them to the client, so there may be no precision loss in display.


@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @vinodkc , @pan3793 , @LuciferYang .

dongjoon-hyun pushed a commit that referenced this pull request Nov 13, 2025

Closes #53026 from vinodkc/br_SPARK-54208.

Authored-by: vinodkc <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 2a6e6e5)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

Merged to master/4.1 for Apache Spark 4.1.0.
