-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Support avro timestamp-nanos logical type #27546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support avro timestamp-nanos logical type #27546
Conversation
...formats/src/main/java/io/trino/hive/formats/avro/NativeLogicalTypesAvroTypeBlockHandler.java
Outdated
Show resolved
Hide resolved
ccf797c to
e303348
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for the timestamp-nanos logical type to the native Avro type management system, following the Avro 1.12 specification. The change extends NativeLogicalTypesAvroTypeBlockHandler while maintaining backward compatibility by keeping the Hive connector's behavior unchanged (it continues to ignore timestamp-micros and timestamp-nanos logical types).
- Added
TimestampNanosLogicalTyperecord class and associated constants - Implemented encoding/decoding logic for nanosecond-precision timestamps using
LongTimestamp - Added comprehensive test coverage including edge cases with negative epoch timestamps
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/avro/model/AvroLogicalType.java |
Added TimestampNanosLogicalType record and TIMESTAMP_NANOS/LOCAL_TIMESTAMP_NANOS constants with parsing logic |
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/avro/NativeLogicalTypesAvroTypeManager.java |
Added TIMESTAMP_NANOS_SCHEMA, type mapping, and bidirectional conversion functions between Trino timestamps and Avro nanosecond timestamps |
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/avro/NativeLogicalTypesAvroTypeBlockHandler.java |
Implemented TimestampNanosBlockBuildingDecoder to decode nanosecond timestamps into LongTimestamp blocks |
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/avro/HiveAvroTypeBlockHandler.java |
Updated pattern matching to ignore TimestampNanosLogicalType, maintaining existing Hive connector behavior |
lib/trino-hive-formats/src/test/java/io/trino/hive/formats/avro/TestAvroPageDataReaderWithAvroNativeTypeManagement.java |
Added timestamp-nanos to existing tests and new comprehensive test for negative epoch timestamps, plus incidental bug fix for timestamp-micros test data |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| yield (block, position) -> | ||
| { | ||
| SqlTimestamp timestamp = ((SqlTimestamp) timestampType.getObjectValue(block, position)); | ||
| return (timestamp.getEpochMicros() * NANOSECONDS_PER_MICROSECOND) + (timestamp.getPicosOfMicros() / PICOSECONDS_PER_NANOSECOND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guard for overflow.
add a test for 9999-12-31 timestamps (surprisingly popular in real world data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guarded for overflow and added test for longest representable timestamp nanos.
| try (AvroFileWriter fileWriter = new AvroFileWriter( | ||
| trinoLocalFilesystem.newOutputFile(testLocation).create(), | ||
| timestampsSchema, | ||
| new NativeLogicalTypesAvroTypeManager(), | ||
| AvroCompressionKind.NULL, | ||
| ImmutableMap.of(), | ||
| timestampsSchema.getFields().stream().map(Schema.Field::name).collect(toImmutableList()), | ||
| new NativeLogicalTypesAvroTypeBlockHandler().typeFor(timestampsSchema).getTypeParameters(), false)) { | ||
| fileWriter.write(timestampsPage); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be the only test. NativeLogicalTypesAvroTypeManager is our class and its logic may or may not be entirely correct.
There should be a test with hive writing the file. See example existing test
Lines 52 to 53 in 16b41f6
| "TBLPROPERTIES ('avro.schema.literal'='%s')", | |
| SCHEMA_LITERAL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added hive connector product test, note the test indicates no change in behavior for hive connector
e303348 to
b223b3c
Compare
b223b3c to
e4a37fb
Compare
Only support in native block handler, not hive block handler
e4a37fb to
86fe3b5
Compare
Description
Add support for
timestamp-nanostoNativeLogicalTypesAvroTypeBlockHandler.Note this does not change the behavior of the hive connector,
HiveAvroTypeBlockHandlerstill only supportstimestamp-millis.timestamp-microsandtimestamp-nanoslogical types will continue to be ignored.Additional context and related issues
Avro 1.12 spec added
timestamp-nanosandlocal-timestamp-nanoslogical types: https://avro.apache.org/docs/1.12.0/specification/Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Change should not be visible by users, hive connector behavior is unchanged.