-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support read inCommitTimestamp in Delta Lake history table #25056
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test with a static resource? I guess the connector can't read such tables because inCommitTimestamps
reader feature is unsupported.
Just tested, Trino seems can read such table, for the reader seems no constraints? |
You are right. It is writer only feature: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps |
45f09c4
to
70bd4c5
Compare
...src/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointFieldReader.java
Outdated
Show resolved
Hide resolved
70bd4c5
to
78b9e75
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeHistoryTable.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeHistoryTable.java
Outdated
Show resolved
Hide resolved
78b9e75
to
c01f371
Compare
25cd260
to
8bce054
Compare
109411e
to
c568122
Compare
@raunaqmorarka Would you like to have a look again? |
commitInfoEntries.forEach(commitInfoEntry -> { | ||
pagesBuilder.beginRow(); | ||
|
||
pagesBuilder.appendBigint(commitInfoEntry.version()); | ||
pagesBuilder.appendTimestampTzMillis(commitInfoEntry.timestamp(), timeZoneKey); | ||
commitInfoEntry.inCommitTimestamp().ifPresentOrElse( | ||
// use `inCommitTimestamp` if table In-Commit timestamps enabled, otherwise use file modification timestamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really fallback to "file modification timestamp"? Same for delta-lake.md
.
@@ -438,6 +438,7 @@ private DeltaLakeTransactionLogEntry buildCommitInfoEntry(ConnectorSession sessi | |||
|
|||
CommitInfoEntry result = new CommitInfoEntry( | |||
commitInfo.getLong("version"), | |||
OptionalLong.empty(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe code readers can't understand why this logic returns OptionalLong.empty()
. I would recommend supporting the field in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know to write the commitInfo
in checkpoint
file?
Current the whole method buildCommitInfoEntry
is not covered by test or I am missing somewhere?
I see blow logic always is hit in the buildCommitInfoEntry
in current tests.
if (block.isNull(pagePosition)) {
return null;
}
// The first two versions commitInfo doesn't contain `inCommitTimestamp`, the value is read from `timestamp` in commitInfo | ||
// The last two versions commitInfo contain `inCommitTimestamp`, the value is read from it. | ||
assertQuery("SELECT date_diff('millisecond', TIMESTAMP '1970-01-01 00:00:00 UTC', timestamp) FROM \"%s$history\"".formatted(tableName), "VALUES 1739859668531L, 1739859684775L, 1739859743394L, 1739859755480L"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr here shows the "fallback" to timestamp
in first two versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenjian2664 I'm not asking fallback to timestamp
field. I'm asking if this PR fallback to "file modification timestamp" as you left a code comment. Does this assertion fail if we change file modification time of static resource? I believe the answer is no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I think there is a gap about the "file modification timestamp", here what I refer "file modification timestamp" is -> timestamp
field, while guess you are consider of the "creation/modification time of the metadata/log entry file"?
If so I would update the comment to "... read the timestamp
", since it looks misleading
I use the "file modification timestamp" is refer from the https://github.com/delta-io/delta/blob/master/PROTOCOL.md#recommendations-for-readers-of-tables-with-in-commit-timestamps, which for readers it is said :
readers can use the following rules:
1. For commits with version >= delta.inCommitTimestampEnablementVersion, readers should use the inCommitTimestamp field of the commitInfo action.
2. For commits with version < delta.inCommitTimestampEnablementVersion, readers should use the file modification timestamp.
In addition, adjusted the `timestamp` field to return UTC timezone instead of the user session timezone
c568122
to
a018c2a
Compare
Release notes