This repository was archived by the owner on Jul 31, 2023. It is now read-only.

Commit 2b10d70

MaxGekk authored and cloud-fan committed
[SPARK-31423][SQL] Fix rebasing of not-existed dates
### What changes were proposed in this pull request?
In this PR, I propose to change the rebasing of non-existent dates in the hybrid calendar (Julian + Gregorian since 1582-10-15) in the range (1582-10-04, 1582-10-15). Non-existent dates from this range are shifted to the first valid date in the hybrid calendar, 1582-10-15. The changes affect only `rebaseGregorianToJulianDays()` because the reverse rebasing, from hybrid dates to Proleptic Gregorian dates, does not have this problem.

### Why are the changes needed?
Currently, non-existent dates are shifted by the standard difference between the Julian and Gregorian calendars on 1582-10-04 (10 days), for example 1582-10-14 -> 1582-10-24. This contradicts how non-existent dates are shifted in other cases, for example:
```
scala> sql("select date'1990-9-31'").show
+-----------------+
|DATE '1990-10-01'|
+-----------------+
|       1990-10-01|
+-----------------+
```

### Does this PR introduce any user-facing change?
Yes, this affects the conversion of Spark SQL `DATE` values to external dates based on a non-Proleptic Gregorian calendar. For example, when the date 1582-10-14 is saved to ORC files, it is shifted to the next valid date, 1582-10-15.

### How was this patch tested?
- Added tests to `RebaseDateTimeSuite` and to `OrcSourceSuite`
- By the existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, `DateExpressionsSuite`, `CollectionExpressionsSuite`, `ParquetIOSuite`.

Closes apache#28225 from MaxGekk/fix-not-exist-dates.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
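To make the described behaviour concrete, here is a minimal, self-contained sketch of how a lenient hybrid `java.util.Calendar` treats a date from the gap, with and without the clamping proposed in this commit. The object name `HybridCalendarSketch`, the method `hybridEpochDay`, and the `clamp` flag are hypothetical names used only for illustration and are not part of Spark. The sketch assumes dates in the Common Era and a UTC calendar.

```scala
import java.time.LocalDate
import java.util.{Calendar, TimeZone}

// Illustrative sketch only: these names are hypothetical, not Spark APIs.
object HybridCalendarSketch {
  private val julianEndDay = LocalDate.of(1582, 10, 4)
  private val gregorianStartDay = LocalDate.of(1582, 10, 15)
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Interprets the year/month/day fields of `date` in the hybrid
  // (Julian + Gregorian) calendar and returns days since 1970-01-01.
  // Assumes a Common Era date (positive year).
  def hybridEpochDay(date: LocalDate, clamp: Boolean): Long = {
    val inGap = date.isAfter(julianEndDay) && date.isBefore(gregorianStartDay)
    // With clamping, dates that do not exist in the hybrid calendar
    // are moved to the first valid Gregorian date.
    val effective = if (clamp && inGap) gregorianStartDay else date
    val cal = new Calendar.Builder()
      // `gregory` is the hybrid calendar; the builder is lenient by default
      .setCalendarType("gregory")
      .setTimeZone(TimeZone.getTimeZone("UTC"))
      // Calendar months are zero-based
      .setDate(effective.getYear, effective.getMonthValue - 1, effective.getDayOfMonth)
      .build()
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }
}
```

Under these assumptions, `hybridEpochDay(LocalDate.parse("1582-10-14"), clamp = false)` lands on the day that corresponds to 1582-10-24, while `clamp = true` lands on 1582-10-15, matching the before and after behaviour described in the commit message.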
1 parent 7699f76 commit 2b10d70

3 files changed: +39 -7 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala

+12 -4
@@ -131,21 +131,26 @@ object RebaseDateTime {
   // The differences in days between Proleptic Gregorian and Julian dates.
   // The diff at the index `i` is applicable for all days in the date interval:
   // [gregJulianDiffSwitchDay(i), gregJulianDiffSwitchDay(i+1))
-  private val gregJulianDiffs = Array(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0)
+  private val gregJulianDiffs = Array(
+    -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
   // The sorted days in Proleptic Gregorian calendar when difference in days between
   // Proleptic Gregorian and Julian was changed.
   // The starting point is the `0001-01-01` (-719162 days since the epoch in
   // Proleptic Gregorian calendar). This array is not applicable for dates before the staring point.
   // Rebasing switch days and diffs `gregJulianDiffSwitchDay` and `gregJulianDiffs`
   // was generated by the `localRebaseGregorianToJulianDays` function.
   private val gregJulianDiffSwitchDay = Array(
-    -719162, -682944, -646420, -609896, -536847, -500323, -463799,
-    -390750, -354226, -317702, -244653, -208129, -171605, -141427)
+    -719162, -682944, -646420, -609896, -536847, -500323, -463799, -390750,
+    -354226, -317702, -244653, -208129, -171605, -141436, -141435, -141434,
+    -141433, -141432, -141431, -141430, -141429, -141428, -141427)
 
   // The first days of Common Era (CE) which is mapped to the '0001-01-01' date
   // in Proleptic Gregorian calendar.
   private final val gregorianCommonEraStartDay = gregJulianDiffSwitchDay(0)
 
+  private final val gregorianStartDay = LocalDate.of(1582, 10, 15)
+  private final val julianEndDay = LocalDate.of(1582, 10, 4)
+
   /**
    * Converts the given number of days since the epoch day 1970-01-01 to a local date in Proleptic
    * Gregorian calendar, interprets the result as a local date in Julian calendar, and takes the
@@ -165,7 +170,10 @@ object RebaseDateTime {
    * @return The rebased number of days in Julian calendar.
    */
   private[sql] def localRebaseGregorianToJulianDays(days: Int): Int = {
-    val localDate = LocalDate.ofEpochDay(days)
+    var localDate = LocalDate.ofEpochDay(days)
+    if (localDate.isAfter(julianEndDay) && localDate.isBefore(gregorianStartDay)) {
+      localDate = gregorianStartDay
+    }
     val utcCal = new Calendar.Builder()
       // `gregory` is a hybrid calendar that supports both
       // the Julian and Gregorian calendar systems
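The comments in this hunk describe a table lookup: the diff at index `i` applies to every day in `[gregJulianDiffSwitchDay(i), gregJulianDiffSwitchDay(i+1))`. The sketch below shows one way such a lookup could work, using the new array values from this commit. The object `RebaseLookupSketch` and its `rebase` method are hypothetical names; this is an illustration of the idea under that assumption, not Spark's actual lookup code.

```scala
// Illustrative sketch only: `RebaseLookupSketch` and `rebase` are hypothetical names.
object RebaseLookupSketch {
  // Values copied from the new side of the diff.
  private val diffs = Array(
    -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
  private val switchDays = Array(
    -719162, -682944, -646420, -609896, -536847, -500323, -463799, -390750,
    -354226, -317702, -244653, -208129, -171605, -141436, -141435, -141434,
    -141433, -141432, -141431, -141430, -141429, -141428, -141427)

  // Shifts a Proleptic Gregorian epoch day into the hybrid calendar by
  // adding the diff of the interval that contains `days`.
  def rebase(days: Int): Int = {
    require(days >= switchDays(0), "the table does not cover dates before 0001-01-01")
    // Walk back from the end to the last switch day that is <= days.
    var i = switchDays.length - 1
    while (i > 0 && days < switchDays(i)) i -= 1
    days + diffs(i)
  }
}
```

With the ten extra one-day intervals, each day from 1582-10-05 (epoch day -141437) through 1582-10-14 (epoch day -141428) receives a diff that brings it to -141427, which is 1582-10-15. The old table jumped from a diff of 10 straight to 0 at -141427, so the same days were all shifted by 10.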

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala

+22
@@ -364,4 +364,26 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
       }
     }
   }
+
+  test("rebase not-existed dates in the hybrid calendar") {
+    outstandingZoneIds.foreach { zid =>
+      withDefaultTimeZone(zid) {
+        Seq(
+          "1582-10-04" -> "1582-10-04",
+          "1582-10-05" -> "1582-10-15", "1582-10-06" -> "1582-10-15", "1582-10-07" -> "1582-10-15",
+          "1582-10-08" -> "1582-10-15", "1582-10-09" -> "1582-10-15", "1582-10-11" -> "1582-10-15",
+          "1582-10-12" -> "1582-10-15", "1582-10-13" -> "1582-10-15", "1582-10-14" -> "1582-10-15",
+          "1582-10-15" -> "1582-10-15").foreach { case (hybridDate, gregDate) =>
+          withClue(s"tz = ${zid.getId} hybrid date = $hybridDate greg date = $gregDate") {
+            val date = Date.valueOf(gregDate)
+            val hybridDays = fromJavaDateLegacy(date)
+            val gregorianDays = localDateToDays(LocalDate.parse(hybridDate))
+
+            assert(localRebaseGregorianToJulianDays(gregorianDays) === hybridDays)
+            assert(rebaseGregorianToJulianDays(gregorianDays) === hybridDays)
+          }
+        }
+      }
+    }
+  }
 }

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala

+5 -3
@@ -493,17 +493,19 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
     }
   }
 
-  test("SPARK-31238: rebasing dates in write") {
+  test("SPARK-31238, SPARK-31423: rebasing dates in write") {
     withTempPath { dir =>
       val path = dir.getAbsolutePath
-      Seq("1001-01-01").toDF("dateS")
+      Seq("1001-01-01", "1582-10-10").toDF("dateS")
         .select($"dateS".cast("date").as("date"))
         .write
         .orc(path)
 
       Seq(false, true).foreach { vectorized =>
         withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
-          checkAnswer(spark.read.orc(path), Row(Date.valueOf("1001-01-01")))
+          checkAnswer(
+            spark.read.orc(path),
+            Seq(Row(Date.valueOf("1001-01-01")), Row(Date.valueOf("1582-10-15"))))
         }
       }
     }
