
Commit 12a89e5

dongjoon-hyun authored and davies committed
[SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases
## What changes were proposed in this pull request?

`PySpark` loses `microsecond` precision in some corner cases when converting a `Timestamp` into a `Long`. For example, the `datetime.max` value below should be converted to a value whose last 6 digits are `999999`. This PR changes the conversion logic so that precision is never lost.

**Corner case**
```python
>>> datetime.datetime.max
datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
```

**Before**
```python
>>> from datetime import datetime
>>> from pyspark.sql import Row
>>> from pyspark.sql.types import StructType, StructField, TimestampType
>>> schema = StructType([StructField("dt", TimestampType(), False)])
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329600000000,)]
```

**After**
```python
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329599999999,)]
```

## How was this patch tested?

Passes the Jenkins tests, including a new test case.

Author: Dongjoon Hyun <[email protected]>

Closes apache#14631 from dongjoon-hyun/SPARK-17035.
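The corner case can be reproduced without Spark. The sketch below re-implements the old and new formulas as two standalone functions (`old_to_internal` and `new_to_internal` are hypothetical names, not PySpark API). It assumes a timezone-aware UTC datetime, so it goes through the `calendar.timegm` branch. The numbers in the "Before"/"After" output above come from the naive `time.mktime` branch, which depends on the machine's local timezone, so they differ from the UTC values here. The last six digits behave the same way in both cases.

```python
import calendar
from datetime import datetime, timezone


def old_to_internal(dt):
    # Pre-patch formula: seconds * 1e6 is a float, so the sum is rounded
    # to the nearest representable double before int() is applied.
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds * 1e6 + dt.microsecond)


def new_to_internal(dt):
    # Post-patch formula: pure Python int arithmetic, which is exact.
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds) * 1000000 + dt.microsecond


dt = datetime.max.replace(tzinfo=timezone.utc)
old = old_to_internal(dt)
new = new_to_internal(dt)
print(old, old % 1000000)  # 253402300800000000 0
print(new, new % 1000000)  # 253402300799999999 999999
```

With the float formula, the 999999 microseconds round up into the next whole second. With integer arithmetic, they are kept exactly.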
1 parent 6f0988b commit 12a89e5


2 files changed: +6, -1 lines


python/pyspark/sql/tests.py (+5)
```diff
@@ -178,6 +178,11 @@ def test_datetype_equal_zero(self):
         dt = DateType()
         self.assertEqual(dt.fromInternal(0), datetime.date(1970, 1, 1))
 
+    # regression test for SPARK-17035
+    def test_timestamp_microsecond(self):
+        tst = TimestampType()
+        self.assertEqual(tst.toInternal(datetime.datetime.max) % 1000000, 999999)
+
     def test_empty_row(self):
         row = Row()
         self.assertEqual(len(row), 0)
```
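The regression test checks only `datetime.max`. A broader check, sketched below under stated assumptions, applies the patched formula to several boundary datetimes and verifies two things: `% 1000000` recovers the microsecond field, and the integer converts back to the original datetime. `to_internal` is a hypothetical standalone copy of the patched expression. It covers only the timezone-aware (`calendar.timegm`) branch, because the naive branch uses `time.mktime` and depends on local time.

```python
import calendar
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)


def to_internal(dt):
    # Hypothetical copy of the patched formula (tz-aware branch only).
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds) * 1000000 + dt.microsecond


samples = [
    datetime.max.replace(tzinfo=UTC),
    datetime.min.replace(tzinfo=UTC),
    EPOCH,
    datetime(1969, 12, 31, 23, 59, 59, 500000, tzinfo=UTC),  # pre-epoch
    datetime(2286, 11, 20, 17, 46, 40, 1, tzinfo=UTC),  # odd microsecond
]

for dt in samples:
    ts = to_internal(dt)
    # Python's % is floor modulo, so this also holds for negative timestamps.
    assert ts % 1000000 == dt.microsecond, (dt, ts)
    # Rebuilding the datetime from the integer shows no precision was lost.
    assert EPOCH + timedelta(microseconds=ts) == dt, (dt, ts)
print("all samples round-trip")
```

The pre-epoch sample is included because `int(seconds)` is negative there. The microsecond field is still recovered correctly, because Python's `%` follows the sign of the divisor.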

python/pyspark/sql/types.py (+1, -1)
```diff
@@ -189,7 +189,7 @@ def toInternal(self, dt):
         if dt is not None:
             seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
                        else time.mktime(dt.timetuple()))
-            return int(seconds * 1e6 + dt.microsecond)
+            return int(seconds) * 1000000 + dt.microsecond
 
     def fromInternal(self, ts):
         if ts is not None:
```
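Why the one-line change works: a float64 has a 53-bit significand, so above 2**53 not every integer is representable. Python `int` arithmetic is arbitrary-precision, so the new expression is exact. The sketch below locates where microsecond counts cross that limit and shows an odd microsecond value being lost by the old float expression. The constants are illustrative choices, not values from the patch. For extreme values like `datetime.max`, the loss happens even earlier in the computation, at the `+ dt.microsecond` step.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Integers above 2**53 are no longer all representable as float64.
# Measured in microseconds since the epoch, that limit is reached in:
limit_seconds = 2 ** 53 // 1000000
limit_date = EPOCH + timedelta(seconds=limit_seconds)
print(limit_date.year)  # 2255

# Past the limit, adjacent doubles are 2 apart, so an odd microsecond
# count cannot be represented exactly by the float expression.
seconds = 10_000_000_000  # an illustrative second count in the year 2286
old = int(seconds * 1e6 + 1)      # float arithmetic, as before the patch
new = int(seconds) * 1000000 + 1  # integer arithmetic, as after the patch
print(old % 1000000, new % 1000000)  # 0 1
```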
