Update timestamps.pyx #60624

31 changes: 22 additions & 9 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -1237,7 +1237,7 @@ cdef class _Timestamp(ABCTimestamp):

# -----------------------------------------------------------------
# Transformation Methods

def normalize(self) -> "Timestamp":
"""
Normalize Timestamp to midnight, preserving tz information.
@@ -1264,14 +1264,27 @@ cdef class _Timestamp(ABCTimestamp):
Timestamp('2020-03-14 00:00:00')
"""
cdef:
local_val = self._maybe_convert_value_to_local()
int64_t normalized
int64_t ppd = periods_per_day(self._creso)
_Timestamp ts

normalized = normalize_i8_stamp(local_val, ppd)
ts = type(self)._from_value_and_reso(normalized, reso=self._creso, tz=None)
return ts.tz_localize(self.tzinfo)
local_val = self._maybe_convert_value_to_local()
int64_t normalized
int64_t ppd = periods_per_day(self._creso)
_Timestamp ts

# Check for potential overflow before normalization
if local_val < INT64_MIN or local_val > INT64_MAX:
Member

local_val is an int64, so don't these conditions always evaluate to false?

Author

@johnasiano commented on Dec 30, 2024

Yes, that's right.

Since pd.Timestamp.min is set to the smallest possible value in two's complement, subtracting even a small positive number from it cannot make it "more negative". So would it be fair to say that what was described in the initial issue isn't necessarily a pandas bug, but rather a constraint of the two's complement arithmetic that pandas uses?

Member

What is the purpose of adding a check that always evaluates to False regardless of the value of local_val?
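
To illustrate the point made in this exchange, here is a minimal pure-Python sketch of why a range check on a value that is already stored as int64 cannot detect overflow. It assumes the overflow wraps around in two's complement, which is commonly observed but not guaranteed for signed integers in C. The names `wrap_int64` and `near_min` are hypothetical and used only for this illustration.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Map an arbitrary Python int to what a wrapping 64-bit signed integer would hold.

    Assumption: two's complement wraparound. Real signed overflow in C is
    undefined behavior, so this only models the typical outcome.
    """
    return (value + 2**63) % 2**64 - 2**63


# Hypothetical value one step above the lower bound of int64.
near_min = INT64_MIN + 1

# Subtracting a positive amount leaves the representable range and wraps
# around to a large positive number.
wrapped = wrap_int64(near_min - 1_000)

# Whatever an int64 holds is, by construction, inside [INT64_MIN, INT64_MAX],
# so a check of the form `x < INT64_MIN or x > INT64_MAX` can never fire.
in_range = INT64_MIN <= wrapped <= INT64_MAX

print(wrapped > 0, in_range)
```

The wrapped result is positive and still "in range", so the check has to happen before the arithmetic, not after it.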

Member

I think if you declare local_val as int64_t here, you can just add the Cython @cython.overflowcheck(True) decorator to this function. That should greatly simplify what you are trying to do here while being much more performant.

Author

To clarify, @rhshadrach, I meant to say that you were correct in pointing that out, and I agree that the line `if local_val < INT64_MIN or local_val > INT64_MAX:` is not a good way to check for the overflow.

@WillAyd, are you proposing a change to the normalize function? Or to int64_t normalize_i8_stamp? Because in the return for int64_t normalize_i8_stamp, I think the subtraction of any positive value from local_val, when local_val is the Timestamp.min, is what is causing the wrap around.

Also, what should be the expected behavior if overflow occurs during normalization? For example, should the code raise an exception, return the original timestamp, return NaT, or do something else entirely?

Member

@WillAyd commented on Dec 31, 2024

> the subtraction of any positive value from local_val, when local_val is the Timestamp.min, is what is causing the wrap around.

Wherever this is happening, you can use the decorator.

> Also, what should be the expected behavior if overflow occurs during normalization? For example, should the code raise an exception, return the original timestamp, return NaT, or do something else entirely?

It should raise an error. Signed overflow is undefined behavior, so we can't do anything about it except raise before it happens.
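
A minimal Python sketch of "raise in advance", assuming the normalization helper rounds down to midnight by subtracting a non-negative remainder (`local_val % ppd`), as the thread describes. `checked_normalize`, `PPD_NS`, and the local `OutOfBoundsDatetime` class are stand-ins for illustration and are not pandas code.

```python
INT64_MIN = -(2**63)

# Nanoseconds per day; stands in for periods_per_day() at nanosecond resolution.
PPD_NS = 86_400 * 10**9


class OutOfBoundsDatetime(ValueError):
    """Stand-in for the pandas exception of the same name."""


def checked_normalize(local_val: int, ppd: int) -> int:
    """Round local_val down to a multiple of ppd, refusing to leave int64 range.

    Assumption: the remainder is non-negative (Python-style modulo), so the
    only possible overflow is below INT64_MIN.
    """
    remainder = local_val % ppd  # in [0, ppd) for ppd > 0

    # INT64_MIN + remainder cannot itself overflow because remainder >= 0,
    # so this comparison is safe to evaluate before doing the subtraction.
    if local_val < INT64_MIN + remainder:
        raise OutOfBoundsDatetime(
            f"Cannot normalize value {local_val} without overflow"
        )
    return local_val - remainder


# An ordinary value is floored to the previous midnight.
print(checked_normalize(PPD_NS + 5, PPD_NS))

# A value next to the lower bound is rejected before any arithmetic wraps.
try:
    checked_normalize(INT64_MIN + 1, PPD_NS)
except OutOfBoundsDatetime as exc:
    print("raised:", exc)
```

The comparison is arranged so that the guard itself never computes an out-of-range intermediate, which matters once the same logic is written with fixed-width integers.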

raise OutOfBoundsDatetime(
f"Cannot normalize Timestamp {self} without overflow"
)

normalized = normalize_i8_stamp(local_val, ppd)

# Additional overflow check after normalization
if normalized < INT64_MIN or normalized > INT64_MAX:
raise OutOfBoundsDatetime(
f"Normalization of {self} would cause an overflow"
)

ts = type(self)._from_value_and_reso(normalized, reso=self._creso, tz=None)
return ts.tz_localize(self.tzinfo)

# -----------------------------------------------------------------
# Pickle Methods