Update timestamps.pyx #60624
Open
johnasiano wants to merge 1 commit into pandas-dev:main from johnasiano:fix-timestamp-normalize-overflow
+22
−9
local_val is an int64, so don't these conditions always evaluate to false?
Yes, that's right.
Since pd.Timestamp.min is set to the smallest possible value in two's complement, when we try to subtract even a small positive number from it, the result can't get any "more negative". So would it be fair to say that what was described in the initial issue isn't necessarily a bug in pandas but rather a constraint of the two's complement arithmetic that pandas uses?
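The point made in the two comments above can be illustrated with a minimal Python sketch. It is not the code from the PR: `wrap_int64`, `time_of_day`, and the sample values are hypothetical, and `ctypes.c_int64` is used only to emulate a 64-bit slot. In C, signed overflow is undefined behavior; wraparound is merely the commonly observed outcome that the emulation reproduces.

```python
import ctypes

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Emulate storing an arbitrary Python int in a signed 64-bit slot.

    ctypes integer types do no overflow checking, so out-of-range
    values are truncated to 64 bits (two's complement wraparound).
    """
    return ctypes.c_int64(value).value


# Hypothetical inputs: a value one above INT64_MIN and a small
# positive amount to subtract from it.
local_val = INT64_MIN + 1
time_of_day = 1_000

# The subtraction leaves the int64 range, so the stored result wraps
# around to a large positive number.
wrapped = wrap_int64(local_val - time_of_day)

# A range check on the already-stored int64 can never fire, because
# every value that fits in the slot is within [INT64_MIN, INT64_MAX].
range_check_fires = wrapped < INT64_MIN or wrapped > INT64_MAX
```

Under these assumptions `wrapped` is positive and `range_check_fires` is `False`, which is why a check placed after the arithmetic cannot detect the overflow.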
What is the purpose of adding a check that always evaluates to False regardless of the value of local_val?
I think if you declare local_val as int64_t here you can just add the Cython @cython.overflowcheck(True) decorator to this function. That should greatly simplify what you are trying to do here while being much more performant.
To clarify, @rhshadrach, I meant to say that you were correct in pointing that out, and I agree that this line
if local_val < INT64_MIN or local_val > INT64_MAX:
is not a good way to check for the overflow.
@WillAyd, are you proposing a change to the normalize function? Or to int64_t normalize_i8_stamp? Because in the return for int64_t normalize_i8_stamp, I think the subtraction of any positive value from local_val, when local_val is the Timestamp.min, is what is causing the wraparound.
Also, what should be the expected behavior if overflow occurs during normalization? For example, should the code raise an exception, return the original timestamp, return NaT, or do something else entirely?
Wherever this is happening, you can use the decorator.
It should raise an error. Signed overflow is undefined behavior, so we can't do anything about it except raise before it happens.
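As a rough illustration of "raise in advance", the sketch below checks the operands before subtracting, so the out-of-range result is never computed. This is plain Python with a hypothetical helper name (`checked_sub_int64`); it is not the Cython decorator suggested above and not the pandas implementation. It only shows the kind of pre-check that would stay within int64 if written in C.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_sub_int64(a: int, b: int) -> int:
    """Return a - b, raising OverflowError if it would leave int64 range.

    Both comparisons are arranged so that, for int64 inputs, the
    intermediate values INT64_MIN + b and INT64_MAX + b stay in range.
    """
    if b > 0 and a < INT64_MIN + b:
        raise OverflowError("int64 subtraction would underflow")
    if b < 0 and a > INT64_MAX + b:
        raise OverflowError("int64 subtraction would overflow")
    return a - b
```

With this helper, `checked_sub_int64(INT64_MIN + 1, 1_000)` raises `OverflowError` instead of wrapping, while in-range subtractions return the ordinary difference.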