Bug: Save original index and remap after function completes#61116
Bug: Save original index and remap after function completes#61116mroeschke merged 7 commits intopandas-dev:mainfrom
Conversation
rhshadrach
left a comment
There was a problem hiding this comment.
Thanks for the PR! I'm seeing a performance regression when the index does not contain duplicates. Can we do this conditionally.
I appreciate the review! Can you confirm the performance regression numbers that you are seeing? Note that the timings in the original ticket were from a different machine than the one I'm working on. ASV Benchmarks appear similar enough to be reasonable in the standard case. -- Original Function -- -- Function with my changes -- |
Hey, can I get a re-review on this? |
|
Thanks for the ping, will get to it in the next day or two. |
| - :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`) | ||
| - :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`) | ||
| - :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`) | ||
| - :meth:`Series.nlargest` uses a 'stable' sort internally and will preserve original ordering. |
There was a problem hiding this comment.
| - :meth:`Series.nlargest` uses a 'stable' sort internally and will preserve original ordering. |
There was a problem hiding this comment.
@mroeschke - there is a change in Series.nlargest here where previously we weren't using a stable sorting algorithm in certain cases. Now the only dtype I think this can possibly effect is object (as otherwise equal implies identical), and even there it is quite uncommon to have an impact. Perhaps the line needs to make this more clear that it is unlikely to have any impact, or are you thinking it isn't even worth mentioning?
There was a problem hiding this comment.
Ah gotcha. OK doesnt hurt to include it in the whatsnew notes
7e9a2a3 to
8aa2a2c
Compare
|
Thanks @Jeffrharr |
…ev#61116) * Save original index and remap after function completes. * precommit passes * use stable sorting 'mergesort' in tests * Change sorts to `stable` instead of mergesort * modify 'keep' to use a Literal instead of string * address comments * update doc to include stable sort change
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.Note: I'm new to this project, so this is my first PR.
Saves the index for SeriesNLargest at algorithm start and resets it before returning. This fixes performance issues when the index has many duplicate values.
Results:
Tests
The existing tests should cover this unless we want to add specific tests via the
asv_bench.Addendum
I also modified the call to sort to use
sort(kind="stable")to get consistent ordering which is what is currently happening in the equivalent Frame method (it was usingkind=mergesortwhich is equivalent tokind=stable, but kept for portability). I can remove this -- it may be better in another PR.https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort