fix pointplot(native_scale=True, ...) #3811

Closed
wants to merge 1 commit into from

Conversation

v4hn commented Jan 9, 2025

Just ran into this plotting something.

Sorry if this should have been done more cleanly.
I removed the backward-compatible code because it failed once native_scale was working.

mwaskom (Owner) commented Jan 9, 2025

Please supply an example that reproduces the problem you ran into. This works fine for me:

import seaborn as sns
tips = sns.load_dataset("tips")
sns.pointplot(tips, x="size", y="total_bill", native_scale=True)

[image: plot produced by the code above]

v4hn (Author) commented Jan 13, 2025

I hit this in the middle of designing a plot combining lineplot and barplot and eventually moved to manual x axis assignment for individual plots, so I cannot reproduce this in my original context anymore.

Note that, if I understand it correctly, your example cannot illustrate the effect of native_scale because the values are evenly spaced.
Playing around with your example a bit, I can at least produce the issue below, which might reflect my previous situation. I'm not sure whether you consider this a user error or a bug on the framework side:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
tips['ssize'] = tips['size'].apply(lambda x: x**4).astype('category')
tips['ssize'] = tips['ssize'].cat.reorder_categories(sorted(tips['ssize'].cat.categories), ordered=True)
sns.pointplot(data=tips, x='ssize', y='total_bill', native_scale=True, legend=False)

[image: pointplot]

In the example I would have expected this result, which I can also get by explicitly converting the category to float:
[image: pointplot]

I believe my original example was slightly more convoluted, with grouped categories, in case that makes a difference in interpretation.

I also believe my current patch is insufficient to address the issue, as I see errors when I do not specify legend.

mwaskom (Owner) commented Jan 13, 2025

I think your issue is perhaps just that the "native scale" for a series with a category dtype is a categorical representation, not a numeric one. I believe that is consistent and shouldn't change.

v4hn (Author) commented Jan 13, 2025

That does make sense. Thank you for the clarification!

As it does not seem to be a reasonable or useful function call, I would have appreciated a warning when native_scale=True is specified with a categorical variable. Something like:

native_scale=True has no effect on categorical variables.
Did you intend a different dtype?
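
Such a check can also live on the user side. The sketch below is a hypothetical helper (`check_native_scale` is not part of seaborn) that emits the suggested message before a plotting call; it relies only on pandas and the standard warnings module.

```python
import warnings

import pandas as pd


def check_native_scale(data: pd.DataFrame, var: str, native_scale: bool) -> bool:
    """Hypothetical user-side guard, not a seaborn function.

    Warns and returns True when native_scale=True is combined with a
    column that has the explicit category dtype.
    """
    if native_scale and isinstance(data[var].dtype, pd.CategoricalDtype):
        warnings.warn(
            "native_scale=True has no effect on categorical variables. "
            "Did you intend a different dtype?",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Usage with made-up data: one category-dtype column, one numeric column.
demo = pd.DataFrame({"x_num": [1, 16, 81], "y": [1.0, 2.0, 3.0]})
demo["x_cat"] = demo["x_num"].astype("category")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged_cat = check_native_scale(demo, "x_cat", native_scale=True)
    flagged_num = check_native_scale(demo, "x_num", native_scale=True)

messages = [str(w.message) for w in caught]
```

Calling the helper just before `sns.pointplot(...)` keeps the warning opt-in for whoever wants it.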

mwaskom (Owner) commented Jan 13, 2025

I could see how that would be helpful, but I'm generally opposed to issuing warnings where the user isn't doing something unambiguously "wrong"; otherwise they can get very noisy and annoying.

v4hn (Author) commented Jan 13, 2025

I do not quite agree as someone who tripped over it, but the argument is valid.

Last suggestion before close then:

The documentation for the parameter (in at least pointplot and boxplot) states

When True, **numeric or datetime values on the categorical axis** will maintain
their original scaling rather than being converted to fixed indices.

I would suggest rephrasing this to indicate that it only applies when the dtype of the categorical axis is not category. Something like:

When True and dtype of the categorical axis is numeric or datetime,
the categorical axis will maintain the original scaling rather than convert to even spacing.

mwaskom (Owner) commented Jan 13, 2025

If you're trying to emphasize "dtype" then I'm not sure that's accurate: seaborn will treat series that have an object dtype but numeric values as numeric; it's only the categorical dtype that is interpreted as an explicit request for categorical treatment.
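
The rule described here can be modeled roughly as below. This is a sketch of the stated behavior, not seaborn's implementation: `would_keep_native_scale` is a hypothetical helper, and the value-based check for object columns is an assumption about what "numeric values" means.

```python
import numbers

import pandas as pd


def would_keep_native_scale(series: pd.Series) -> bool:
    """Hypothetical model of the rule above, not seaborn code."""
    # The explicit category dtype always requests categorical treatment.
    if isinstance(series.dtype, pd.CategoricalDtype):
        return False
    # Numeric and datetime dtypes keep their original scaling.
    if pd.api.types.is_numeric_dtype(series.dtype):
        return True
    if pd.api.types.is_datetime64_any_dtype(series.dtype):
        return True
    # Otherwise (e.g. object dtype), decide from the values themselves.
    values = series.dropna()
    return len(values) > 0 and all(isinstance(v, numbers.Number) for v in values)


as_int = pd.Series([1, 16, 81])
as_object = pd.Series([1, 16, 81], dtype=object)
as_category = pd.Series([1, 16, 81]).astype("category")
as_strings = pd.Series(["a", "b", "c"])

results = {
    "int": would_keep_native_scale(as_int),
    "object": would_keep_native_scale(as_object),
    "category": would_keep_native_scale(as_category),
    "strings": would_keep_native_scale(as_strings),
}
```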

v4hn (Author) commented Jan 13, 2025

Yes, I tried. Do you have a more accurate suggestion? :)

With this insight, my personal (user) intuition is that dtype "category" should be treated the same as dtype "object" with regard to the native_scale parameter, but you already stated that you disagree.

Another suggestion:

When True, numeric or datetime values on the categorical axis will maintain
their original scaling rather than being converted to fixed indices.
The native scale of a Series with the explicit category dtype always uses fixed indices.
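
To make "fixed indices" concrete, the pandas-only sketch below contrasts the integer codes of a category-dtype Series with the values obtained by converting it to float. The data is made up, and treating the category codes as a stand-in for seaborn's fixed indices is an assumption for illustration only.

```python
import pandas as pd

# Made-up sizes, raised to the fourth power as in the example in this thread.
sizes = pd.Series([1, 2, 3, 2, 1])
ssize = (sizes ** 4).astype("category")  # categories: 1, 16, 81

# Evenly spaced positions: the index of each value among the categories.
fixed_indices = ssize.cat.codes.tolist()

# Original scaling: the numbers themselves, recovered by converting to float.
native_values = ssize.astype(float).tolist()

print(fixed_indices)
print(native_values)
```

Passing the float-converted column to the plotting call is the explicit conversion mentioned earlier in this conversation.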

@v4hn v4hn closed this Jan 13, 2025