BUGFIX: Relplot Error adding refline when duplicate indicies present #3692

Closed
zacharygibbs wants to merge 2 commits into mwaskom:master from zacharygibbs:zach/bugfix

Conversation

@zacharygibbs

This is related to issue #3690.

When the input data has duplicate index values, relplot duplicates rows in its data when it doesn't need to, which causes the later refline call to fail.

I was able to solve this by modifying the grid_data merge step at the end of the relplot function so that it only merges when there is actually something to merge.
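The idea can be sketched in plain pandas. This is only an illustration under stated assumptions, not the diff in this PR: `expose_grid_data` is a hypothetical helper name, and returning `grid_data` unchanged when there is nothing to merge is an assumption chosen to match the (15000, 4) shape reported after the fix.

```python
import pandas as pd


def expose_grid_data(data: pd.DataFrame, grid_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add to `data` the columns that only `grid_data` has."""
    extra_cols = grid_data.columns.difference(data.columns)
    if extra_cols.empty:
        # Nothing to merge, so skip the index join entirely (assumed behavior).
        return grid_data
    return pd.merge(data, grid_data[extra_cols], left_index=True, right_index=True)


# Index labels 0 and 1 each appear twice, as after pd.concat without ignore_index.
data = pd.DataFrame(
    {"x": [0.1, 0.2, 0.3, 0.4], "y": [1.0, 2.0, 3.0, 4.0]},
    index=[0, 1, 0, 1],
)
grid_data = data.copy()  # stand-in with the same columns as `data`

result = expose_grid_data(data, grid_data)
print(result.shape)  # (4, 2): the row count is unchanged
```

With the guard in place, the index join only runs when `grid_data` actually carries extra columns, so a frame with duplicate index labels is not expanded for no reason.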

Print debugging before the fix (shape of the self.data DataFrame):

relplot (15000, 4)
relplot_before_grid_data (15000, 4)
relplot_after_grid_data (45000, 9)
main (45000, 9)
---------------------------------------------------------------------------
ValueError: operands could not be broadcast together with shapes (45000,) (15000,) 

Print debugging after the fix (shape of the self.data DataFrame):

relplot (15000, 4)
relplot_before_grid_data (15000, 4)
relplot_after_grid_data (15000, 4)
main (15000, 4)
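The jump from 15000 to 45000 rows is consistent with an index-on-index merge where every label appears three times: each label contributes 3 x 3 = 9 rows, and 5000 x 9 = 45000. A scaled-down sketch with 5 labels instead of 5000 (the names `parts`, `no_extra`, and `merged` are illustrative only, and the zero-column right-hand frame is an assumption about what the merge receives):

```python
import numpy as np
import pandas as pd

n_items = 5
parts = [
    pd.DataFrame({"x": np.arange(n_items, dtype=float)}).assign(origin=k)
    for k in (1, 2, 3)
]
data = pd.concat(parts)  # index labels 0..4, each appearing three times

no_extra = data[[]]  # same index, zero columns: there is nothing to add

# Inner join on the index: every label matches 3 rows on each side.
merged = pd.merge(data, no_extra, left_index=True, right_index=True)

print(data.shape)    # (15, 2)
print(merged.shape)  # (45, 2)
```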

Reproducible example (from the original issue, for reference):

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

n_items = 5000
n_floats = 5
n_categorical = 3

df1 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df1 = df1.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df2 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df2 = df2.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df3 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df3 = df3.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df = pd.concat([df1.assign(origin=1), df2.assign(origin=2), df3.assign(origin=3)])
print(df)

fg = sns.relplot(data=df, x='float1', y='float2', hue='origin', row='categorical1')
print('main', fg.data.shape)
fg.refline(y=0.5)  # raises the ValueError shown above before the fix

plt.show()
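To confirm that a frame is in the duplicate-index situation described here, the index can be checked directly. The sketch below uses two tiny stand-in frames (`a` and `b` are illustrative names). That passing `ignore_index=True` to `pd.concat` would also sidestep the refline error is an assumption and has not been tested against seaborn here.

```python
import pandas as pd

a = pd.DataFrame({"v": [1, 2]})
b = pd.DataFrame({"v": [3, 4]})

dup = pd.concat([a, b])                      # keeps labels 0, 1, 0, 1
uniq = pd.concat([a, b], ignore_index=True)  # relabels rows 0..3

print(dup.index.is_unique)   # False
print(uniq.index.is_unique)  # True
```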

@zacharygibbs zacharygibbs changed the title Zach/bugfix BUGFIX: Relplot Error adding refline when duplicate indicies present May 10, 2024
@zacharygibbs
Author

Any word on this pull request? Seems like a simple fix, what's the hold up?

@mwaskom mwaskom closed this Jan 26, 2025