Enhance Histogram Operator with Distribution Plot Visualization #2711

Merged: JeshChoi merged 7 commits into master from josh-add-distplot-operator on Jul 5, 2024

JeshChoi
Collaborator

@JeshChoi JeshChoi commented Jul 1, 2024

Purpose

Integrated the Distribution Plot operator into the Histogram operator, adding support for rug, violin, and box distribution plots. Each plot type exposes a different aspect of the data, which helps with analysis and interpretation:

  • Rug Plot: Provides a simple, straightforward way to visualize the distribution of individual data points along an axis. It helps see the density and spread of the data points without summarizing them into bins or aggregated statistics.
  • Violin Plot: Combines a box plot with a density plot to show the shape of the distribution, its probability density, and any multimodality. It reveals details of distribution and variability that a box plot alone cannot show.
  • Box Plot: Summarizes key statistics (median, quartiles, outliers) in a compact form, offering a clear comparison across different categories. It’s particularly effective for highlighting central tendencies and variations within the data.
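
To make the contrast between the plot types concrete, here is a minimal sketch of the statistics a box plot summarizes, using only the Python standard library. The function name `box_plot_summary`, the sample data, and the 1.5 × IQR outlier rule are illustrative assumptions; Plotly may compute quartiles with a different method, so its rendered boxes can differ slightly. A rug plot, by contrast, would draw every raw value as a tick with no aggregation.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics a box plot typically draws.

    Assumption: outliers are points beyond 1.5 * IQR from the quartiles,
    which is a common convention, not necessarily Plotly's exact default.
    """
    data = sorted(values)
    # Quartiles via linear interpolation over the observed data range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),    # lowest point still inside the fences
        "whisker_high": max(inliers),   # highest point still inside the fences
        "outliers": outliers,
    }


# Hypothetical sample: nine evenly spaced values plus one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

In this sample the extreme value 50 is reported as an outlier while the whiskers stay at 1 and 9, which is the kind of compact summary the box plot gives and the rug plot does not.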

Description

  • Integrated DistPlotOpDesc class into HistogramChartOpDesc class to support distribution plots.
  • Added properties for distribution type (rug, violin, box).
  • Used Plotly to render the distribution plots.
  • Ensured the output is embedded as HTML content.
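
The changes listed above could look roughly like the following sketch. It is not the code from this PR: the names `ALLOWED_DIST_TYPES`, `histogram_kwargs`, and `render_histogram_html` are hypothetical, and it assumes the distribution type is passed to Plotly Express through the `marginal` argument of `px.histogram`, which accepts `"rug"`, `"violin"`, and `"box"`. The Plotly import is done lazily so the argument-building logic can be read and exercised without Plotly installed.

```python
# Hypothetical set of values for the new distribution-type property.
ALLOWED_DIST_TYPES = ("rug", "violin", "box")


def histogram_kwargs(value_column, dist_type=None):
    """Build keyword arguments for plotly.express.histogram.

    Assumption: the distribution plot is drawn as a marginal plot above
    the histogram, selected by the `marginal` argument.
    """
    kwargs = {"x": value_column}
    if dist_type is not None:
        if dist_type not in ALLOWED_DIST_TYPES:
            raise ValueError(f"unsupported distribution type: {dist_type!r}")
        kwargs["marginal"] = dist_type
    return kwargs


def render_histogram_html(data, value_column, dist_type=None):
    """Render the chart and return it as an embeddable HTML fragment.

    `data` is assumed to be a pandas DataFrame or a dict of column lists.
    """
    import plotly.express as px  # third-party dependency, imported lazily

    fig = px.histogram(data, **histogram_kwargs(value_column, dist_type))
    # full_html=False yields a fragment suitable for embedding in a page.
    return fig.to_html(include_plotlyjs="cdn", full_html=False)
```

With Plotly installed, a call such as `render_histogram_html({"age": [23, 31, 35, 41, 58]}, "age", "violin")` would return an HTML fragment containing the histogram with a violin plot above it. Keeping the distribution type optional means existing histogram workflows would behave as before.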

Demo Pictures

Simple CSV to Operator Set Up

Screenshot 2024-07-03 at 21 17 53

New Parameter

Screenshot 2024-07-03 at 21 18 06

Violin Distribution Output

Screenshot 2024-07-03 at 21 19 27

Box Distribution Output

Screenshot 2024-07-03 at 21 19 01

Rug Distribution Output

Screenshot 2024-07-03 at 21 18 37

@aglinxinyuan aglinxinyuan changed the title from "Added operator for distribution plot visualization" to "Add Distribution Plot Visualization Operator" on Jul 1, 2024
@aglinxinyuan aglinxinyuan requested review from kunwp1 and removed request for aglinxinyuan July 2, 2024 21:42
@kunwp1
Collaborator

kunwp1 commented Jul 2, 2024

Hey @JeshChoi, your PR looks excellent! One suggestion: the plot seems similar to the existing histogram plot. Both use the Plotly API, just with different parameters. Instead of adding the distribution plot as a new visualization operator, it would be nice to integrate your work into the histogram plot. I'm curious to hear your thoughts on this. Thank you.

@JeshChoi
Collaborator Author

JeshChoi commented Jul 3, 2024

Hi @kunwp1, thanks for looking over my PR.

Regarding the similarities between the DistPlot and Histogram operators, the decision to integrate them should consider our approach to migrating visualizers from Plotly to Texera.

Professor Chen Li previously mentioned that he envisions Texera incorporating all visualizers from Plotly, possibly even categorizing them similarly. Therefore, porting each Plotly visualizer to Texera as a separate operator could keep the migration comprehensive and consistent.

While this may lead to some redundancy in operators, it's important to note that non-technical users might search specifically for a distribution plot instead of a histogram, necessitating the inclusion of a DistPlot operator.

I'm open to either option and can proceed in either direction.

@kunwp1
Collaborator

kunwp1 commented Jul 3, 2024

@JeshChoi I understand your point of view; however, I believe it would be beneficial to merge them.

The Plotly website categorizes plots into distribution and histogram plots, but I don't notice a visual distinction between them. From my perspective, the distribution plot is an extension of the histogram plot, allowing users to visualize multiple histograms and additional plots, such as the box plot. Separating them into distinct operators may make sense if you consider this a significant difference. However, I think that separating these two operators could potentially lead to confusion among users due to this similarity.

@JeshChoi
Collaborator Author

JeshChoi commented Jul 4, 2024

@kunwp1 I have made the integration changes. Please take a look at the PR whenever you get a chance.

Collaborator

@kunwp1 kunwp1 left a comment

Looks good! Please change the title of the PR to something else since the PR isn't adding a new visualization operator.

@JeshChoi JeshChoi changed the title from "Add Distribution Plot Visualization Operator" to "Integrate Distribution Plot Visualization to Histogram Operator" on Jul 5, 2024
@JeshChoi JeshChoi changed the title from "Integrate Distribution Plot Visualization to Histogram Operator" to "Enhance Histogram Operator with Distribution Plot Visualization" on Jul 5, 2024
@JeshChoi
Collaborator Author

JeshChoi commented Jul 5, 2024

@kunwp1 I have changed the title of the PR.

Collaborator

@kunwp1 kunwp1 left a comment

Thanks! You can merge it now.

@JeshChoi JeshChoi merged commit dfae506 into master Jul 5, 2024
8 checks passed
@JeshChoi JeshChoi deleted the josh-add-distplot-operator branch July 5, 2024 19:33
@JeshChoi
Collaborator Author

JeshChoi commented Jul 5, 2024

Awesome thanks!

3 participants