Add elbow_chart to help determine ideal class size #32

raashidsalih · 2024-06-23T09:53:52Z

Hey, @mthh.
Loved the experience using JenksPy, especially the model-like interface by @yasirroni. My use case involved trying to figure out the ideal class size (similar to this) and I had some code on hand, so thought it might be a good idea to integrate it into the package. I'm not sure if there is a methodological precedence to using elbow charts based on goodness of variance fit to do so, but the added convenience should be nice for people who need it.

At a high level, the elbow_chart function takes in the data array, a lower bound (default to 2) and an upper bound. It then gets the GVF values for each class size between the bounds (inclusive of both) and returns an elbow chart along with the results.

How Function Operates

Checks and confirms both bounds are of int datatype, and also checks if upper bound is greater than lower bound. Didn't validate any further to prevent redundancy since the function later triggers validate_input for each n_class value.
Pre-allocate lists to store results instead of appending via for loop. Not strictly necessary for performance, but could help for large n_class values.
Loop over each n_class value and obtain results.
Convert disparate results into dictionary to return later.
Generate plot based on results.
Return Tuple containing plot and results dictionary.

Changes Made

Import Tuple from Typing for function return typehint in core.py.
Import matplotlib for plot generation in core.py.
Add function elbow_chart at end of core.py.
Import elbow_chart in __init__.py.
Add elbow_chart in __all__ in __init.py__.

Things to Discuss

I'm not thrilled about the added matplotlib dependency. We have a few options:
- We could throw an error instructing the user to install matplotlib if they want to use the function (as seen here)
- Add an optional tag when installing that either contains [core] or [all]
- I breifly considered using something like tkinter (which seems to be built-in) and even ASCII art, but clearly, they're not that feasible.
- Honestly, I reckon most people who want to use this package will be the type that has matplotlib installed already, but I am curious as to what you think. Any other alternatives we can employ here?
Do we need a default value for upper_bound? I don't have a strong opinion on this one, could go either way.

Apologies for the long message, but wanted to ensure clarity. I also have ideas about integrating parallel processing in the for loop for additional performance gains, and also a sampling strategy (as mentioned here), but I think that can wait for another PR.

Example Usage

Added function and changes made to import

yasirroni · 2024-06-26T11:17:27Z

It has been a long time since I make contribution here. But I will give two opinions based on my experience.

You should not optimize number of group, except you are optimizing in your sample data and later use the number found for the larger original data. As you already know, this is due to the upper (or lower) bound, especially for normalization.
I've write small implementation of elbow long time ago that you want to read (?): https://github.com/yasirroni/kneed_elbow

raashidsalih added 2 commits June 23, 2024 13:18

Initial functionality commit

d8805eb

Added function and changes made to import

Update __init__.py with added function

4887b61

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add elbow_chart to help determine ideal class size #32

Add elbow_chart to help determine ideal class size #32

raashidsalih commented Jun 23, 2024 •

edited

Loading

yasirroni commented Jun 26, 2024

Add elbow_chart to help determine ideal class size #32

Are you sure you want to change the base?

Add elbow_chart to help determine ideal class size #32

Conversation

raashidsalih commented Jun 23, 2024 • edited Loading

yasirroni commented Jun 26, 2024

raashidsalih commented Jun 23, 2024 •

edited

Loading