Within or between clusters sum of squares? #25

Open
ldeoto opened this issue Dec 12, 2023 · 4 comments


@ldeoto

ldeoto commented Dec 12, 2023

Hello,

I am new to pygeoda. I am testing the SKATER algorithm and found that the summary statistics do not match those returned by the software. It appears that the value displayed under "Total within-cluster sum of squares" is actually the between-cluster sum of squares, and the actual total within-cluster sum of squares is not shown at all.
Could you confirm this?
Thanks for your help.

@lixun910
Member

I am not sure I understand your question. Does the output of the skater function look like this?

>>> skater_clusters = pygeoda.skater(4, queen_w, data)
>>> skater_clusters
{'Total sum of squares': 504.0000000000001,
'Within-cluster sum of squares': [57.890768263715266,
59.95241669262987,
28.725706194374844,
69.3802999471999,
62.30781060793979,
66.65808666485573],
'Total within-cluster sum of squares': 159.0849116292847,
'The ratio of between to total sum of squares': 0.3156446659311204,
'Clusters': (3, 2, 3, 1, 1, 1, 2, 1,...)
}

@ldeoto
Author

ldeoto commented Dec 13, 2023

Hi @lixun910
Yes, it looks like that. Notice that 'Total within-cluster sum of squares' should be the sum of all the 'Within-cluster sum of squares' values, which would be 344.91...
The value 159.08... corresponds to the between-cluster sum of squares (i.e. 'Total sum of squares' minus the total within-cluster sum of squares, 504 - 344.91...). That is why 'The ratio of between to total sum of squares' is 0.31... (i.e. 159.08 / 504).
So what is currently labelled 'Total within-cluster sum of squares' is in fact the total between-cluster sum of squares. The label should be corrected.
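
For anyone who wants to check the arithmetic, here is a minimal sketch that uses only the numbers copied from the output above. It does not call pygeoda, and the variable names are my own.

```python
import math

# Values copied verbatim from the skater output shown earlier in this thread.
total_ss = 504.0000000000001
within_ss = [
    57.890768263715266,
    59.95241669262987,
    28.725706194374844,
    69.3802999471999,
    62.30781060793979,
    66.65808666485573,
]
reported_total_within = 159.0849116292847  # labelled 'Total within-cluster sum of squares'
reported_ratio = 0.3156446659311204        # 'The ratio of between to total sum of squares'

# Sum of the listed within-cluster values (fsum limits floating-point drift).
total_within = math.fsum(within_ss)

# Between-cluster sum of squares = total minus within.
between = total_ss - total_within
ratio = between / total_ss

print("sum of within-cluster values:", total_within)
print("total minus within (between):", between)
print("between / total:", ratio)

# The value reported as "total within" agrees with the between-cluster value.
print("reported value equals between:", math.isclose(between, reported_total_within, rel_tol=1e-9))
print("reported ratio equals between / total:", math.isclose(ratio, reported_ratio, rel_tol=1e-9))
```

Both comparisons agree to within floating-point tolerance, which is consistent with the label being swapped rather than the numbers being wrong.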

@lixun910
Member

I see. Yes, you are right: 159.08 is the total between-cluster sum of squares, not the within-cluster total. I will fix it. Thanks, @ldeoto!
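
Until a fix lands, one possible workaround is to recompute the summary values from the returned dictionary. The sketch below is not part of pygeoda: `corrected_summary` and the 'Total between-cluster sum of squares' key are hypothetical names chosen for illustration. It assumes the dictionary has the keys shown in the output above and that the entries under 'Within-cluster sum of squares' add up to the total within-cluster sum of squares.

```python
def corrected_summary(result):
    """Return a copy of a skater-style summary dict with consistent labels.

    Hypothetical helper, not a pygeoda function. It relies only on the
    dictionary keys that appear in the output quoted in this thread.
    """
    total = result['Total sum of squares']
    # Assumption: the listed values sum to the total within-cluster SS.
    within = sum(result['Within-cluster sum of squares'])
    between = total - within

    fixed = dict(result)  # shallow copy, the input is left untouched
    fixed['Total within-cluster sum of squares'] = within
    fixed['Total between-cluster sum of squares'] = between  # hypothetical key
    fixed['The ratio of between to total sum of squares'] = between / total
    return fixed


# Example using the values reported above (the 'Clusters' entry is omitted).
reported = {
    'Total sum of squares': 504.0000000000001,
    'Within-cluster sum of squares': [
        57.890768263715266,
        59.95241669262987,
        28.725706194374844,
        69.3802999471999,
        62.30781060793979,
        66.65808666485573,
    ],
    'Total within-cluster sum of squares': 159.0849116292847,
    'The ratio of between to total sum of squares': 0.3156446659311204,
}

fixed = corrected_summary(reported)
print(fixed['Total within-cluster sum of squares'])
print(fixed['Total between-cluster sum of squares'])
```

If a later release changes the labels or the values it reports, this helper would need to be dropped or adjusted accordingly.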

@ldeoto
Author

ldeoto commented Dec 15, 2023

Great! Thanks @lixun910 ! 👍
