Loadings cutoff option (PCA) #212

Open
cdiazmun opened this issue Feb 22, 2021 · 7 comments

Comments

@cdiazmun

Hello!

First, thank you for developing the package. It has been very useful.

I am opening this issue to request, if possible, a new feature for plotting the factor loadings in a PCA. There are already nice aesthetic options for the loadings. However, I would be interested in a loadings.cutoff option to select only the desired ones. When working with PCAs based on many variables (50 in my case), the plot can become very messy, even when adjusting sizes and the like. Furthermore, there are some factors that I may not be interested in because they don't explain any variance in the samples, so being able to filter them out would also be a nice feature.

Thank you in advance.

Regards,
Cristian

@terrytangyuan
Collaborator

Could you give an example use of the loadings.cutoff option that you are proposing? Would you like to submit a pull request? The related code is currently in this file: https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R

@cdiazmun
Author

Following the example you use to illustrate your package:

library(ggfortify)
df <- iris[1:4]  # numeric columns only, as in the package example
autoplot(prcomp(df), data = iris, colour = "Species",
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3)

If you run print(prcomp(df)), you get the loadings for the PCA:

Standard deviations (1, .., p=4):
[1] 2.0562689 0.4926162 0.2796596 0.1543862

Rotation (n x k) = (4 x 4):
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

Then, with a cutoff option, you could select only the loadings whose absolute value in PC1 or PC2 (the components being plotted) exceeds a threshold, for instance 0.7 (that is, loadings above 0.7 or below -0.7):

autoplot(prcomp(df), data = iris, colour = "Species",
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3,
         loadings.cutoff = 0.7)

Then in the final plot you would only see Sepal.Width and Petal.Length.
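
Until such an option exists, the selection described above could be approximated outside of autoplot. The following is only a minimal sketch, not ggfortify's implementation: filter_loadings is a hypothetical helper, the "any plotted component reaches the cutoff" rule is one reading of the proposal, and arrow_scale is an arbitrary factor chosen for readability rather than the scaling ggfortify applies to its loading arrows. The plotting part assumes ggplot2 is installed and is skipped otherwise.

```r
# Hypothetical helper (not part of ggfortify): keep only the variables whose
# absolute loading on at least one of the chosen components reaches the cutoff.
filter_loadings <- function(pca, cutoff, pcs = 1:2) {
  rot <- pca$rotation[, pcs, drop = FALSE]
  keep <- apply(abs(rot) >= cutoff, 1, any)
  as.data.frame(rot[keep, , drop = FALSE])
}

df <- iris[1:4]                       # numeric columns only
pca <- prcomp(df)
kept <- filter_loadings(pca, cutoff = 0.7)
print(rownames(kept))                 # variables that survive the cutoff

# Optional biplot-style figure drawn directly with ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  scores <- data.frame(pca$x[, 1:2], Species = iris$Species)
  plot_loadings <- kept
  plot_loadings$variable <- rownames(kept)
  arrow_scale <- 3                    # assumption: arbitrary stretch for visibility
  p <- ggplot(scores, aes(PC1, PC2)) +
    geom_point(aes(colour = Species)) +
    geom_segment(data = plot_loadings,
                 aes(x = 0, y = 0,
                     xend = PC1 * arrow_scale, yend = PC2 * arrow_scale),
                 colour = "blue",
                 arrow = arrow(length = unit(0.2, "cm"))) +
    geom_text(data = plot_loadings,
              aes(x = PC1 * arrow_scale, y = PC2 * arrow_scale,
                  label = variable),
              colour = "blue", size = 3, vjust = -0.5)
  print(p)
}
```

With the loadings printed above and a cutoff of 0.7, the helper keeps Sepal.Width (PC2) and Petal.Length (PC1), matching the expected result. Because the test uses absolute values, it does not depend on the sign convention of the components.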

@terrytangyuan
Collaborator

Thank you! This looks very useful indeed. Would you like to submit changes to support this feature?

@cdiazmun
Author

cdiazmun commented Feb 24, 2021

I'm very sorry, but I'm not very familiar with GitHub, so I actually don't know how to do that. Nor do I know how to submit a pull request, although I have the feeling it is the same thing, haha. I will read the guide and try it soon.

@terrytangyuan
Collaborator

Okay great. I won't have time to get to this soon so feel free to give it a try!

@BioinfGuru

BioinfGuru commented May 13, 2024

Hi @terrytangyuan, has anyone made progress on implementing this yet (or on a workaround)? I'd certainly be interested in it as a side project.

@terrytangyuan
Collaborator

Nope. Go ahead

No branches or pull requests

3 participants