[Feature Request]: Change reported Eigenvalues in Factor Characteristics Table (EFA) #2418

Open
FloSchuberth opened this issue Nov 17, 2023 · 17 comments · May be fixed by jasp-stats/jaspFactor#256

Comments

@FloSchuberth

Description

Currently, the Eigenvalues of the original correlation matrix are reported, i.e., the Eigenvalues associated with principal components and not with common factors. Therefore, I suggest reporting the Eigenvalues of the correlation matrix whose main diagonal has been replaced by communality estimates. These are usually the Eigenvalues associated with the common factors extracted by EFA. If you conduct a parallel analysis based on FA, you already do this. However, under factor characteristics, you always report the Eigenvalues of the original correlation matrix, i.e., those associated with principal components. I think this is confusing.
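To make the two sets of Eigenvalues concrete, here is a minimal NumPy sketch. It is not JASP's code: the correlation matrix is made up for illustration, and using squared multiple correlations (SMCs) as communality estimates is an assumption, since the thread does not say which estimates JASP uses.

```python
import numpy as np

def reduced_correlation_matrix(R):
    """Copy of R with the diagonal replaced by squared multiple correlations.

    SMCs are one common choice of initial communality estimate (assumed here).
    """
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    return R_reduced

def sorted_eigenvalues(M):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.linalg.eigvalsh(M)[::-1]

# Hypothetical 4-variable correlation matrix, for illustration only.
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])

pc_eigenvalues = sorted_eigenvalues(R)                              # principal components
fa_eigenvalues = sorted_eigenvalues(reduced_correlation_matrix(R))  # common factors

print("PC-based:", np.round(pc_eigenvalues, 3))
print("FA-based:", np.round(fa_eigenvalues, 3))
```

The PC-based Eigenvalues sum to the number of variables, whereas the FA-based ones sum to the total of the communality estimates, so the two sets cannot agree unless all communalities equal 1.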

Purpose

Increase consistency between the outputs.

Use-case

No response

Is your feature request related to a problem?

No

Is your feature request related to a JASP module?

Factor

Describe the solution you would like

I would like you to report the real-data factor eigenvalues (those obtained from the parallel analysis based on FA) in the Factor Characteristics table, not the Eigenvalues of the original correlation matrix.

Describe alternatives that you have considered

No response

Additional context

As you can see in the screenshot, the Eigenvalues differ between the parallel analysis and the Factor Characteristics Table.
[Screenshot: Eigenvalue1]

As the following screenshot shows, these are the Eigenvalues associated with the principal components.

[Screenshot: Eigenvalue2]

Since an EFA is conducted, I would report the eigenvalues associated with the factors, not the components.

@juliuspfadt
Contributor

@FloSchuberth thanks for bringing this up. I rarely use EFA, so I do not know the conventions. You are of course correct about which values are being reported. @Kucharssim what do you think?

@FloSchuberth
Author

@juliuspfadt: You are welcome. I am not an expert in EFA either, so I am also not sure about the conventions. It just felt inconsistent to me, particularly once you start comparing the Eigenvalues between the parallel analysis and Factor Characteristics tables...

@Kucharssim
Member

> @Kucharssim what do you think?

I don't know the conventions, but the request by @FloSchuberth sounds very reasonable to me.

On the other hand, I think I remember that you, @juliuspfadt, implemented the eigenvalues based on a request from a user. Did they request these eigenvalues specifically? Or am I misremembering this?

@juliuspfadt
Contributor

Well I remembered something similar, but I can't seem to find the closed issue. I think we had a discussion about this before.

@juliuspfadt
Contributor

found it: jasp-stats/jaspFactor#79

@FloSchuberth
Author

@juliuspfadt: Skimming through your conversation in the other thread, I think you addressed a different issue there, namely how to label the columns of the Characteristics table. In PCA, the ratio between the largest Eigenvalue of the correlation matrix and the sum of all its Eigenvalues tells you how much of the variance can be explained by the first PC (the same can be done for the second largest Eigenvalue and the second PC). However, this is not necessarily the case once you start rotating the PCA solution. I think this was the issue that you discussed in the other thread.

My point is more about which values to report for EFA. Currently, you report the Eigenvalues of the correlation matrix. However, the correlation matrix is the input for PCA, not for EFA. Therefore, I think it would be more consistent to report, for EFA, the Eigenvalues of the matrix that was actually used as input (I guess the correlation matrix whose main diagonal elements have been replaced by some communality estimates) instead of the Eigenvalues of the original correlation matrix.

@FloSchuberth
Author

A problem might be that currently, if you use the Eigenvalue criterion to decide on the number of factors to be extracted, the Eigenvalues of the original correlation matrix are considered. So if you replace the Eigenvalues in the Characteristics table, one can no longer trace back why a certain number of factors was extracted when the Eigenvalue criterion was chosen. However, this leads to a follow-up question: Is it a good idea to base the decision about the number of factors on the Eigenvalues of the original correlation matrix, or should it rather be based on the Eigenvalues of the correlation matrix whose diagonal was replaced by communality estimates? Unfortunately, I have no direct answer to that. However, this obviously creates some inconsistency. Below is a screenshot of an EFA.
[Screenshot: Eigenvalue3]
As you can see in the Characteristics table, the Eigenvalues of the correlation matrix are reported. In the parallel analysis, other eigenvalues are reported (I assume those of the matrix that was actually used as input for EFA). The Scree plot also shows the 'correct' Eigenvalues. However, based on the Eigenvalue > 1 criterion, 4 factors are extracted.

@juliuspfadt
Contributor

You raise a very good point. What this comes down to is whether people are interested in the eigenvalues of the "raw" correlation matrix or the eigenvalues of the "transformed" matrix when using the ev > 1 criterion in an EFA. I think for the output we could provide both the "raw" data eigenvalues and the "transformed" data eigenvalues. Or just the ones that align with the eigenvalue criterion...

@FloSchuberth
Author

FloSchuberth commented Nov 17, 2023

I would rather say JASP should be consistent in the Eigenvalues that are reported, i.e., report the same set of Eigenvalues everywhere. As you can see, the Eigenvalues are reported in various places, e.g., in the Scree plot and the Factor Characteristics table.

The Eigenvalue criterion, as far as I know, has its origin in PCA. The idea is that we only extract components that explain at least the variance of a single variable. PCA is used to condense the original variables into a smaller set of components without losing a lot of information. So it makes sense that a component should explain more than the variance of a single variable. Since variables are typically standardized before PCA, that translates to extracting as many components as there are Eigenvalues of the correlation matrix larger than 1.

This criterion is also used in the EFA context. However, I am not sure which Eigenvalues should actually be considered there. Considering the Eigenvalues of the correlation matrix feels strange, as they have no direct link to factors. However, it might be that this is still recommended because PCA and EFA often lead to similar results, according to the motto: "Even though we know we make a mistake, we might be pretty close". On the other hand, one could consider the 'correct' Eigenvalues, i.e., those of the transformed correlation matrix. However, they obviously don't tell us how much variance can be explained by an extracted factor. For instance, the parallel analysis table contains negative Eigenvalues; how should these be interpreted in terms of variance explained? Then the question is: What is the intuition behind using the 'correct' Eigenvalues to decide on the number of factors? Why should we extract as many factors as there are Eigenvalues of the transformed correlation matrix larger than 1? I have already thought about this in the past. Unfortunately, it seems that most people do not really care about it, so I am also a bit puzzled.

What to do? I guess I would only report the Eigenvalues of the transformed correlation matrix and also base the eigenvalue criterion on them. In this way the output is consistent, which I believe is more important. Just my 2 cents. Perhaps you know an EFA/PCA expert with whom you can discuss this issue. Of course, I would be happy if you could keep me posted.
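As a small illustration of the criterion in its PCA form, the following NumPy sketch counts the Eigenvalues of a correlation matrix that exceed 1 and converts each Eigenvalue into a proportion of total variance. The correlation matrix is hypothetical (two uncorrelated blocks of three variables, within-block correlation 0.5), and the variance interpretation applies to unrotated components only.

```python
import numpy as np

def kaiser_criterion(R):
    """Eigenvalues (largest first), variance proportions, and the number
    of components with an eigenvalue greater than 1."""
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    # For standardized variables the eigenvalues sum to the number of
    # variables, so each one is a share of the total variance.
    proportion = eigenvalues / eigenvalues.sum()
    n_components = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, proportion, n_components

# Hypothetical correlation matrix: two independent blocks of 3 variables.
block = np.full((3, 3), 0.5)
np.fill_diagonal(block, 1.0)
R = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

eigenvalues, proportion, n_components = kaiser_criterion(R)
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Proportion of variance:", np.round(proportion, 3))
print("Components retained:", n_components)
```

In this constructed case each block contributes one Eigenvalue of 2 and two of 0.5, so two components pass the threshold and each of them accounts for a third of the total variance.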

PS: A good starting point might be the paper of Kaiser (1960). There he writes (p. 145):
"More specifically, Guttman has found criteria for determining a lower bound for the number of factors. His universally strongest lower bound requires that we find the number of positive latent roots of the observed correlation matrix with squared multiples in the diagonal. An alternative lower bound - weaker than the one just mentioned - requires that we find the number of latent roots greater than one of the observed correlation matrix. I have systematically studied the first of these lower bounds through the use of a computer and have gotten results which surprised even Professor Guttman: it almost invariably is necessary - in the strict algebraic sense - to have more than half as many factors as there are variables in the study. This is not a very delightful result - considering the well-known results regarding unique communalities. I have also studied somewhat systematically Guttman’s other lower bound for the number of factors, the number of eigenvalues greater than one of the observed correlation matrix. This typically runs from a sixth, say, to a third, of the number of variables.
The reason for studying this second lower bound with some care has to do with the next criterion for the number of factors, psychometric reliability. Very recently, I have worked out all of the formulas for the Kuder-Richardson reliability of factors. One remarkably simple result is that for a principal component to have positive Kuder-Richardson reliability, it is necessary and sufficient that the associated eigenvalue be greater than one - a finding corresponding exactly to Guttman’s algebraic lower bound.
And, finally, from the fourth, and by far most important viewpoint for choosing the number of factors - psychological meaningfulness - I have found that the number of eigenvalues greater than one of the observed correlation matrix led to a number of factors corresponding almost invariably, in a great number of studies, to the number of factors which practicing psychologists were able to interpret."

So I think both can be used, but following Kaiser, the Eigenvalues of the correlation matrix seem to provide a better number of factors in practice. Of course, this study was published more than 60 years ago, so I guess there is more current knowledge on that.

Reference:
Kaiser, H. F. (1960). The Application of Electronic Computers to Factor Analysis. Educational and Psychological Measurement, 20(1), 141–151. https://doi.org/10.1177/001316446002000116
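The two lower bounds from the quote can be computed side by side. This is only a sketch on simulated data: the two-factor model, the loadings of 0.7, and the sample size are made-up assumptions, and "squared multiples" is read as squared multiple correlations.

```python
import numpy as np

def guttman_lower_bounds(R):
    """Return (weaker, stronger) lower bounds on the number of factors."""
    # Weaker bound: latent roots of the correlation matrix greater than one.
    weaker = int(np.sum(np.linalg.eigvalsh(R) > 1.0))
    # Stronger bound: positive latent roots of the correlation matrix
    # with squared multiple correlations in the diagonal.
    R_smc = R.copy()
    np.fill_diagonal(R_smc, 1.0 - 1.0 / np.diag(np.linalg.inv(R)))
    stronger = int(np.sum(np.linalg.eigvalsh(R_smc) > 0.0))
    return weaker, stronger

# Simulated data from a hypothetical two-factor model (illustration only).
rng = np.random.default_rng(2023)
n, p = 300, 8
loadings = np.zeros((p, 2))
loadings[:4, 0] = 0.7
loadings[4:, 1] = 0.7
factors = rng.standard_normal((n, 2))
noise = rng.standard_normal((n, p)) * np.sqrt(1.0 - 0.7 ** 2)
data = factors @ loadings.T + noise

R = np.corrcoef(data, rowvar=False)
weaker, stronger = guttman_lower_bounds(R)
print("Roots > 1 of R:", weaker)
print("Positive roots of R with SMCs in the diagonal:", stronger)
```

Because each SMC is at most 1, the second count can never fall below the first, which is why Kaiser calls the Eigenvalue > 1 rule the weaker bound.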

@maxauerswald

Julius just sent me this link, so I'll give my 2ct: I don't think it's necessarily inconsistent to use PCA eigenvalues for EFA, because a fully specified common factor model completely determines both the PCA and the common factor eigenvalues at the population level. The problem with the common factor eigenvalues is that they are based on the reliability estimate of a factor model that is unknown in the context of EFA. Therefore, it is a question for simulation studies which eigenvalues yield more reliable estimates of the true number of factors. We found that PCA eigenvalues perform better in a rather extensive simulation study. Caveat: we only considered common factor eigenvalues based on ML.

Auerswald, M. , & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24, 468-491. doi: 10.1037/met0000200
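The population-level point can be illustrated with a small NumPy sketch. The loading matrix below is a made-up example with orthogonal factors and standardized variables; once it is fixed, both sets of eigenvalues follow from it.

```python
import numpy as np

# Hypothetical population model: 6 standardized variables, 2 orthogonal factors.
loadings = np.array([
    [0.7, 0.0],
    [0.7, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.6],
    [0.0, 0.6],
])
common_part = loadings @ loadings.T           # Lambda Lambda'
uniquenesses = 1.0 - np.diag(common_part)     # Psi (diagonal)
sigma = common_part + np.diag(uniquenesses)   # implied correlation matrix

# PCA eigenvalues: roots of the implied correlation matrix.
pc_eigenvalues = np.linalg.eigvalsh(sigma)[::-1]
# Common factor eigenvalues: roots of the matrix with the true
# communalities on the diagonal, i.e. Sigma - Psi.
fa_eigenvalues = np.linalg.eigvalsh(sigma - np.diag(uniquenesses))[::-1]

print("PCA eigenvalues:          ", np.round(pc_eigenvalues, 3))
print("Common factor eigenvalues:", np.round(fa_eigenvalues, 3))
```

With the true communalities, only as many common factor eigenvalues are nonzero as there are factors. In a sample the communalities have to be estimated, which is the unknown the comment above refers to.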

@FloSchuberth
Author

@maxauerswald Thanks for the insights! So this seems to be in line with Kaiser (1960): "I have found that the number of eigenvalues greater than one of the observed correlation matrix led to a number of factors corresponding almost invariably, in a great number of studies, to the number of factors which practicing psychologists were able to interpret."

However, the issue remains that if you conduct a parallel analysis based on FA, the eigenvalues of the transformed correlation matrix are used and reported, while in the factor characteristics table the eigenvalues of the original correlation matrix are always reported. For users who are not so experienced with EFA/PCA, this might be confusing... Rhetorical question: Why are there two different types of eigenvalues for one analysis?

Hence my suggestion to change the reported eigenvalues in the factor characteristics table and report the ones of the transformed correlation matrix. In case one wants to determine the number of factors based on the eigenvalues of the original correlation matrix, one can still do that by using parallel analysis based on PC. In this case, the eigenvalues of the original correlation matrix are reported in the parallel analysis results table.
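For reference, here is a rough sketch of how a parallel analysis can be switched between the two eigenvalue types. It is not the routine JASP calls: the function names, the 95% quantile, the number of simulations, and the use of SMCs as communality estimates for the FA-based variant are all assumptions made for illustration.

```python
import numpy as np

def eigenvalues(data, basis):
    """Eigenvalues (largest first) of the correlation matrix ('PC') or of the
    correlation matrix with SMCs on the diagonal ('FA')."""
    R = np.corrcoef(data, rowvar=False)
    if basis == "FA":
        np.fill_diagonal(R, 1.0 - 1.0 / np.diag(np.linalg.inv(R)))
    elif basis != "PC":
        raise ValueError("basis must be 'PC' or 'FA'")
    return np.linalg.eigvalsh(R)[::-1]

def parallel_analysis(data, basis="PC", n_sims=100, quantile=0.95, seed=0):
    """Count the leading observed eigenvalues that exceed the chosen quantile
    of eigenvalues obtained from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues(data, basis)
    simulated = np.array([eigenvalues(rng.standard_normal((n, p)), basis)
                          for _ in range(n_sims)])
    reference = np.quantile(simulated, quantile, axis=0)
    exceeds = observed > reference
    # Index of the first eigenvalue that fails the comparison.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, reference, n_factors

# Simulated data from a hypothetical two-factor model (illustration only).
rng = np.random.default_rng(1)
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8
data = (rng.standard_normal((500, 2)) @ loadings.T
        + rng.standard_normal((500, 6)) * np.sqrt(1.0 - 0.8 ** 2))

pc_observed, pc_reference, pc_n = parallel_analysis(data, basis="PC")
fa_observed, fa_reference, fa_n = parallel_analysis(data, basis="FA")
print("PC-based:", np.round(pc_observed, 3), "->", pc_n, "retained")
print("FA-based:", np.round(fa_observed, 3), "->", fa_n, "retained")
```

The same `basis` switch could feed the table, the scree plot, and the Eigenvalue criterion, so that all outputs of one analysis show a single set of eigenvalues.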

@juliuspfadt
Contributor

juliuspfadt commented Nov 20, 2023

Hmm, the eigenvalues are chosen and reported in multiple places, so we could make the following changes:

  • Determine the number of factors:
    • Parallel analysis (leave as is):
      • Based on PC
      • Based on FA
    • Eigenvalues above (change):
      • FA-based EVs
      • PC-based EVs
  • Factor characteristics table (change):
    • report the factor eigenvalues (i.e., based on the correlation matrix with communalities on the diagonal)
  • Parallel analysis table (leave as is): PC or FA-based
  • Scree plot:
    • we show eigenvalues and the parallel analysis results based on what is chosen in the factor number determination section (PC or FA based)

We might do the same for the PCA in JASP, although I wonder if anyone using PCA is interested in factor-based estimates.

@FloSchuberth
Author

@juliuspfadt Sounds reasonable. Regarding PCA, I am also not sure; I tend to say no.

@juliuspfadt
Contributor

juliuspfadt commented Nov 20, 2023

Thanks @FloSchuberth for the very valuable discussion and all your input. And thanks @maxauerswald for your 2c 🙂

@FloSchuberth
Author

and off-topic: the link to your LinkedIn profile does not work:
https://jasp-stats.org/team/julius/

@juliuspfadt
Contributor

well that sucks. thanks for letting me know :)

@tomtomme changed the title from "Change the reported Eigenvalues in the Factor Characteristics Table (EFA)" to "[Feature Request]: Change reported Eigenvalues in Factor Characteristics Table (EFA)" on Jan 6, 2025
@juliuspfadt linked a pull request on Jan 23, 2025 that will close this issue
@juliuspfadt
Contributor

Most of this will be fixed by the linked PR. Since I plan to separate out the analysis for determining the number of components/factors, the fixes specific to that will go there.
