Badji Mokhtar - Annaba University
Question
Asked 26th Jul, 2014
How can I decide between using principal components analysis versus factor analysis?
These two methods may appear similar to the user, but aren't they quite different? What would you tell a person who is considering using such methods? Thank you for your expert advice.
Most recent answer
Thank you all for your relevant answers.
Popular answers (1)
Gateways Hospital
Factor analysis (FA) is a group of statistical methods used to understand and simplify patterns of relationships underlying measured variables (Beavers, Lounsbury, Richards, Huck, Skolits, & Esquivel, 2013; Fabrigar, Wegener, MacCallum, & Strahan, 1999; Schmitt, 2011). Factor analysis is a concept that includes both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Jennrich & Bentler, 2011).
CFA tests whether a known factor model can predict a set of observed data (DeCoster, 1998). Researchers use CFA to verify or confirm hypotheses or theory (Ruscio & Roche, 2012; Schmitt, 2011), establish the validity of the factor model, compare two models using the same data, test the significance of factor loadings, test relationships between factor loadings, test for correlation or lack of correlation of factors, and assess convergent and discriminant validity of measures (DeCoster, 1998).
EFA estimates the number of common factors that influence a set of measures and the strength of the relationship between each common factor and the corresponding measure (DeCoster, 1998). Researchers use EFA to identify the nature of constructs that underlie responses given in a questionnaire, determine sets of items that interconnect, demonstrate the depth and breadth of measurement scales, classify the most important features of a group of items, and generate factor scores that represent the underlying constructs (DeCoster, 1998). Because EFA is a multivariate statistical approach, it is appropriate for reducing a large number of variables to a smaller set of factors, examining relationships between categories, and evaluating the construct validity of a measurement scale (Williams et al., 2010).
Exploratory factor analysis involves a series of statistical analysis steps. The first is the planning phase, in which the researcher selects the sample size and, after collecting the data, creates a correlation matrix and tests it for adequacy to determine whether the data are suitable for EFA. The second step is to extract factors. The third step is to determine the number of factors to retain. The fourth step is factor rotation. The fifth step is to interpret the factor structure.
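As a rough illustration of the adequacy check in the first step, here is a minimal Python sketch, assuming NumPy is available. The answer does not say which adequacy tests are meant; the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity are assumed here because they are commonly used for this purpose. The simulated data and the function names kmo and bartlett_statistic are hypothetical.

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix."""
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)        # partial correlations between variable pairs
    off = ~np.eye(R.shape[0], dtype=bool)    # mask selecting the off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_statistic(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)         # log-determinant of the correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df

# Hypothetical data: 300 observations of 5 variables sharing one common factor
rng = np.random.default_rng(1)
n = 300
factor = rng.normal(size=(n, 1))
X = 0.7 * factor + np.sqrt(0.51) * rng.normal(size=(n, 5))

R = np.corrcoef(X, rowvar=False)
kmo_value = kmo(R)
chi2, df = bartlett_statistic(R, n)
print("KMO:", round(kmo_value, 3))
print("Bartlett chi-square:", round(chi2, 1), "df:", df)
```

A KMO value well below about 0.6, or a Bartlett statistic that is small relative to its degrees of freedom, is often read as a sign that the correlation matrix is not well suited to factoring. These thresholds are rules of thumb, not part of the answer above.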
Principal component analysis (PCA) is a method of factor extraction (the second step mentioned above). Researchers use PCA when they want to reduce the number of variables while retaining as much of the original variance as possible (Conway & Huffcutt, 2003).
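To make "retaining as much of the original variance as possible" concrete, here is a minimal sketch of PCA as an eigendecomposition of the correlation matrix, assuming NumPy. The simulated data set and the function name pca_from_correlation are hypothetical and only serve the illustration.

```python
import numpy as np

def pca_from_correlation(X):
    """Return eigenvalues (descending), eigenvectors and explained-variance ratios."""
    R = np.corrcoef(X, rowvar=False)         # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # share of total variance per component
    return eigvals, eigvecs, explained

# Hypothetical data: 6 variables driven by 2 latent sources plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]])
X = latent @ weights + 0.5 * rng.normal(size=(200, 6))

eigvals, eigvecs, explained = pca_from_correlation(X)
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative variance explained:", np.round(explained.cumsum(), 3))
```

The eigenvalues sum to the number of variables, so the cumulative ratios show how much of the total variance a given number of components keeps. Retaining components with eigenvalues above 1 is one common heuristic for the third step, though not the only one.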
REFERENCES
Beavers, A. S., Lounsbury, J. W., Richards, J. K., Huck, S. W., Skolits, G. J., & Esquivel, S. L. (2013). Practical considerations for using exploratory factor analysis in educational research. Practical Assessment, Research & Evaluation, 18(6), 1-13. Retrieved from http://www.pareonline.net/pdf/v18n6.pdf
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168. doi:10.1177/1094428103251541
DeCoster, J. (1998). Overview of Factor Analysis. Retrieved from http://www.stat-help.com/factor.pdf
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299. doi:10.1037/1082-989X.4.3.272
Jennrich, R. I., & Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika, 76, 537-549. doi:10.1007/s11336-011-9218-4
Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24(2), 282-292. doi:10.1037/a0025697
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304-321. doi:10.1177/0734282911406653
26 Recommendations
All Answers (29)
Universitat Ramon Llull
In factor analysis you normally already have a model, and the objective is to predict the observed variables from theoretical latent factors, whereas in principal component analysis the objective is to extract linear composites of the observed variables.
University of West Florida
Thank you Rita, Thank you Farhat.
If you think that "there may be some underlying theoretical relationship", but you are unsure of it, would you still choose Factor Analysis or PCA?
Say, you suspect that certain cancer rates are somehow associated with air pollution. Could you use a FA model where you "throw in" all variables, with the goal to see if the cancer variables somehow appear in certain factors with air pollution?
University of Burgundy
Dear Raid,
I would say that FA is more for determining the underlying variables that explain why the observed variables are correlated, while PCA is more about the distribution of individuals explained by the principal components (i.e. by correlation between factors).
I would say that the choice depends on what you are most interested in: the factors or the individuals.
Finally, I found that the PCA of the package FactoMineR (in R) is the best compromise for multivariate analysis.
Best regards,
Cyril
1 Recommendation
Université de Lille
You didn't write what kind of data you have.
For example, my data are mainly spectroscopic, so I always check the physical meaning of the extracted components (factors). (Option in SPSS: check "Scores", then "Save as variables".)
In other cases, look at the percentage of explained variance; higher is (sometimes) better. For example, when a high kappa is applied for promax rotation in the case of fluorescence emission, the spectral components become "over-fitted" while gaining a high percentage.
Also check whether anything changes in qualitative meaning when you change the method. To date, I have only once or twice obtained a different grouping of variables (some HPLC data) when applying PCA and FA (with all possible options).
1 Recommendation
Universidad Central "Marta Abreu" de las Villas
Principal components analysis is only a data reduction method. It was common many decades ago when computers were slow. I know it is the default method in many statistical applications but factor analysis seems to be superior.
You can take a look at the following article, where more information about this technique is provided:
Costello, A. B., & Osborne, J. W. (2005). Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment, Research & Evaluation, 10(7). Retrieved from http://pareonline.net/getvn.asp?v=10&n=7
If you need further guidance don't hesitate to contact me
University of West Florida
My main interest in factor analysis is to study relationships between several types of diseases in the population, and how such variables are related to other variables from different fields. Then I aim to output factor scores and use those in a cluster analysis. Can this also be done with PCA?
University of Burgundy
I would say that PCA and FA are not (good) tools for finding correlations between different variables.
These analyses try to explain several variables with one factor (or component).
For sure, two variables that load on the same factor (or component) should be correlated, but I don't think that is the aim of this kind of analysis.
Maybe a simple correlation matrix would help you more than those analyses? (Please find the link below to compute a correlation matrix in R.)
Regards
1 Recommendation
Universidad Veracruzana
They are actually different techniques, based on different assumptions and used for different objectives. PCA is only a geometric or statistical transformation of the data in order to get new synthetic variables, while FA supposes a model with some assumptions about how the data were generated. I can provide you with the link to a publication where we compare both techniques in the financial context. I hope this helps.
5 Recommendations
- The decision of whether to use EFA or PCA can only be made when the goals of a study are clearly known and specified.
- If the goal of a study is to obtain linear composites of observed variables that retain as much variance as possible, then PCA is the correct procedure.
- On the other hand, if the goal is to determine interpretable constructs that maximally explain covariances among a set of observed variables, then EFA is the correct procedure.
- Source: Byrne, B. M. (2005, p. 28). Factor analytic models: Viewing the structure of an assessment instrument from three perspectives. Journal of Personality Assessment, 85(1), 17-32.
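The contrast between "retaining variance" and "explaining covariances" can be sketched numerically. The example below, assuming NumPy, builds a correlation matrix from hypothetical one-factor loadings and compares first-component PCA loadings with loadings from iterated principal axis factoring. Principal axis factoring is chosen here only as one familiar EFA extraction method; the answer above does not name one, and the function names are hypothetical.

```python
import numpy as np

def pca_loadings(R, k):
    """Loadings of the first k principal components of correlation matrix R."""
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(vals[idx])

def paf_loadings(R, k, n_iter=100):
    """Iterated principal axis factoring: eigendecompose R with communalities on the diagonal."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # start from squared multiple correlations
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # replace the 1s with communality estimates
        vals, vecs = np.linalg.eigh(reduced)
        idx = np.argsort(vals)[::-1][:k]
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        h2 = np.sum(L ** 2, axis=1)              # updated communalities
    return L

# Hypothetical population: four variables, one common factor
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

pca = np.abs(pca_loadings(R, 1)[:, 0])   # sign of an eigenvector is arbitrary
paf = np.abs(paf_loadings(R, 1)[:, 0])
print("true loadings:", true_loadings)
print("PCA loadings: ", np.round(pca, 3))
print("PAF loadings: ", np.round(paf, 3))
```

In this setup the factoring step works on the shared variance only and recovers the loadings used to build the matrix, while the PCA loadings come out larger because each component also absorbs unique variance. With real data the two sets of loadings can be close, especially when communalities are high.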
1 Recommendation
Anglia Ruskin University
Thank you for asking this question in 2014 as I am having the same one today in 2017 ;-).
1 Recommendation
University of West Florida
I have been encountering some interesting challenges when using factor analysis. If as a first step, we obtain a factor analysis, and then we output factor scores from the first few factors, then how do we use the factor scores in a cluster analysis in the next step?
Specifically: Let's say that we want to use the factor scores from Factor 1. The loadings are large for some variables and small for other variables; some are positive and some are negative. What will we "see" contained in the factor scores for Factor 1? If we use the factor scores in a cluster analysis that can identify High clusters and also Low clusters, what exactly do such clusters mean, as related to the original Factor 1 variables?
Has anyone here done such a cluster analysis? Please share with us your thoughts.
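One way to look at what a score "contains" is sketched below, assuming NumPy and using PCA component scores rather than true factor scores (which are estimates and depend on the scoring method). The three simulated variables are hypothetical: two rise with a latent source and one falls with it, so the loadings have mixed signs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
f = rng.normal(size=n)
# Hypothetical variables: two rise with the latent source, one falls with it
X = np.column_stack([
    f + 0.5 * rng.normal(size=n),
    f + 0.5 * rng.normal(size=n),
    -f + 0.5 * rng.normal(size=n),
])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(X, rowvar=False)
vals, vecs = np.linalg.eigh(R)
top = np.argmax(vals)                              # index of the first component
loadings = vecs[:, top] * np.sqrt(vals[top])
scores = Z @ vecs[:, top]                          # one score per observation

# Correlation of the score with each original variable
score_corr = np.array([np.corrcoef(scores, Z[:, j])[0, 1] for j in range(3)])
print("loadings:         ", np.round(loadings, 3))
print("corr(score, var): ", np.round(score_corr, 3))
```

For PCA on the correlation matrix, the correlation between a component score and each variable equals that variable's loading. A cluster with high scores would therefore hold cases that are high on the variables whose loadings share the component's sign and low on the variables with the opposite sign, and a low cluster the reverse. The overall sign of a component is arbitrary, so "high" and "low" can swap between software packages.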
University of Florida
Neither is taught in entomology departments. Of course there aren't many options when your crowning achievement is 4 replicates.
There was a class at UC Davis in the late 80's in multivariate analysis that was required as part of getting a Minor in that subject. I know we went over PCA, but maybe not FA. I don't remember what textbook we used. The next encounter was about 5 years later when I had a large data set for my Ph.D. I spent many happy hours stuffing my data through most of the procedures in the SAS-Stat user manual.
1 Recommendation
Similar questions and discussions
SEM in R 'lavaan WARNING: some estimated lv variances are negative'?
- Qirui Li
Dear all,
I am running an SEM in R. However, fitting the model reports 'lavaan WARNING: some estimated lv variances are negative'. Any suggestions or solutions?
I guess the problem might be the correlation between two variables (i.e. Land, Off). What do you think? Here is the output. Thank you.
> SEM<-'Land=~`L12`+`L11`
+ Off=~`O11`+`O12`+`O13`
+ Y1~Land+Off'
> #fitting SEM model
> fit<-lavaan::sem(SEM,data = StLI1)
Warning message:
In lav_object_post_check(object) :
lavaan WARNING: some estimated lv variances are negative
> fit
lavaan 0.6-5 ended normally after 77 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 14
Number of observations 242
Model Test User Model:
Test statistic 8.352
Degrees of freedom 7
P-value (Chi-square) 0.303
> summary(fit,standardized=TRUE)
lavaan 0.6-5 ended normally after 77 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 14
Number of observations 242
Model Test User Model:
Test statistic 8.352
Degrees of freedom 7
P-value (Chi-square) 0.303
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Land =~
L12 1.000 0.245 0.744
L11 -0.229 0.095 -2.398 0.016 -0.056 -0.252
Off =~
O11 1.000 NaN NaN
O12 0.660 0.225 2.930 0.003 NaN NaN
O13 0.647 0.223 2.897 0.004 NaN NaN
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Y1 ~
Land -0.020 0.050 -0.396 0.692 -0.005 -0.032
Off -0.074 0.090 -0.820 0.412 NaN NaN
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Land ~~
Off 0.020 0.004 4.637 0.000 0.929 0.929
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.L12 0.049 0.024 1.996 0.046 0.049 0.447
.L11 0.047 0.004 10.547 0.000 0.047 0.937
.O11 0.050 0.006 7.880 0.000 0.050 1.174
.O12 0.041 0.004 9.503 0.000 0.041 1.085
.O13 0.044 0.005 9.666 0.000 0.044 1.076
.Y1 0.022 0.002 10.923 0.000 0.022 0.998
Land 0.060 0.026 2.354 0.019 1.000 1.000
Off -0.007 0.004 -1.990 0.047 NaN NaN
Warning messages:
1: In sqrt(ETA2) : NaNs produced
2: In sqrt(ETA2) : NaNs produced
3: In sqrt(ETA2) : NaNs produced