Question
Asked 26th Jul, 2014

How can I decide between using principal components analysis versus factor analysis?

These two methods may appear similar to the user, but aren't they quite different? What would you tell a person who is considering using such methods? Thank you for your expert advice.

All Answers (29)

Rita Rueff-Lopes
Universitat Ramon Llull
In factor analysis you normally already have a model, and the objective is to predict observed variables from theoretical latent factors, whereas in principal component analysis the objective is to extract linear composites of observed variables.
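Rita's contrast can be written out in standard textbook notation (the symbols are a conventional choice, not taken from this thread, and orthogonal factors are assumed for simplicity). In PCA the arrow runs from the observed variables x to the composites y; in FA it runs from the latent factors f to x, with a unique term ε that PCA does not have:

```latex
% PCA: each component is a linear composite of the p observed variables x
y_k = \mathbf{w}_k^{\top}\mathbf{x},
\qquad
\mathbf{w}_k = \operatorname*{arg\,max}_{\lVert \mathbf{w} \rVert = 1,\ \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{k-1}}
  \operatorname{Var}\left(\mathbf{w}^{\top}\mathbf{x}\right)

% FA: each observed variable is predicted from m < p latent factors f plus a unique part
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{f}) = \mathbf{I}_m,\quad
\operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi} \text{ (diagonal)},\quad
\operatorname{Cov}(\mathbf{f}, \boldsymbol{\varepsilon}) = \mathbf{0}
\;\Longrightarrow\;
\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}
```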
Raid Amin
University of West Florida
Thank you, Rita. Thank you, Farhat.
If you think that "there may be some underlying theoretical relationship", but you are unsure of it, would you still choose factor analysis or PCA?
Say you suspect that certain cancer rates are somehow associated with air pollution. Could you use an FA model where you "throw in" all variables, with the goal of seeing whether the cancer variables appear together with air pollution in certain factors?
Cyril Iaconelli
University of Burgundy
Dear Raid,
I would say that FA is more for determining the underlying variables that explain why two other variables are correlated, while PCA is more about the distribution of individuals as explained by the principal components (i.e., by the correlation between factors).
I would say that the choice depends on what you are most interested in: the factors or the individuals.
Finally, I found that the PCA in the R package FactoMineR is the best compromise for multivariate analysis.
Best regards,
Cyril
Aleksandar Savić
Université de Lille
You didn't write what kind of data you have.
For example, my data are mainly spectroscopic, so I always check the physical meaning of the extracted components (factors). (In SPSS: check Scores, Save as variables.)
In other cases, look at the percentage of explained variance: higher is (sometimes) better. For example, when a high kappa is applied for Promax rotation to fluorescence emission data, the spectral components become "over-fitted" and gain a high percentage.
Also check whether the qualitative meaning changes when you change methods. To date, I have gotten a different grouping of variables (some HPLC data) only once or twice when applying PCA and FA (with all possible options).
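For the rotation options Aleksandar mentions, here is a minimal NumPy sketch of orthogonal varimax rotation using Kaiser's pairwise closed-form angle (a swapped-in illustration, not SPSS's implementation). Promax, the oblique method whose kappa he refers to, starts from a varimax solution, raises its loadings to the power kappa to build a target, and then rotates obliquely toward it; that last step is not shown. The loadings below are hypothetical.

```python
import numpy as np

def varimax(loadings, normalize=True, max_sweeps=50, tol=1e-10):
    """Varimax rotation via Kaiser's pairwise closed-form angles.

    For each pair of factor columns, the angle that maximizes the varimax
    criterion is phi = atan2(num, den) / 4; sweeps repeat until no pair moves.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    # Optional Kaiser row normalization: rotate rows scaled to unit length.
    h = np.sqrt(np.sum(L ** 2, axis=1)) if normalize else np.ones(p)
    L = L / h[:, None]
    rotation = np.eye(k)
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i], L[:, j]
                u, v = x ** 2 - y ** 2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u ** 2 - v ** 2), 2 * np.sum(u * v)
                phi = np.arctan2(D - 2 * A * B / p, C - (A ** 2 - B ** 2) / p) / 4
                c, s = np.cos(phi), np.sin(phi)
                plane = np.array([[c, -s], [s, c]])
                # Rotate columns i and j; accumulate the same rotation.
                L[:, [i, j]] = L[:, [i, j]] @ plane
                rotation[:, [i, j]] = rotation[:, [i, j]] @ plane
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return L * h[:, None], rotation

# Hypothetical simple-structure loadings (items 1-3 on factor 1, items 4-6 on
# factor 2), spun by 40 degrees to mimic an unrotated extraction.
simple = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
theta = np.deg2rad(40)
spin = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = simple @ spin
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Row sums of squared loadings (communalities) are unchanged by an orthogonal rotation, so rotation only redistributes the explained variance among the factors; the total stays the same.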
Yoilan Fimia-León
Universidad Central "Marta Abreu" de las Villas
Principal components analysis is only a data reduction method. It was common many decades ago, when computers were slow. I know it is the default method in many statistical applications, but factor analysis seems to be superior.
You can take a look at the following article, which provides more information about this technique:
Costello, A. B., & Osborne, J. W. (2005). Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment, Research & Evaluation, 10(7). Retrieved from http://pareonline.net/getvn.asp?v=10&n=7
If you need further guidance, don't hesitate to contact me.
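Yoilan's distinction shows up in the diagonal of the correlation matrix: PCA analyzes all of each variable's variance (1s on the diagonal), while a common-factor extraction analyzes only the shared variance (communalities). A minimal NumPy sketch comparing PCA with iterated principal axis factoring, one of several FA extraction methods (maximum likelihood is another), on a hypothetical population correlation matrix generated by one factor:

```python
import numpy as np

def pca_loadings(R, n):
    """Component loadings: eigenvectors of R (unit diagonal) scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(R)
    top = np.argsort(vals)[::-1][:n]
    return vecs[:, top] * np.sqrt(vals[top])

def principal_axis(R, n, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring: communalities replace the 1s on the diagonal."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))   # start from squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n]
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
        new_h2 = np.sum(L ** 2, axis=1)      # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return L, h2

# Hypothetical population correlation matrix from ONE factor with loadings
# of 0.7 on four variables: R = lam lam' with the diagonal set to 1.
lam = np.full(4, 0.7)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

L_pca = pca_loadings(R, 1)
L_paf, h2 = principal_axis(R, 1)
print("PCA loadings:", np.round(np.abs(L_pca[:, 0]), 3))
print("PAF loadings:", np.round(np.abs(L_paf[:, 0]), 3))
```

Principal axis factoring recovers the generating loadings of 0.7, while the PCA loadings come out at about 0.79, because each variable's unique variance is folded into the component.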
Raid Amin
University of West Florida
My main interest in factor analysis is to study relationships between several types of diseases in the population, and how such variables are related to other variables from different fields. Then I aim to output factor scores and use those in a cluster analysis. Can this also be done with PCA?
Cyril Iaconelli
University of Burgundy
I would say that PCA and FA are not (good) tools for finding correlations between different variables.
These analyses try to explain several variables with one factor (or component).
For sure, two variables explaining the same factor (or component) should be correlated, but I don't think that is the aim of this kind of analysis.
Maybe a simple correlation matrix (for example, computed in R) would help you more than these analyses?
Regards
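Cyril's suggestion of looking at the correlation matrix first can be sketched in Python with NumPy (the thread mentions R; NumPy is a swapped-in choice here), on simulated data where only two of the four variables are related:

```python
import numpy as np

# Simulated (hypothetical) data: 200 observations of 4 variables,
# where v2 is built from v1 so that the two are correlated.
rng = np.random.default_rng(0)
v1 = rng.normal(size=200)
v2 = 0.8 * v1 + 0.6 * rng.normal(size=200)
v3 = rng.normal(size=200)
v4 = rng.normal(size=200)
data = np.column_stack([v1, v2, v3, v4])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```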
Raid Amin
University of West Florida
Correlation analysis is not what I want here. I want to do a cluster analysis on factor scores.
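Raid's plan (scores first, then a cluster analysis) can be sketched with NumPy on simulated data. The sketch uses PCA scores and a plain k-means (Lloyd's algorithm) written out by hand; the data, the number of components, and k = 2 are hypothetical choices, and any clustering routine could take the scores as input.

```python
import numpy as np

def component_scores(X, n_components):
    """Standardize the columns of X and project onto the leading eigenvectors of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    top = np.argsort(vals)[::-1][:n_components]
    return Z @ vecs[:, top]

def kmeans(S, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with several random starts; keeps the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = S[rng.choice(len(S), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign each row to its nearest center (squared Euclidean distance).
            dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([S[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = ((S - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels

# Simulated (hypothetical) data: 200 areas, 6 indicators; the second group of
# 100 areas is shifted upward on the first three indicators.
rng = np.random.default_rng(42)
truth = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[truth == 1, :3] += 3.0

scores = component_scores(X, n_components=2)
labels = kmeans(scores, k=2)
# Cluster labels are arbitrary, so count agreement under either labeling.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with the simulated groups: {agreement:.2f}")
```

The clusters are only as interpretable as the components: to describe a cluster, look at its mean score on each component and then at which variables load on that component. Factor scores from an FA could be passed to `kmeans` in the same way.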
Linda Sanner
Gateways Hospital
Factor analysis (FA) is a group of statistical methods used to understand and simplify patterns of relationships underlying measured variables (Beavers, Lounsbury, Richards, Huck, Skolits, & Esquivel, 2013; Fabrigar, Wegener, MacCallum, & Strahan, 1999; Schmitt, 2011). Factor analysis is a concept that includes both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Jennrich & Bentler, 2011).
CFA tests whether a known factor model can predict a set of observed data (DeCoster, 1998). Researchers use CFA to verify or confirm hypotheses or theory (Ruscio & Roche, 2012; Schmitt, 2011), establish the validity of the factor model, compare two models using the same data, test the significance of factor loadings, test relationships between factor loadings, test whether factors are correlated, and assess the convergent and discriminant validity of measures (DeCoster, 1998).
EFA determines the number of common factors that influence a set of measures and the strength of the relationship between each common factor and each measure (DeCoster, 1998). Researchers use EFA to identify the nature of the constructs that underlie responses to a questionnaire, determine sets of items that hang together, demonstrate the depth and breadth of measurement scales, identify the most important features of a group of items, and generate factor scores that represent the underlying constructs (DeCoster, 1998). As a multivariate statistical approach, EFA is appropriate for reducing a large number of variables to a smaller set of factors, examining relationships between categories, and evaluating the construct validity of a measurement scale (Williams et al., 2010).
Exploratory factor analysis involves a series of steps. The first is the planning phase, in which the researcher determines whether the data are suitable for EFA: choosing the sample size and then, after collecting the data, computing a correlation matrix and testing it for sampling adequacy. The second step is to extract factors. The third is to determine the number of factors to retain. The fourth is factor rotation. The fifth is to interpret the factor structure.
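For the adequacy check in the first step, two commonly used statistics are the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity. A minimal NumPy sketch on simulated data (the data and sample size are hypothetical):

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure: compares zero-order to partial correlations."""
    A = np.linalg.inv(R)
    d = np.sqrt(np.diag(A))
    partial = -A / np.outer(d, d)               # partial correlations (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, q2 = np.sum(R[off] ** 2), np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

def bartlett_sphericity(R, n):
    """Bartlett's statistic for H0: R is an identity matrix, with its degrees of freedom."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    return stat, p * (p - 1) // 2

# Simulated data (hypothetical): 300 respondents, 5 items driven by one factor.
rng = np.random.default_rng(1)
n = 300
factor = rng.normal(size=(n, 1))
X = factor @ np.full((1, 5), 0.7) + np.sqrt(1 - 0.49) * rng.normal(size=(n, 5))
R = np.corrcoef(X, rowvar=False)

stat, df = bartlett_sphericity(R, n)
print(f"KMO = {kmo(R):.2f}; Bartlett chi2 = {stat:.1f} on {df} df")
```

KMO values closer to 1 suggest that the correlations are mostly shared rather than pairwise-specific. Bartlett's statistic is compared to a chi-square distribution with `df` degrees of freedom (for example with `scipy.stats.chi2.sf(stat, df)`); a small p-value indicates that R differs from an identity matrix.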
Principal component analysis (PCA) is a method of factor extraction (the second step mentioned above). Researchers use PCA when they want to reduce the number of variables while retaining as much of the original variance as possible (Conway & Huffcutt, 2003).
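Steps two and three with PCA as the extraction method can be sketched on simulated data with two known factors (the data, loadings, and sample size are hypothetical). Each eigenvalue of the correlation matrix is the variance captured by one component, so eigenvalue divided by the number of variables gives its share of the total variance:

```python
import numpy as np

# Simulated data (hypothetical): two uncorrelated factors, each measured by three items (loading 0.8).
rng = np.random.default_rng(7)
n = 500
F = rng.normal(size=(n, 2))
Lam = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
X = F @ Lam.T + np.sqrt(1 - 0.64) * rng.normal(size=(n, 6))
R = np.corrcoef(X, rowvar=False)

# Step 2 (extraction with PCA): eigen-decompose R, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()          # share of total variance per component

# Step 3 (how many to retain): Kaiser's eigenvalue-greater-than-1 rule, one common heuristic.
n_keep = int(np.sum(eigvals > 1))
loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])

print("eigenvalues:", np.round(eigvals, 2))
print(f"retain {n_keep} components, explaining {explained[:n_keep].sum():.0%} of the variance")
```

The eigenvalue-greater-than-1 rule is only one heuristic and is often criticized; parallel analysis and the comparison-data approach of Ruscio and Roche (2012) are alternatives for the third step.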
REFERENCES
Beavers, A. S., Lounsbury, J. W., Richards, J. K., Huck, S. W., Skolits, G. J., & Esquivel, S. L. (2013). Practical considerations for using exploratory factor analysis in educational research. Practical Assessment, Research & Evaluation, 18(6), 1-13. Retrieved from http://www.pareonline.net/pdf/v18n6.pdf
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168. doi:10.1177/1094428103251541
DeCoster, J. (1998). Overview of Factor Analysis. Retrieved from http://www.stat-help.com/factor.pdf
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299. doi:10.1037/1082-989X.4.3.272
Jennrich, R. I., & Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika, 76, 537-549. doi:10.1007/s11336-011-9218-4
Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24(2), 282-292. doi:10.1037/a0025697
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304-321. doi:10.1177/0734282911406653
Raid Amin
University of West Florida
Thank you for the detailed answer here, Linda. I posted it to my class.
Linda Sanner
Gateways Hospital
Happy to help, Raid.
They are actually different techniques, based on different assumptions and used for different objectives. PCA is only a geometric or statistical transformation of the data that produces new synthetic variables, while FA posits a model with certain assumptions about how the data were generated. I can provide a link to a publication in which we compare both techniques in a financial context. I hope this helps.
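To illustrate the point that PCA is a transformation rather than a model: keeping all components simply rotates the centered data into new, uncorrelated axes, and rotating back recovers the data exactly; there is no error term and no assumption about how the data were generated. A minimal NumPy sketch on simulated (hypothetical) data:

```python
import numpy as np

# Simulated (hypothetical) correlated data: 100 observations of 3 variables.
rng = np.random.default_rng(3)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.5]])
X = rng.normal(size=(100, 3)) @ mix
Xc = X - X.mean(axis=0)

# PCA via the SVD of the centered data: the rows of Vt are the principal axes.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # same points, expressed in the rotated axes

# Nothing is modeled or discarded when all components are kept.
cov_scores = np.cov(scores, rowvar=False)
print(np.round(cov_scores, 3))          # off-diagonal entries are zero: components are uncorrelated
print(np.allclose(scores @ Vt, Xc))     # rotating back recovers the centered data exactly
```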
Raid Amin
University of West Florida
Thank you very much, Rogelio. I will read the article.
My pleasure, Dr. Amin. Best regards.
  • The decision of whether to use EFA or PCA can only be made when the goals of a study are clearly known and specified.
  • If the goal of a study is to obtain linear composites of observed variables that retain as much variance as possible, then PCA is the correct procedure.
  • On the other hand, if the goal is to determine interpretable constructs that maximally explain covariances among a set of observed variables, then EFA is the correct procedure.
  • Source: Byrne, B. M. (2005, p. 28). Factor analytic models: Viewing the structure of an assessment instrument from three perspectives. Journal of Personality Assessment, 85(1), 17-32.
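Byrne's second bullet can be checked numerically: the first principal component's weights (the eigenvector of the covariance matrix with the largest eigenvalue) give the unit-length linear composite with the largest possible variance. The sketch below compares it with random unit-length composites on simulated data (all values hypothetical):

```python
import numpy as np

# Simulated (hypothetical) data with correlated columns.
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))
S = np.cov(X, rowvar=False)

# First principal component: eigenvector of S with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
w1 = eigvecs[:, -1]
var_pc1 = w1 @ S @ w1                     # variance of the composite w1'x

# Compare with 1000 random unit-length weight vectors.
W = rng.normal(size=(1000, 4))
W /= np.linalg.norm(W, axis=1, keepdims=True)
var_random = np.einsum("ij,jk,ik->i", W, S, W)   # w'Sw for each row w of W

print(f"PC1 variance: {var_pc1:.3f}; best random composite: {var_random.max():.3f}")
```

No random composite beats the first component, which is exactly the variance-retention goal in the second bullet; the EFA goal in the third bullet targets the off-diagonal covariances instead.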
Raid Amin
University of West Florida
Thank you all for your valuable input to this thread. It has been read by many people so far, so this question seems to have been well placed.
Sarah Coriat
Anglia Ruskin University
Thank you for asking this question in 2014 as I am having the same one today in 2017 ;-). 
Raid Amin
University of West Florida
Hi Sarah,
The large number of people who have viewed the responses to my question may indicate that this topic is still not taught well (or not understood well).
Raid Amin
University of West Florida
Thank you for your insightful contribution above, Paul. 
Raid Amin
University of West Florida
I have been encountering some interesting challenges when using factor analysis. If, as a first step, we run a factor analysis and output factor scores for the first few factors, how do we use those factor scores in a cluster analysis in the next step?
Specifically: let's say that we want to use the factor scores from Factor 1. The loadings are large for some variables and small for others; some are positive and some are negative. What will we "see" contained in the factor scores for Factor 1? If we use the factor scores in a cluster analysis that can identify High clusters and also Low clusters, what exactly do such clusters mean in relation to the original Factor 1 variables?
Has anyone here done such a cluster analysis? Please share with us your thoughts.
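One way to see what Factor 1 scores contain is to compute them. Under the regression (Thurstone) method, one common choice in software, each case's score is a weighted sum of its standardized variables, with weights given by the inverse correlation matrix times the loadings. A minimal NumPy sketch with a hypothetical one-factor solution that has mixed-sign loadings:

```python
import numpy as np

# Hypothetical one-factor solution for four standardized variables:
# two strong positive loadings, one negative loading, one near zero.
lam = np.array([0.8, 0.7, -0.6, 0.1])
psi = 1 - lam ** 2                            # unique variances (unit-variance variables)
R = np.outer(lam, lam) + np.diag(psi)         # correlation matrix implied by the model

# Regression-method factor-score weights: solve R w = lambda.
weights = np.linalg.solve(R, lam)
print("score weights:", np.round(weights, 3))

# Two hypothetical standardized profiles (z-scores on the four variables).
high_profile = np.array([1.0, 1.0, -1.0, 0.0])   # high on v1 and v2, low on v3
low_profile = -high_profile
print("score, high profile:", round(float(high_profile @ weights), 3))
print("score, low profile: ", round(float(low_profile @ weights), 3))
```

In this one-factor example each weight has the same sign as its loading, so a "High" cluster on Factor 1 collects cases that are above average on the positively loaded variables and below average on the negatively loaded ones; a "Low" cluster is the mirror image. Variables with near-zero loadings contribute almost nothing to the score.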
Raid Amin
University of West Florida
Thank you for your detailed response to my question, Paul. While FA is widely taught in psychology departments, it is less often found in statistics programs.
Timothy A Ebert
University of Florida
Neither is taught in entomology departments. Of course, there aren't many options when your crowning achievement is 4 replicates.
There was a class at UC Davis in the late '80s on multivariate analysis that was required for a minor in that subject. I know we went over PCA, but maybe not FA. I don't remember what textbook we used. The next encounter was about 5 years later, when I had a large data set for my Ph.D. I spent many happy hours stuffing my data through most of the procedures in the SAS/STAT user manual.
Hamza Bouguerra
Badji Mokhtar - Annaba University
Thank you all for your relevant answers.
