I am performing a factor analysis on data that have a lot of missing values. My data set has over 9,000 records, but most of the questions were not fully answered, so when I run the Factor Analysis I get an error message like:
There are fewer than two cases, at least one of the variables has zero variance, there is only one variable in the analysis, or correlation coefficients could not be computed for all pairs of variables. No further statistics will be computed.
Is there any way I can handle this?
Resolving the problem
Here are three basic strategies to address this issue:
1. If some variables have particularly high rates of missing values, drop those variables and see whether this improves the retention of cases. If you have the 'Missing Value Analysis' module, it can be very helpful for seeing patterns of missing values (for example, 4000 cases are missing on var21, var40, and var50). If MVA is installed, you will see a "Missing Value Analysis" option near the bottom of the Analyze menu in SPSS.
2. Pairwise deletion. By default, Factor uses listwise deletion of cases with missing values, i.e., a case is omitted from the analysis if it is missing on any of the variables in the Factor variable list. With pairwise deletion, each correlation is computed from all cases that are nonmissing on the two variables involved, without regard to their 'missingness' on the other variables in the list. You can choose pairwise deletion from the Options dialog in Factor, and this may get you around the problem. However, pairwise deletion can cause numerical problems (such as non-positive definite correlation matrices) and can bias the resulting solutions. A white paper at http://www.smallwaters.com/whitepapers/longmiss/Longitudinal%20and%20multi-group%20modeling%20with%20missing%20data.pdf addresses some of the problems with traditional strategies for dealing with missing values. This site belongs to Smallwaters, the former developers of AMOS. AMOS is a structural equation modeling program, whereas the Factor procedure is essentially for exploratory factor analysis, but the paper's comments on pairwise deletion and substitution are instructive. The paper also appears as a chapter in:
Little, T.D., Schnabel, K.U., & Baumert, J. [Eds.] (2000). Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates.
You can find a similar discussion in:
Arbuckle, J.L. (1996). Full information estimation in the presence of incomplete data. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced Structural Equation Modeling: Issues and Techniques (Chapter 9, pp. 242-277). Mahwah, NJ: Erlbaum.
3. If you have the SPSS Missing Value Analysis (MVA) module, you can use the MVA procedure to estimate a covariance matrix that uses all of the data that are present. MVA can estimate this matrix with either the Regression method or the EM (Expectation-Maximization) method, which is based on the work of Little & Rubin. You can then use this covariance matrix as input to the Factor procedure.
Technote 1479694 provides an example of a Factor command that uses a correlation matrix as data. (That example uses MATRIX DATA commands to read the correlation matrix as text, but you can ignore the MATRIX DATA commands and focus on the FACTOR command, having created the matrix data file through MVA and OMS commands.)
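To make the difference between the first two strategies concrete, here is a minimal Python sketch. It is not SPSS syntax and not how the Factor procedure is implemented. The toy data set and the helper names `missing_rate`, `listwise_n`, and `pairwise_corr` are hypothetical and exist only for illustration. The sketch reports the missing rate for each variable, counts the cases that survive listwise deletion with and without the worst variable, and computes each correlation from the cases that are nonmissing on that pair.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data, not from the technote: one list per variable,
# one position per case, None marks a missing answer.
data = {
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": [2, 1, 4, 3, 6, 5],
    "var3": [None, None, None, None, 2, 1],  # mostly missing
    "var4": [1, None, 3, 5, 4, 6],
}


def missing_rate(values):
    """Fraction of cases that are missing on one variable."""
    return sum(v is None for v in values) / len(values)


def listwise_n(data, variables):
    """Number of cases that are nonmissing on every listed variable."""
    rows = zip(*(data[name] for name in variables))
    return sum(all(v is not None for v in row) for row in rows)


def pairwise_corr(x, y):
    """Pearson r and N from the cases that are nonmissing on both x and y.

    Returns (None, n) when r cannot be computed: fewer than two usable
    cases, or zero variance in either variable.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n


all_vars = list(data)
for name in all_vars:
    print(f"{name}: {missing_rate(data[name]):.0%} missing")

# Strategy 1: see how many complete cases are gained by dropping var3.
print("complete cases, all variables:", listwise_n(data, all_vars))
kept = [name for name in all_vars if name != "var3"]
print("complete cases without var3:", listwise_n(data, kept))

# Strategy 2: each correlation uses its own set of cases, so N varies by pair.
for a, b in combinations(all_vars, 2):
    r, n = pairwise_corr(data[a], data[b])
    label = "not computable" if r is None else f"{r:.3f}"
    print(f"{a} x {b}: N={n}, r={label}")
```

In this toy example only 2 of the 6 cases are complete on all four variables, while 5 are complete once var3 is dropped. Under pairwise deletion the N behind each correlation differs from pair to pair, which is one reason the resulting matrix is not guaranteed to be positive definite.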