IBM Support

Pairwise vs. Listwise deletion: What are they and when should I use them?

Troubleshooting


Problem

What is the difference between listwise and pairwise deletion of cases? Under what circumstances is each type of case deletion allowed?

Resolving The Problem

A case may be omitted from an analysis because it contains one or more missing values in the variables being analyzed.

In listwise deletion a case is dropped from an analysis because it has a missing value in at least one of the specified variables. The analysis is only run on cases which have a complete set of data.
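The rule above can be sketched outside SPSS in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data set is hypothetical, None stands in for a missing value, and the helper name listwise is made up for this example. It is not how any SPSS procedure is implemented.

```python
# Hypothetical toy data: None marks a missing value.
cases = [
    {"VAR1": 1.0, "VAR2": 2.0, "VAR3": 3.0},
    {"VAR1": None, "VAR2": 4.0, "VAR3": 1.0},
    {"VAR1": 3.0, "VAR2": None, "VAR3": 5.0},
    {"VAR1": 4.0, "VAR2": 8.0, "VAR3": 2.0},
]


def listwise(cases, variables):
    """Keep only the cases with no missing value in any specified variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]


# Only the first and last cases are complete on all three variables.
complete = listwise(cases, ["VAR1", "VAR2", "VAR3"])
print(len(complete))  # 2

# A case is dropped only for variables that are actually specified.
print(len(listwise(cases, ["VAR3"])))  # 4
```

Note that the number of retained cases depends on which variables are named in the analysis, not on every variable in the file.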

In pairwise deletion, the statistical procedure still uses cases that contain some missing data. A case cannot contribute to a statistic that involves a variable on which it has a missing value, but it can still be used when analyzing other variables with non-missing values. For example, suppose a case contains 3 variables: VAR1, VAR2, and VAR3. The case may have a missing value for VAR1, but this does not prevent some statistical procedures from using the same case to analyze variables VAR2 and VAR3.

Pairwise deletion allows you to use more of your data. However, each computed statistic may be based on a different subset of cases, which can be problematic. For example, a correlation matrix computed using pairwise deletion may not be positive semidefinite. That is, it may have negative eigenvalues, which can create problems for various statistical analyses. This can occur because correlations computed from different subsets of cases can form patterns that are impossible to produce with complete data.
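To make the positive semidefinite problem concrete, here is a small sketch in plain Python, again outside SPSS. The data are hypothetical and deliberately extreme: every case is missing exactly one variable, so each pair of variables is correlated on a different subset of cases. The helper names pearson and pairwise_corr are made up for this illustration.

```python
from itertools import combinations
from math import sqrt

VARIABLES = ["VAR1", "VAR2", "VAR3"]

# Hypothetical data: each case is missing exactly one variable (None).
cases = [
    {"VAR1": 1.0, "VAR2": 1.0, "VAR3": None},
    {"VAR1": 2.0, "VAR2": 2.0, "VAR3": None},
    {"VAR1": 3.0, "VAR2": 3.0, "VAR3": None},
    {"VAR1": 1.0, "VAR2": None, "VAR3": 1.0},
    {"VAR1": 2.0, "VAR2": None, "VAR3": 2.0},
    {"VAR1": 3.0, "VAR2": None, "VAR3": 3.0},
    {"VAR1": None, "VAR2": 1.0, "VAR3": 3.0},
    {"VAR1": None, "VAR2": 2.0, "VAR3": 2.0},
    {"VAR1": None, "VAR2": 3.0, "VAR3": 1.0},
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(cases, a, b):
    """Correlate a and b using every case that has both values present."""
    pairs = [(c[a], c[b]) for c in cases
             if c[a] is not None and c[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


# Build the pairwise correlation matrix as a dict keyed by variable pairs.
R = {(v, v): 1.0 for v in VARIABLES}
for a, b in combinations(VARIABLES, 2):
    r, n = pairwise_corr(cases, a, b)
    R[(a, b)] = R[(b, a)] = r
    print(a, b, "r =", r, "n =", n)

# For a positive semidefinite matrix, w'Rw can never be negative.
# With these weights it is, so R has at least one negative eigenvalue.
weights = {"VAR1": 1.0, "VAR2": -1.0, "VAR3": -1.0}
quad = sum(weights[a] * R[(a, b)] * weights[b]
           for a in VARIABLES for b in VARIABLES)
print("quadratic form:", quad)
```

Here VAR1 correlates +1 with both VAR2 and VAR3, yet VAR2 and VAR3 correlate -1 with each other, a pattern no complete data set could produce. The quadratic form plays the role of the variance of VAR1 - VAR2 - VAR3 on standardized variables, and it comes out negative. Listwise deletion would avoid the inconsistency in this data set only by leaving no cases at all.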

Note that the means and standard deviations computed when pairwise deletion is specified are based on all available data for each variable. Correlations are based on all data available for each pair of variables.
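As a rough illustration of the note above, the following plain Python sketch compares the per-variable N and mean obtained from all available data with those obtained after listwise deletion. The data set is hypothetical, None stands in for a missing value, and nothing here reflects SPSS internals.

```python
from statistics import mean

VARIABLES = ["VAR1", "VAR2", "VAR3"]

# Hypothetical toy data: None marks a missing value.
cases = [
    {"VAR1": 1.0, "VAR2": 2.0, "VAR3": 3.0},
    {"VAR1": None, "VAR2": 4.0, "VAR3": 1.0},
    {"VAR1": 3.0, "VAR2": None, "VAR3": 5.0},
    {"VAR1": 4.0, "VAR2": 8.0, "VAR3": 2.0},
]

# All available data: each variable uses every non-missing value it has.
available = {v: [c[v] for c in cases if c[v] is not None] for v in VARIABLES}

# Listwise: only cases complete on every variable are used.
complete = [c for c in cases if all(c[v] is not None for v in VARIABLES)]

for v in VARIABLES:
    print(
        v,
        "| available n =", len(available[v]),
        "mean =", mean(available[v]),
        "| listwise n =", len(complete),
        "mean =", mean(c[v] for c in complete),
    )
```

In this sketch the available-data N differs by variable (3, 3, and 4), while the listwise N is 2 for every variable.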

The choice between pairwise and listwise deletion of records arises only in limited circumstances. It is not relevant when only one variable is being analyzed. In other situations, missing values may be treated as a valid category. If a record has a missing value for a crucial dependent variable, it probably cannot be used in the analysis. The pairwise vs. listwise choice is also separate from the decision on whether to include or exclude user-defined missing values within a procedure.

With that scope in mind, the following describes when you may choose between these deletion types:

Most SPSS procedures, especially the more advanced modeling procedures, perform listwise deletion of records. In those procedures you do not have a choice: listwise deletion is applied automatically.

Pairwise deletion is allowed in the following procedures:
CORRELATIONS (pairwise is the default)
NONPAR CORR (pairwise is the default)
DESCRIPTIVES (pairwise, specified with the keyword VARIABLE, is the default)
PARTIAL CORR (pairwise, specified with the keyword ANALYSIS, is the default)
EXAMINE
FACTOR
QUICK CLUSTER
REGRESSION

In general, where you have a choice, you can choose between two options with command syntax via the /MISSING subcommand.
You would use either:

/MISSING=LISTWISE
or
/MISSING=PAIRWISE

Note that both LISTWISE and PAIRWISE deletion methods make very strict assumptions about the mechanisms that cause data to be missing. In order for these methods to produce appropriate results in most situations, data must be what is known as MCAR, or missing completely at random, meaning that whether a value is missing must be unrelated to the data values, both observed and unobserved. Some more widely applicable approaches are provided by the SPSS Statistics Missing Values Analysis option, including multiple imputation methods. The Amos program also offers options for multiple imputation methods.

Finally, note that many SPSS Statistics procedures offer the option INCLUDE on their MISSING subcommands. This option has nothing to do with listwise vs. pairwise deletion. It deals specifically with user-defined missing values. If INCLUDE is specified, user-defined missing values for the appropriate variables are in effect turned off, and those values are treated as valid data. This has no effect on system-missing values.
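The distinction can be sketched in plain Python as follows. Everything in this sketch is an assumption made for illustration: the value 99 is a hypothetical user-defined missing code, None stands in for a system-missing value, and the function valid_values and its include_user_missing flag are made-up names that only mimic the effect of INCLUDE.

```python
# Hypothetical setup: 99 is declared as a user-defined missing code,
# and None stands in for a system-missing value.
USER_MISSING = {99}


def valid_values(values, include_user_missing=False):
    """Return the values a procedure would treat as valid data."""
    kept = []
    for v in values:
        if v is None:
            continue  # system-missing values are always excluded
        if v in USER_MISSING and not include_user_missing:
            continue  # user-missing values are excluded by default
        kept.append(v)
    return kept


ages = [23, 99, None, 41]
print(valid_values(ages))                             # [23, 41]
print(valid_values(ages, include_user_missing=True))  # [23, 99, 41]
```

In both calls the system-missing entry is dropped; only the treatment of the user-defined code changes.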


Historical Number

33189

Document Information

Modified date:
16 April 2020

UID

swg21475199