Substituting group-specific means for missing values
I have data with missing values and want to substitute means within groups for the missing data. How can I do this in SPSS Statistics?
This can be done using the AGGREGATE procedure to add a variable to the data with the group-specific means and then conditionally substituting those values in for the missing values.
For example, if your variable of interest is X and the variable denoting groups of cases is Group, you could select Data>Aggregate, specify Group as a Break variable in the Aggregate Data dialog box, specify X as a variable to summarize, keep the default function of mean, and click OK. This adds a new variable to the data set, named X_mean by default, that holds the group-specific mean of X for each case.
To substitute these values back into the original X variable, select Transform>Compute Variable, specify X as the Target Variable, specify X_mean as the Numeric Expression, click the If button near the bottom of the dialog, select Include if case satisfies condition, and enter MISSING(X) as the expression. Then click Continue and OK, and click OK again when asked whether you want to change the existing variable.
If you would rather keep the original X and add a new variable that is X with the group-specific means filled in, first compute a new variable equal to X, and then set that new variable equal to X_mean for the cases where it is missing.
In command syntax, you could accomplish this using the following:
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Group
  /X_mean=MEAN(X).
IF MISSING(X) X=X_mean.
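The variant that keeps the original X can be written in syntax as well. The following is a minimal sketch, assuming X_mean has already been added to the data by AGGREGATE; X_filled is a hypothetical name chosen for illustration.

```spss
* Copy X to a new variable so that the original X stays untouched.
COMPUTE X_filled=X.
* Fill in the group-specific mean wherever the copy is missing.
IF MISSING(X_filled) X_filled=X_mean.
EXECUTE.
```

Cases in a group where every value of X is missing have no group mean to draw on, so X_filled remains missing for them.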
If you have multiple variables for which this needs to be done, add one subcommand per variable to the AGGREGATE command to define its mean, and add one IF command per variable. If you need to define groups using combinations of more than one variable, list those variables along with Group on the BREAK subcommand. If you want to remove the _mean variable(s) afterward, follow these commands with DELETE VARIABLES and the names of the variables to be deleted. DELETE VARIABLES cannot run while transformations are pending, so run the IF commands first (for example, with EXECUTE).
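Putting these extensions together, one possible sketch looks like the following. It assumes a second variable of interest, Y, and a second grouping variable, Region; both names are hypothetical and stand in for whatever your data contain.

```spss
* Add group-specific means of X and Y for each combination of Group and Region.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Group Region
  /X_mean=MEAN(X)
  /Y_mean=MEAN(Y).
* Substitute the group means for the missing values.
IF MISSING(X) X=X_mean.
IF MISSING(Y) Y=Y_mean.
* Run the pending transformations before deleting the helper variables.
EXECUTE.
DELETE VARIABLES X_mean Y_mean.
```

Naming the aggregated variables explicitly, as here, keeps the IF and DELETE VARIABLES commands easy to match up with the AGGREGATE subcommands.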