The computation of discriminant scores
Using SPSS data transformation procedures, I have tried to replicate the values for the discriminant scores that SPSS saved to my data file. The results I got were not identical to the scores calculated by the DISCRIMINANT procedure. Can you show me how to correctly compute discriminant scores?
Resolving the problem
The usual mistake people make when they first try this is to multiply the values from the table entitled Standardized Canonical Discriminant Function Coefficients by standardized values of the predictors (i.e., z-scores) that were created with the DESCRIPTIVES procedure. The z-scores produced by DESCRIPTIVES are standardized by the sample total standard deviations, which are computed for each variable using all of the cases in the file and ignoring group membership. DISCRIMINANT, by contrast, standardizes by the pooled within-groups standard deviations.
Pooled within-groups standard deviations are calculated by pooling the variance estimates from each level of the grouping variable. The pooled within-groups variance estimate for a predictor is the mean square error (MSE) term from a one-way ANOVA with that predictor as the dependent variable and the grouping variable as the factor. The square roots of these MSE values are the pooled within-groups standard deviations. Hence, to compute a z-score based upon the pooled within-groups standard deviation, we need a COMPUTE statement of the following form.
Let ZPOOLED equal the z-score based upon the pooled standard deviation.
Let X be the name of the predictor.
Let MSE equal the mean square error term from a one-way ANOVA with the predictor as the dependent variable and the grouping variable as the factor.
Let MEANX equal the mean value of X computed using all of the cases to be used in the analysis.
An SPSS COMPUTE statement for calculating ZPOOLED is as follows.
COMPUTE zpooled = (x - meanx) / SQRT(mse).
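To make the difference between the two kinds of standardization concrete, here is a minimal Python sketch (standard library only) that computes both for the predictor X in the sample data used in this note. The helper name `pooled_mse` is hypothetical and introduced only for illustration; it is not part of SPSS.

```python
from math import sqrt
from statistics import mean, stdev

# Sample data from this note: (group, x, y) for ten cases.
cases = [
    (1, 1, 3), (1, 2, 2), (1, 2, 4), (1, 3, 3), (1, 2, 2),
    (2, 3, 5), (2, 2, 5), (2, 3, 4), (2, 3, 4), (2, 4, 3),
]

def pooled_mse(groups, values):
    """Within-groups mean square: pooled sum of squares / (N - number of groups)."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    ss_within = sum(
        sum((v - mean(vals)) ** 2 for v in vals) for vals in by_group.values()
    )
    return ss_within / (len(values) - len(by_group))

groups = [c[0] for c in cases]
x = [c[1] for c in cases]

mean_x = mean(x)               # mean over all cases, ignoring group
mse_x = pooled_mse(groups, x)  # pooled within-groups variance (one-way ANOVA MSE)
sd_total_x = stdev(x)          # total-sample SD, the divisor DESCRIPTIVES uses

# z-scores standardized by the pooled within-groups SD (what DISCRIMINANT uses)
z_pooled = [(v - mean_x) / sqrt(mse_x) for v in x]
# z-scores standardized by the total-sample SD (the usual mistake)
z_total = [(v - mean_x) / sd_total_x for v in x]

print(mean_x, mse_x, round(sd_total_x, 4))
```

Because the group means differ, the total-sample standard deviation is larger than the pooled within-groups standard deviation here, so the two sets of z-scores do not agree.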
You can get all of the values you need to compute the discriminant scores yourself from output produced by DISCRIMINANT. The annotated command file below shows how to do this for a small sample data file.
* The following commands read in a sample data file with two groups and two predictors .
DATA LIST FREE
/ group x y .
BEGIN DATA
1 1 3
1 2 2
1 2 4
1 3 3
1 2 2
2 3 5
2 2 5
2 3 4
2 3 4
2 4 3
END DATA .
FORMATS group x y (F1.0) .
* The following commands run the DISCRIMINANT analysis.
* Note that we have used the SCORES keyword with the SAVE subcommand, which saves the discriminant scores computed by the procedure into a variable named DIS1_1.
* We have also used the MEAN and COV keywords with the STATISTICS subcommand to request the printing of the means and the pooled within-groups covariance matrix.
DISCRIMINANT
  /GROUPS=group(1 2)
  /VARIABLES=x y
  /SAVE=SCORES
  /STATISTICS=MEAN COV
  /CLASSIFY=NONMISSING POOLED .
* The means we need to compute ZPOOLED values for X and Y are shown in the rows identified as Total at the bottom of the output table entitled Group Statistics.
* The mean values are 2.5 and 3.5 for X and Y, respectively.
* The MSE values we need for X and Y may be read from the diagonal of the covariance matrix in the output table entitled Pooled Within-Groups Matrices.
* The MSE values are .5 and .7 for X and Y, respectively.
* The following COMPUTE statements create the two sets of z-scores standardized by the pooled within-groups standard deviation.
COMPUTE zpooledx = (x - 2.5000000) / SQRT(.5000000) .
COMPUTE zpooledy = (y - 3.5000000) / SQRT(.7000000) .
* The coefficients we need for the single discriminant function are given in the output table entitled Standardized Canonical Discriminant Function Coefficients.
* The values are 0.8975670 and 0.9608713, for x and y, respectively.
* The final COMPUTE statement (followed by a FORMATS and an EXECUTE command) saves a new variable called SCORE with values that are identical to those for DIS1_1.
COMPUTE score = (0.8975670 * zpooledx) + (0.9608713 * zpooledy) .
FORMATS score (F7.5) .
EXECUTE .
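As a cross-check outside SPSS, the COMPUTE statements above can be mirrored in a few lines of Python. This is only a sketch: the means, MSE values, and coefficients are the ones quoted in this note, and the function names are hypothetical. It also checks two properties that canonical discriminant scores should have, a grand mean of zero and a pooled within-groups variance of one.

```python
from math import sqrt

# Sample data from this note: (group, x, y) for ten cases.
cases = [
    (1, 1, 3), (1, 2, 2), (1, 2, 4), (1, 3, 3), (1, 2, 2),
    (2, 3, 5), (2, 2, 5), (2, 3, 4), (2, 3, 4), (2, 4, 3),
]

# Values read from the DISCRIMINANT output, as quoted in this note.
MEAN_X, MEAN_Y = 2.5, 3.5              # Total rows of Group Statistics
MSE_X, MSE_Y = 0.5, 0.7                # diagonal of the pooled covariance matrix
COEF_X, COEF_Y = 0.8975670, 0.9608713  # standardized coefficients

def discriminant_score(x, y):
    """Mirror of the zpooledx, zpooledy and score COMPUTE statements."""
    z_x = (x - MEAN_X) / sqrt(MSE_X)
    z_y = (y - MEAN_Y) / sqrt(MSE_Y)
    return COEF_X * z_x + COEF_Y * z_y

def within_variance(scored):
    """Pooled within-groups variance of (group, score) pairs."""
    by_group = {}
    for g, s in scored:
        by_group.setdefault(g, []).append(s)
    ss = 0.0
    for vals in by_group.values():
        m = sum(vals) / len(vals)
        ss += sum((v - m) ** 2 for v in vals)
    return ss / (len(scored) - len(by_group))

scores = [(g, discriminant_score(x, y)) for g, x, y in cases]
grand_mean = sum(s for _, s in scores) / len(scores)

for g, s in scores:
    print(g, f"{s:.5f}")
print("grand mean:", round(grand_mean, 6))
print("pooled within-groups variance:", round(within_variance(scores), 6))
```

If the printed scores do not match DIS1_1 in your own analysis, the first things to check are whether the means were taken from the Total rows and whether the divisors are the square roots of the pooled within-groups variances.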
Note that the discriminant scores can also be computed by applying the raw or unstandardized discriminant function coefficients, including the constant for each function, to the raw predictor variable values.
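The link between the two routes can be sketched in Python. The sketch assumes the standard relationship that each unstandardized coefficient equals the standardized coefficient divided by the pooled within-groups standard deviation, and that the constant is the negative of the sum of the unstandardized coefficients times the total means. In practice you would read the unstandardized coefficients and constant from the DISCRIMINANT output rather than derive them; the dictionary and function names below are hypothetical.

```python
from math import sqrt

# Values quoted in this note for the sample analysis.
means = {"x": 2.5, "y": 3.5}
mse = {"x": 0.5, "y": 0.7}
std_coef = {"x": 0.8975670, "y": 0.9608713}

# Assumed relationship: raw coefficient = standardized coefficient / pooled SD.
raw_coef = {v: std_coef[v] / sqrt(mse[v]) for v in std_coef}
# Assumed relationship: the constant centers the function at the total means.
constant = -sum(raw_coef[v] * means[v] for v in raw_coef)

def score_from_raw(x, y):
    """Score from unstandardized coefficients applied to raw values."""
    return constant + raw_coef["x"] * x + raw_coef["y"] * y

def score_from_standardized(x, y):
    """Score from standardized coefficients applied to pooled z-scores."""
    z_x = (x - means["x"]) / sqrt(mse["x"])
    z_y = (y - means["y"]) / sqrt(mse["y"])
    return std_coef["x"] * z_x + std_coef["y"] * z_y

# The two routes agree for any case, for example the first case in the file.
print(score_from_raw(1, 3), score_from_standardized(1, 3))
```

Both functions are algebraically the same expression, so any difference between them is floating-point rounding only.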