Interpreting ANCOVA results when the HOS assumption fails
Prior to running an analysis of covariance (ANCOVA) in SPSS, I performed a test of the assumption of homogeneity of slopes (HOS). In so doing, I discovered that there was a significant factor-by-covariate interaction, which means that the assumption does not hold. Can you help me interpret what this means?
Resolving the problem
IMPORTANT NOTE: Please note that the formulas given in Aiken & West for the modified Johnson-Neyman technique contain an error. The values sxsqr1 and sxsqr2 should be the sums of squared deviations of the X or predictor variable around its within-group means, not the predicted sums of squares from the within-groups regressions of Y on X.
Johnson and Neyman (1936) developed a technique for determining regions of significance when there is a significant group-by-covariate interaction. This technical note will provide a sample SPSS command syntax file, which researchers may use to apply a modification of the Johnson-Neyman technique due to Potthoff (1964).
The sample SPSS command syntax file provided here is a translation of a SAS command file written by Jenn-Yun Tein of Arizona State University. The SAS version of the file appeared in Aiken and West's (1991) monograph on testing and interpreting interactions in multiple regression. While this technical note is intended to be used by researchers who are unfamiliar with the Aiken and West (1991) text, many users will find that they need to consult that text for a fuller understanding of what is involved.
To help SPSS users understand how the program works, we have included the data and SPSS commands that are necessary to replicate the regression results from Aiken and West (1991). The example involves a hypothetical data set in which the starting salaries of college graduates are modeled as a function of the type of degree earned and the overall GPA. The two groups compared in this example are engineering and business students. The following commands (1) read in the raw data; (2) perform a test of the HOS assumption using SPSS UNIANOVA; and (3) perform separate SPSS REGRESSION and DESCRIPTIVES runs for each of the two groups. The regression results and the output from DESCRIPTIVES provide the raw data for the final program, which identifies regions of significance.
Here is SPSS command syntax that replicates the example from Aiken and West (1991, pp. 134-137).
* The following DATA LIST command reads in the raw data for the example.
DATA LIST FREE
 / college gpa salary.
BEGIN DATA
1 2.18 28219
1 1.93 27946
1 2.31 28053
1 2.45 28209
1 2.35 27899
1 2.44 28295
1 2.13 27672
1 2.22 27756
1 3.41 28065
1 2.58 27885
2 2.78 23942
2 2.50 23205
2 2.92 23962
2 3.08 24369
2 2.96 23840
2 3.06 24452
2 2.72 23218
2 2.82 23455
2 4.00 25790
2 3.22 24206
2 2.83 23506
2 2.52 22961
2 3.36 24868
2 3.18 24223
2 2.91 24004
END DATA.
ADD VALUE LABELS college 1 'Engineering' 2 'Business'.
* The commands below perform a test of the HOS assumption.
UNIANOVA salary BY college WITH gpa
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = college gpa college*gpa.
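The logic of the HOS test can also be sketched outside SPSS. The following pure-Python example (an illustration, not a replacement for the UNIANOVA output) fits a separate-slopes (full) model and a common-within-group-slope (reduced) model to the example data, then tests the reduction in residual sum of squares with an F statistic; a significant F indicates that the slopes differ.

```python
# Sketch of the homogeneity-of-slopes (HOS) test for the example data:
# compare a separate-slopes model (full) against a common-slope ANCOVA
# model (reduced) via an F test on the difference in residual SS.

eng = [(2.18, 28219), (1.93, 27946), (2.31, 28053), (2.45, 28209),
       (2.35, 27899), (2.44, 28295), (2.13, 27672), (2.22, 27756),
       (3.41, 28065), (2.58, 27885)]
bus = [(2.78, 23942), (2.50, 23205), (2.92, 23962), (3.08, 24369),
       (2.96, 23840), (3.06, 24452), (2.72, 23218), (2.82, 23455),
       (4.00, 25790), (3.22, 24206), (2.83, 23506), (2.52, 22961),
       (3.36, 24868), (3.18, 24223), (2.91, 24004)]

def moments(group):
    """Within-group corrected sums of squares and cross-products."""
    n = len(group)
    mx = sum(x for x, _ in group) / n
    my = sum(y for _, y in group) / n
    sxx = sum((x - mx) ** 2 for x, _ in group)
    sxy = sum((x - mx) * (y - my) for x, y in group)
    syy = sum((y - my) ** 2 for _, y in group)
    return sxx, sxy, syy

stats = [moments(g) for g in (eng, bus)]

# Full model: a separate slope in each group.
ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in stats)

# Reduced model: one pooled within-group slope, separate intercepts.
sxx_w = sum(s[0] for s in stats)
sxy_w = sum(s[1] for s in stats)
syy_w = sum(s[2] for s in stats)
ss_reduced = syy_w - sxy_w ** 2 / sxx_w

n_total = len(eng) + len(bus)
df_error = n_total - 4          # two slopes + two intercepts estimated
f_hos = (ss_reduced - ss_full) / (ss_full / df_error)
print(f"F(1, {df_error}) = {f_hos:.2f} for the group-by-covariate interaction")
```

For these data the F statistic is far above the .05 critical value of F(1, 21), about 4.32, so the interaction is significant and the HOS assumption fails, as in the SPSS output.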
* The next set of commands runs regressions and requests descriptive statistics for each group separately.
SORT CASES BY college .
SPLIT FILE
  SEPARATE BY college .
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT salary
  /METHOD=ENTER gpa .
DESCRIPTIVES
  VARIABLES=gpa .
SPLIT FILE OFF .
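The quantities that the final program needs from these runs (each group's predictor mean, sum of squared deviations, slope, intercept, and residual sum of squares) can also be computed directly. Here is a minimal pure-Python sketch using the example data; the function name is arbitrary.

```python
# Compute, for each group, the inputs required by the region-of-significance
# program: mean of the predictor, sum of squared deviations around that mean
# ("sxsqr"), regression slope and intercept, and residual sum of squares.

eng = [(2.18, 28219), (1.93, 27946), (2.31, 28053), (2.45, 28209),
       (2.35, 27899), (2.44, 28295), (2.13, 27672), (2.22, 27756),
       (3.41, 28065), (2.58, 27885)]
bus = [(2.78, 23942), (2.50, 23205), (2.92, 23962), (3.08, 24369),
       (2.96, 23840), (3.06, 24452), (2.72, 23218), (2.82, 23455),
       (4.00, 25790), (3.22, 24206), (2.83, 23506), (2.52, 22961),
       (3.36, 24868), (3.18, 24223), (2.91, 24004)]

def group_inputs(data):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)     # sxsqr for this group
    sxy = sum((x - mx) * (y - my) for x, y in data)
    syy = sum((y - my) ** 2 for _, y in data)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = syy - sxy ** 2 / sxx                 # residual SS
    return mx, sxx, slope, intercept, ss_res

for name, data in (("Engineering", eng), ("Business", bus)):
    mx, sxx, b, b0, ssr = group_inputs(data)
    print(f"{name}: meanx={mx:.2f} sxsqr={sxx:.4f} "
          f"slope={b:.1f} intercept={b0:.1f} SSres={ssr:.1f}")
```

For these data the engineering slope and intercept come out near 122.9 and 27705.0, and the two sxsqr values near 1.4418 and 1.9037, matching the values used later.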
Next, here is the program for computing the covariate values that correspond to the limits of the regions of significance.
* This SPSS program is based upon Appendix C: SAS Program for Test of Critical
* Region(s), in L.S. Aiken and S.G. West (1991), Multiple Regression: Testing
* and Interpreting Interactions, published in Newbury Park, CA, by SAGE.
* The SAS program was written by Jenn-Yun Tein, Arizona State University.
* The variables should appear in the order below:
* depvbl (short name for the dependent variable)
* alln (total N combining the two groups)
* n1 (number of subjects in group 1)
* n2 (number of subjects in group 2)
* sxsqr1 (sum of squared deviations of the predictor around its mean in group 1)
* sxsqr2 (sum of squared deviations of the predictor around its mean in group 2)
* meanx1 (mean of the predictor in group 1)
* meanx2 (mean of the predictor in group 2)
* f (value of F from an F table - this is the F value with 2 df in the numerator
*   and N - 4 df in the denominator for the p-value that you set as your
*   criterion for rejection of the null hypothesis, where N is the total N for
*   the two groups)
* ssres (residual sum of squares - add the SS residual values from groups 1 and 2)
* b1 (slope for group 1)
* b01 (intercept for group 1)
* b2 (slope for group 2)
* b02 (intercept for group 2) .
* Insert the variable values on one or more lines between the line with the
* keywords BEGIN DATA and the line with the keywords END DATA.
* In this sample command file, the variable values inserted are those for the
* example shown in Aiken and West (1991, pp. 134-137).
* The program prints the name of the dependent variable, the limit of region 1
* (XL1), and the limit of region 2 (XL2).
* The desired output will be displayed in the Log portion of the SPSS output file.
* Note that the values below will match the results from Aiken & West, but the
* correct answers are given by using 1.4418 for sxsqr1 (instead of 21768.4) and
* 1.9037 for sxsqr2 (instead of 6671180.4).
* With those corrected values, the bounds for the regions of significance are
* 4.60 and 6.82.
DATA LIST FREE
 / depvbl (A8) alln n1 n2 sxsqr1 sxsqr2 meanx1 meanx2 f ssres b1 b01 b2 b02.
BEGIN DATA
salary 25 10 15 21768.4 6671180.4 2.40 2.99 3.47
 970923 122.9 27705.0 1872 18401.6
END DATA.
COMPUTE mxsqr1 = meanx1**2.
COMPUTE mxsqr2 = meanx2**2.
COMPUTE sum1 = (1/sxsqr1) + (1/sxsqr2).
COMPUTE sum2 = (meanx1/sxsqr1) + (meanx2/sxsqr2).
COMPUTE sum3 = (alln/(n1*n2)) + (mxsqr1/sxsqr1) + (mxsqr2/sxsqr2).
COMPUTE sumb1 = b1 - b2.
COMPUTE sumb0 = b01 - b02.
COMPUTE sumb1sq = sumb1**2.
COMPUTE sumb0sq = sumb0**2.
COMPUTE a = (((-2*f)/(alln-4))*ssres*sum1) + sumb1sq.
COMPUTE b = ((( 2*f)/(alln-4))*ssres*sum2) + (sumb0*sumb1).
COMPUTE c = (((-2*f)/(alln-4))*ssres*sum3) + sumb0sq.
COMPUTE sqrtb2ac = ((b**2) - (a*c))**.5.
COMPUTE xl1 = (-b-sqrtb2ac)/a.
COMPUTE xl2 = (-b+sqrtb2ac)/a.
LIST / depvbl xl1 xl2.
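The arithmetic can be checked outside SPSS. Below is a line-for-line Python translation of the COMPUTE statements, run with the corrected sxsqr values of 1.4418 and 1.9037; the variable names mirror the SPSS program, and the critical F of 3.47 is the tabled value of F(2, 21) at alpha = .05.

```python
# Python translation of the COMPUTE statements above, using the corrected
# within-group sums of squared deviations (sxsqr1 = 1.4418, sxsqr2 = 1.9037).
alln, n1, n2 = 25, 10, 15
sxsqr1, sxsqr2 = 1.4418, 1.9037
meanx1, meanx2 = 2.40, 2.99
f, ssres = 3.47, 970923        # f = F(2, alln - 4) at the .05 level
b1, b01 = 122.9, 27705.0       # group 1 slope and intercept
b2, b02 = 1872.0, 18401.6      # group 2 slope and intercept

sum1 = 1 / sxsqr1 + 1 / sxsqr2
sum2 = meanx1 / sxsqr1 + meanx2 / sxsqr2
sum3 = alln / (n1 * n2) + meanx1 ** 2 / sxsqr1 + meanx2 ** 2 / sxsqr2
diff_b1 = b1 - b2              # difference in slopes
diff_b0 = b01 - b02            # difference in intercepts

k = 2 * f / (alln - 4) * ssres
a = -k * sum1 + diff_b1 ** 2
b = k * sum2 + diff_b0 * diff_b1
c = -k * sum3 + diff_b0 ** 2

# Roots of the quadratic a*x**2 + 2*b*x + c = 0 bound the region in which
# the two regression lines do not differ significantly.
root = (b ** 2 - a * c) ** 0.5
xl1 = (-b - root) / a
xl2 = (-b + root) / a
print(f"Region limits: XL1 = {xl1:.2f}, XL2 = {xl2:.2f}")
```

With these inputs the limits come out near 4.60 and 6.82.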
To interpret the meaning of the limit values, researchers should construct a scatterplot that includes the data points for both groups (identified by group) and separate regression lines for each group. One way to construct such a graph is outlined in Technote 1476213.
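Outside SPSS, one quick way to build such a graph is with matplotlib; the sketch below assumes matplotlib is installed, and the colors, labels, and output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Scatterplot with a separate regression line per group, using the example
# data and the fitted slopes and intercepts reported above.
eng_gpa = [2.18, 1.93, 2.31, 2.45, 2.35, 2.44, 2.13, 2.22, 3.41, 2.58]
eng_sal = [28219, 27946, 28053, 28209, 27899, 28295, 27672, 27756, 28065, 27885]
bus_gpa = [2.78, 2.50, 2.92, 3.08, 2.96, 3.06, 2.72, 2.82, 4.00, 3.22,
           2.83, 2.52, 3.36, 3.18, 2.91]
bus_sal = [23942, 23205, 23962, 24369, 23840, 24452, 23218, 23455, 25790,
           24206, 23506, 22961, 24868, 24223, 24004]

fig, ax = plt.subplots()
for name, gpa, sal, slope, icept in (
        ("Engineering", eng_gpa, eng_sal, 122.9, 27705.0),
        ("Business", bus_gpa, bus_sal, 1872.0, 18401.6)):
    ax.scatter(gpa, sal, label=name)
    lo, hi = min(gpa), max(gpa)
    ax.plot([lo, hi], [icept + slope * lo, icept + slope * hi])
ax.set_xlabel("GPA")
ax.set_ylabel("Starting salary")
ax.legend()
fig.savefig("slopes_by_group.png")
```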
Provided that the limits to the regions of significance fall within the range of actual data, researchers will be able to identify three regions: (1) a region in which the estimated mean for one group is higher than that for another; (2) a middle region in which the estimated means do not differ; and (3) a region in which the pattern identified in region 1 has been reversed.
In the Aiken and West (1991) example, both limits fall above the highest attainable GPA of 4.0. Inspection of the regression output and the scatterplot leads to the conclusion that engineering graduates are predicted to earn more than business graduates at every GPA in the observed range.
Aiken, L.S., & West, S.G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: SAGE Publications.
Johnson, P.O. & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1, 57-93.
Potthoff, R.F. (1964). On the Johnson-Neyman technique and some extensions thereof. Psychometrika, 29, 241-256.