I have a binary outcome variable, Y, and a variable X, which has ordered categories. I would like to test for a linear trend in proportions on Y across levels of X. Can SPSS perform the Cochran-Armitage test of trend?
Resolving the problem
SPSS does not provide the Cochran-Armitage (CA) test directly, and a request has been filed with SPSS Development to add this statistic to Crosstabs output where applicable. However, the test statistic can be derived from the Linear-by-Linear Association (LLA) test for trend in the output of the Crosstabs procedure, provided that either the row or column variable is dichotomous. The LLA test for trend equals (N-1)*r^2, where N is the sample size and r is the Pearson correlation between the two variables. (See Technote 1477269 for more information on the LLA test, also known as the Mantel-Haenszel test for trend. This test is not related to the Cochran-Mantel-Haenszel tests for a common odds ratio that are also available in the Crosstabs procedure.)
Agresti (2002) states that for a 2 x I table, where the I categories are ordered, the CA test is equivalent to the LLA test (called M^2 in Agresti), except that (N-1) is replaced by N in the CA test. So, you can run Crosstabs and request the chi-square tests in the Statistics dialog. The LLA test will be printed as 'Linear-by-Linear Association' in the 'Chi-Square Tests' table in the Crosstabs output. If you multiply the LLA value by N/(N-1), you will get the CA test statistic.
Because N/(N-1) is always slightly greater than 1 (it approaches 1.0 from above as N approaches infinity), the CA statistic is always slightly larger than the LLA statistic. The significance for the LLA statistic can therefore be seen as an upper bound for the significance of the CA statistic. (Both are chi-square tests with 1 degree of freedom (DF).) However, you can get a precise significance for the CA statistic by computing a new variable in SPSS that will hold the significance value.
Suppose that the LLA statistic was 5.817 and N=234. The CA statistic equals 5.817*234/233=5.842, which has a significance level of .015648. The following commands compute the CA value from the LLA result and then compute the right-tail area, i.e. the significance, for that 5.842 under a chi-square distribution with 1 DF. Two equivalent forms are shown for the significance:
compute catrend = 5.817*234/233 .
compute sigca = 1 - cdf.chisq(catrend,1).
* Equivalently, SIG.CHISQ returns the right-tail area directly.
compute sigca = sig.chisq(catrend,1).
execute.
If you prefer the menu system to syntax commands, you can compute each of CATREND and SIGCA from the Compute dialog, which is available under the Transform->Compute menu. If you had several CA test statistics to compute, you could create an SPSS data file with variables for LLA and for N and one case for each LLA result. You could then compute the significance for the CA test as
compute sigca = sig.chisq(lla*n/(n-1),1).
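As a minimal sketch of that multi-result workflow, the commands below read two cases, reusing the LLA and N values quoted in this note (5.817 with N=234, and 6.569932 with N=32,574), and compute the CA statistic and its significance for each case. The variable names LLA, N, CATREND and SIGCA follow the text above; the FORMATS and LIST commands are additions made here only so that the results are displayed with enough decimals.

```spss
* Sketch only: one case per Crosstabs run, holding that run's LLA value and N.
data list free / lla n.
begin data
5.817 234
6.569932 32574
end data.
* Rescale each LLA statistic to the Cochran-Armitage statistic.
compute catrend = lla*n/(n-1).
* Right-tail area under a chi-square distribution with 1 DF.
compute sigca = sig.chisq(catrend,1).
* Display formats are an assumption made for readability.
formats catrend (f10.6) sigca (f8.6).
list.
```

LIST is a procedure, so it also forces the pending COMPUTE transformations to be carried out.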
The Agresti reference is:
Agresti, Alan. (2002). Categorical Data Analysis (2nd Ed.). New York: Wiley.
See pages 181-182 for discussion of the Cochran-Armitage test. The data for the example there is in Table 5.3 on p. 179.
In the example in the Agresti text, there are 5 ordered categories representing the amount of maternal alcohol consumption and a binary outcome variable that represents the presence or absence of child congenital malformations. Agresti assigns a set of unevenly spaced scores to the 5 alcohol categories that are intended to represent the scale of the variable better than the simple numbers 1 to 5. The use of these scores in Crosstabs (vs. the categories 1 to 5) does not affect the Pearson or Likelihood Ratio chi-square tests of independence, but it does affect the LLA result, which uses the Pearson correlation in its calculation. If you assign scores to your ordered categories, the relative spacing of the scores will affect the strength of linear association reflected in the LLA test and therefore in the Cochran-Armitage test.
The following command set will read the data for the Agresti example, assign the scores (SCORE) for the ordered categories (ALCH), run Crosstabs to obtain the LLA result, and then plug those results into Compute commands to calculate the Cochran-Armitage test statistic (CACHISQ) and its significance level (SIGCA). Note that the data set contains aggregated data, so the WEIGHT command is used to weight the cases by the variable WT, which holds the cell counts.
data list free / alch malform wt.
begin data
1 1 48
1 0 17066
2 1 38
2 0 14464
3 1 5
3 0 788
4 1 1
4 0 126
5 1 1
5 0 37
end data.
recode alch (1=0) (2=.5) (3=1.5) (4=4) (5=7) into score.
weight by wt.
CROSSTABS
  /TABLES=score BY malform
  /FORMAT= AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS= COUNT
  /COUNT ROUND CELL .
* The Linear-by-Linear test statistic = 6.57 with significance = .0104.
* The number of decimal places for this output was extended from 2 to 6
* in the pivot table editor.
* The N for this example is 32,574 .
compute cachisq = 6.569932*32574/32573 .
compute sigca = sig.chisq(cachisq,1).
execute.
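Because the LLA statistic equals (N-1)*r^2, the CA statistic is simply N*r^2, which suggests a cross-check that does not rely on copying the LLA value. The sketch below assumes that the Agresti data set read above is still the active data set; it is an illustration added here, not part of the Agresti example.

```spss
* Cross-check sketch: assumes the Agresti data above is the active data set.
* Weighting by the cell counts makes the correlation reflect all 32,574 cases.
weight by wt.
correlations
  /variables=score malform.
* In the output, extend r to 6 or more decimals in the pivot table editor.
* N*r**2, with N = 32574, should then agree closely with CACHISQ.
```

Any small difference between the two values comes from rounding r before squaring it.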