# Can SPSS fit Markov Chain models for transitions across states of a categorical variable?

## Technote (troubleshooting)

## Problem (Abstract)

I have a set of four categorical variables that represent observations taken at four time points. I would like to model the probabilities of transitions between categories from one time point to the next as a Markov chain. Can SPSS fit a Markov chain model of state transitions?

## Resolving the problem

The following SPSS commands illustrate the use of the SPSS procedures GENLOG and CNLR (Constrained NonLinear Regression) to fit first- and second-order Markov chain models and a pair of alternative models. The example is taken from:

Agresti, A. (2002). Categorical Data Analysis (2nd Ed.). New York: Wiley. (pp. 476-479).

In this example, variables X9, X10, X11, and X12 are binary variables that represent the presence (1) or absence (0) of respiratory illness in children who were observed each year at ages 9 to 12. The data are reported in an aggregated form, where each case represents one of the 16 possible sequences of values for the four variables. The variable COUNT holds the number of subjects observed with each sequence. The data differ from Agresti's table only in that variables X9 to X12 are coded as 0 and 1, rather than as 1 and 2. This change was made for the CNLR run, which uses products of the variables as interaction terms. GENLOG would have produced the same results with the (1,2) coding in Agresti's data table. The GENLOG and CNLR commands below run all four models reported on pages 478-479 of Agresti. The goodness-of-fit results match those from Agresti's text, and the parameter estimates for the last two models match those in Agresti's Table 11.8.

Agresti does not report the parameter estimates for the first two models, the first-order and second-order Markov chains, both of which fit poorly. For the GENLOG models, the goodness-of-fit result (to compare to Agresti's G^2) is in the Likelihood Ratio row of the Goodness-of-Fit table. In the CNLR output, look at the last row of the Iteration History table (Iteration 18.1). The Value of Loss Function column reports the G^2 result, and the parameter estimates are reported in the various Parameter columns.
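For reference, the two Markov chain models can be written as factorizations of the joint probability of the four responses. The notation below (lowercase $x_t$ for the response at age $t$) is introduced here for illustration and does not appear in the SPSS output.

$$
\text{First order:}\quad P(x_9, x_{10}, x_{11}, x_{12}) = P(x_9)\,P(x_{10}\mid x_9)\,P(x_{11}\mid x_{10})\,P(x_{12}\mid x_{11})
$$

$$
\text{Second order:}\quad P(x_9, x_{10}, x_{11}, x_{12}) = P(x_9, x_{10})\,P(x_{11}\mid x_9, x_{10})\,P(x_{12}\mid x_{10}, x_{11})
$$

These factorizations correspond to the loglinear models (X9X10, X10X11, X11X12) and (X9X10X11, X10X11X12), which is why the /DESIGN subcommands of the first two GENLOG runs contain only those interaction terms and the lower-order terms they imply.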

CNLR was used for the last model because of its capability to constrain parameters (to be equal in this case). Note that COUNT is designated as a weight variable for the GENLOG analyses, but this designation is turned off for the CNLR command. Instead, COUNT is incorporated in the CNLR loss function, which defines the random component for that model.
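Written out, the constrained model and the loss function used in the CNLR run look roughly as follows. Here $n_i$ is the observed COUNT for sequence $i$, $\mu_i$ is the predicted count (PRED_), and the $\beta$ subscripts mirror the parameter names in the MODEL PROGRAM statement; the notation itself is an illustration, not SPSS output.

$$
\log \mu_i = \beta_0 + \sum_{t=9}^{12} \beta_t\, x_{t} + \sum_{9 \le s < t \le 12} \beta_{s,t}\, x_{s} x_{t},
\qquad
\beta_{9,10} = \beta_{10,11} = \beta_{11,12},
\quad
\beta_{9,11} = \beta_{9,12} = \beta_{10,12}
$$

$$
G^2 = \sum_{i=1}^{16} 2\left[\, n_i \log\frac{n_i}{\mu_i} - (n_i - \mu_i) \right]
$$

Each bracketed term, multiplied by 2, is the LOSS_ value for one sequence, and CNLR minimizes their sum. That sum is the Poisson deviance, so minimizing it is equivalent to maximizing a Poisson likelihood for the counts. When the fitted counts add up to the observed total, the $(n_i - \mu_i)$ terms cancel and the expression reduces to the familiar $2\sum_i n_i \log(n_i/\mu_i)$.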

* Replicate Markov chain results from Agresti (2002) Table 11.7.

data list free / x9 x10 x11 x12 count .

begin data.

0 0 0 0 94

0 0 0 1 30

0 0 1 0 15

0 0 1 1 28

0 1 0 0 14

0 1 0 1 9

0 1 1 0 12

0 1 1 1 63

1 0 0 0 19

1 0 0 1 15

1 0 1 0 10

1 0 1 1 44

1 1 0 0 17

1 1 0 1 42

1 1 1 0 35

1 1 1 1 572

end data.

execute.

formats x9 to x12 (f4) count (f8).

weight by count.

* Modify the path in the SAVE command below to reflect the folders on your computer .

SAVE OUTFILE='C:\Markov Chain transition\Agresti examples\Agresti page 478_MC.sav'

/COMPRESSED.

* First order Markov chain p 478 .

GENLOG

x9 x10 x11 x12

/MODEL = POISSON

/PRINT = FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV

/PLOT = NONE

/CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)

/DESIGN x9 x10 x11 x12 x9*x10 x10*x11 x11*x12 .

* Second order Markov chain p 478 .

GENLOG

x9 x10 x11 x12

/MODEL = POISSON

/PRINT = FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV

/PLOT = NONE

/CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)

/DESIGN x9 x10 x11 x12 x9*x10 x10*x11 x11*x12 x9*x11 x10*x12 x9*x10*x11 x10*x11*x12 .

* Replicate results from Table 11.8 from Agresti. First column results using GENLOG.

* All pairwise interactions included without constraints .

GENLOG

x9 x10 x11 x12

/MODEL = POISSON

/PRINT = FREQ RESID ADJRESID ZRESID DEV ESTIM CORR COV

/PLOT = NONE

/CRITERIA = CIN(95) ITERATE(20) CONVERGE(.001) DELTA(.5)

/DESIGN x10 x11 x12 x9 x10*x11 x10*x12 x10*x9 x11*x12 x11*x9 x12*x9 .

*Simpler transition model from Agresti, with constraints, via CNLR. Loss function is LR GOF chi-square statistic.

*Starting values are rounded averages of values from GENLOG model for transition terms, with 0s for others.

WEIGHT OFF.

* Use product of variables for interactions, rather than specifically computed interaction variables.

MODEL PROGRAM b0=0 b9=0 b10=0 b11=0 b12=0 b9_10=1.77 b9_11=1.02 b9_12=1.02 b10_11=1.77

b10_12=1.02 b11_12=1.77 .

COMPUTE PRED_ = EXP(b0 + x9 * b9 + x10 * b10 + x11 * b11 + x12 * b12 + x9*x10* b9_10

+ x9*x11 * b9_11 + x9*x12 * b9_12 + x10*x11 * b10_11 + x10*x12 * b10_12 +

x11*x12 * b11_12).

COMPUTE LOSS_ = 2 * (count * (LN(count / PRED_)) - (count - PRED_)).

CNLR count

/PRED PRED_

/LOSS LOSS_

/BOUNDS b9_10 - b10_11 = 0; b9_10 - b11_12 = 0; b9_11 - b9_12 = 0; b9_11 - b10_12 = 0

/CRITERIA STEPLIMIT 2 ISTEP 1E+20 .
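As a rough cross-check outside SPSS, the G^2 for the first-order Markov chain model can be reproduced by hand, because that model has closed-form fitted counts: the product of the three adjacent two-way margins divided by the two shared one-way margins. The Python sketch below is not part of the original technote; the names `counts`, `margin`, `first_order_fit`, and `g2` are illustrative assumptions. The value it prints should agree with the Likelihood Ratio row of the first GENLOG run.

```python
from collections import defaultdict
from math import log

# Observed counts keyed by (x9, x10, x11, x12), copied from the DATA LIST block.
counts = {
    (0, 0, 0, 0): 94, (0, 0, 0, 1): 30, (0, 0, 1, 0): 15, (0, 0, 1, 1): 28,
    (0, 1, 0, 0): 14, (0, 1, 0, 1): 9,  (0, 1, 1, 0): 12, (0, 1, 1, 1): 63,
    (1, 0, 0, 0): 19, (1, 0, 0, 1): 15, (1, 0, 1, 0): 10, (1, 0, 1, 1): 44,
    (1, 1, 0, 0): 17, (1, 1, 0, 1): 42, (1, 1, 1, 0): 35, (1, 1, 1, 1): 572,
}


def margin(table, keep):
    """Sum the table over every variable except the positions listed in keep."""
    out = defaultdict(float)
    for cell, n in table.items():
        out[tuple(cell[i] for i in keep)] += n
    return out


def first_order_fit(table):
    """Closed-form fitted counts for the model (X9X10, X10X11, X11X12)."""
    n_9_10 = margin(table, (0, 1))
    n_10_11 = margin(table, (1, 2))
    n_11_12 = margin(table, (2, 3))
    n_10 = margin(table, (1,))
    n_11 = margin(table, (2,))
    return {
        (a, b, c, d): n_9_10[a, b] * n_10_11[b, c] * n_11_12[c, d]
        / (n_10[(b,)] * n_11[(c,)])
        for (a, b, c, d) in table
    }


def g2(table, fitted):
    """Likelihood-ratio statistic 2 * sum(n * ln(n / mu)) over nonempty cells."""
    return 2.0 * sum(n * log(n / fitted[cell]) for cell, n in table.items() if n > 0)


fitted = first_order_fit(counts)
# Residual df: 16 cells minus 8 parameters (intercept, 4 main effects, 3 interactions).
print(f"First-order Markov chain: G^2 = {g2(counts, fitted):.1f}, df = {16 - 8}")
```

The closed form applies only to the two Markov chain models. The all-pairwise model and the constrained model have no such shortcut, which is why they are fitted iteratively by GENLOG and CNLR.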

## Historical Number

71481

### Document information

**More support for:**
SPSS Statistics

**Software version:**
Not Applicable

**Operating system(s):**
Platform Independent

**Reference #:**
1478480

**Modified date:**
10 May 2013