## Technote (troubleshooting)

## Problem (Abstract)

How can I test a regression model where I expect the slope to shift up or down, or even change sign, at a particular value of the predictor? What if I expect all values of the response variable to be increased by some constant if the predictor exceeds a certain value?

## Resolving the problem

Regression models in which the function changes at one or more points along the range of the predictor are called splines, or piecewise polynomials, and the locations of these changes are called knots. If the knots are fixed by the analyst, splines can be fitted quite easily with the REGRESSION procedure. The properties of these splines are described below, followed by a description of how to fit them with SPSS.

A spline model is hypothesized when the analyst expects that the relationship between the predictor and the response variable is altered at some value or values along the range of the predictor. The shift at the knot points could involve a change in the form of the relationship, such as a shift from a linear to a quadratic relationship, the addition or subtraction of a constant to all predicted response values to the right of the knot, or simply a change in the slope, acceleration, etc. of the regression function.

Suppose we have a predictor XA and a response variable Y, where we expect the regression function to change if XA is greater than some fixed value K1. Suppose we have also computed the squared and cubed values of XA in XA2 and XA3, allowing us to test a cubic model.

The spline is fit by creating an additional set of predictor variables, XB0 to XB3, which are all equal to 0 if XA <= K1. These new variables are used to fit the change in the intercept, linear, quadratic, and cubic terms, respectively, of the model for XA > K1.

If XA > K1, then: XB0=1 ;

XB1=(XA-K1) ;

XB2=(XA-K1)**2 ;

XB3=(XA-K1)**3 .
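Written out, the full cubic model with one knot can be sketched as follows. The coefficient symbols (beta and gamma) and the error term (epsilon) are notation assumed here for illustration; they do not appear in the original text.

```latex
Y = \beta_0 + \beta_1\,XA + \beta_2\,XA^2 + \beta_3\,XA^3
  + \gamma_0\,XB0 + \gamma_1\,XB1 + \gamma_2\,XB2 + \gamma_3\,XB3 + \varepsilon,
\qquad
XBj = \begin{cases} 0, & XA \le K1 \\ (XA - K1)^j, & XA > K1 \end{cases}
\quad (j = 0, 1, 2, 3)
```

For XA <= K1 every gamma term vanishes, so each gamma coefficient measures how the corresponding term of the polynomial changes once XA passes the knot.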

Including XB0 in the model allows for a discontinuity, or jump point, in the regression function. Excluding XB0 means that the regression line may change direction at the knot, but the two pieces of the line will be joined at the knot. Note that some texts define the XB0 dummy variable as 1 if XA is greater than OR EQUAL TO K1, whereas others define it as 1 for XA > K1, as shown above. Ideally, the process that is believed to underlie the shift will provide guidance as to whether the function is right-continuous (XB0=1 if XA=K1) or left-continuous (XB0=0 if XA=K1) at the knot.

For all the higher-order XB terms, the value of the XB variable will be 0 at K1 in either case.
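The two conventions for XB0 can be compared with a small sketch. This is an illustration in Python rather than SPSS syntax; the function name `spline_terms`, its `right_continuous` flag, and the knot value 15 are assumptions made for the example.

```python
def spline_terms(xa, k1, right_continuous=False):
    """Return (XB0, XB1, XB2, XB3) for one value of XA and a knot at K1.

    right_continuous=False -> XB0 = 1 only if XA >  K1 (as defined above)
    right_continuous=True  -> XB0 = 1 if XA >= K1
    """
    past_knot = xa >= k1 if right_continuous else xa > k1
    xb0 = 1 if past_knot else 0
    d = xa - k1
    return (xb0, xb0 * d, xb0 * d ** 2, xb0 * d ** 3)


# Hypothetical knot value, used only for illustration.
K1 = 15
for xa in (14, 15, 16):
    print(xa, spline_terms(xa, K1), spline_terms(xa, K1, right_continuous=True))
```

At XA = K1 the two conventions differ only in XB0; XB1, XB2, and XB3 are 0 under both.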

If we wanted to test the model that the regression line was continuous at K1 and linear on both sides of K1, but that the slope changed at K1, then XA and XB1 would be included as predictors. The t-test for the significance of the XB1 coefficient, in the presence of XA, would indicate whether there was a significant change in slope at K1. The sign and magnitude of the XB1 coefficient would indicate the direction and magnitude of the change.
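As a rough illustration of how the XB1 coefficient captures the change in slope, the sketch below fits XA and XB1 to made-up, noise-free data using a small least-squares routine written for this example. The helper `ols`, the knot value, and the data are all assumptions; the sketch recovers the coefficients only and does not compute the t-test that REGRESSION reports.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[r][p] / A[r][r] for r in range(p)]


K1 = 15                                   # hypothetical knot
xa = [float(v) for v in range(31)]        # made-up predictor values 0..30
xb1 = [max(0.0, v - K1) for v in xa]      # 0 up to the knot, XA - K1 after it
# Made-up, noise-free response: slope 2 before the knot, 2 - 3 = -1 after it.
y = [5.0 + 2.0 * a - 3.0 * b for a, b in zip(xa, xb1)]

b0, b_xa, b_xb1 = ols([xa, xb1], y)
print(round(b0, 6), round(b_xa, 6), round(b_xb1, 6))
```

The XA coefficient is the slope before the knot, and the slope after the knot is the sum of the XA and XB1 coefficients.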

In choosing a spline model, there is a tradeoff between the smoothness of the function at the knots and the fit to the data. For a regression function of degree R, maximum smoothness is obtained by constraining all derivatives up to order R-1 to be equal for the two pieces at the knot. This constraint is achieved by adding only the Rth-degree variable from the XB set. In the above example, Y would be predicted by XA, XA2, XA3, and XB3; only the cubic term is altered after the knot. If this model fits the data or theory poorly, the analyst may want to relax the continuity constraints by entering lower-order (XA-K1) terms (i.e., XB variables) into the model. An excellent introduction to defining and testing spline models is available in a paper by Patricia Smith ("Splines as a Useful and Convenient Statistical Tool", The American Statistician, 1979). Smith includes a discussion of the constraints required when the function after the knot point is of a lower order than the function before the knot point.
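The smoothness constraint described above can be checked directly. With gamma_j denoting the coefficient of the jth-degree XB variable (notation assumed here, not taken from the original), the amount D added to the function for XA > K1, and the resulting jump in each derivative at the knot, can be written as:

```latex
D(XA) = \gamma_0 + \gamma_1 (XA - K1) + \gamma_2 (XA - K1)^2 + \gamma_3 (XA - K1)^3,
\qquad
\left.\frac{d^{\,j} D}{d\,XA^{\,j}}\right|_{XA = K1} = j!\,\gamma_j
\quad (j = 0, 1, 2, 3)
```

So the function itself jumps by gamma_0, the slope by gamma_1, the second derivative by 2 gamma_2, and the third derivative by 6 gamma_3. Omitting XB0, XB1, and XB2 forces the first three jumps to zero, leaving only the third derivative free to change at the knot.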

Keep in mind that as the number of knots increases, as well as the number of parameters for each piece between the knots, multicollinearity can quickly become a problem.
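A quick way to see the problem is to correlate the spline variables with one another. The sketch below uses made-up, evenly spaced XA values and a hypothetical knot at 15; the helper `pearson` is written for this example.

```python
def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


K1 = 15                                  # hypothetical knot
xa = list(range(31))                     # made-up predictor values 0..30
xb1 = [max(0, v - K1) for v in xa]       # linear spline term
xb2 = [b ** 2 for b in xb1]              # quadratic spline term

print("r(XA, XB1)  =", round(pearson(xa, xb1), 3))
print("r(XB1, XB2) =", round(pearson(xb1, xb2), 3))
```

With these values both correlations are high even with a single knot. In REGRESSION itself, the COLLIN and TOL keywords on the STATISTICS subcommand request collinearity diagnostics and tolerance values.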

Four examples of spline model definition and testing are provided below. In each example, assume that Y and XA are observed variables in the active file. The regression commands in the examples are deliberately sparse, i.e. without residual plots or other diagnostic information, in order to focus on the aspects that are specific to spline fitting. (In practice, however, plots and collinearity and influence diagnostics should be used routinely with REGRESSION to check model fit, adherence to assumptions, and the effect of outliers.)

EXAMPLE 1

In the first example, the linear model has 2 knots, at XA=15 and XA=25, and is continuous at both knot points.

COMPUTE xb1 = xa - 15.

COMPUTE xc1 = xa - 25.

RECODE xb1 xc1 (lo thru 0 = 0).

REGRESSION VARIABLES = y xa xb1 xc1

/STATISTICS = DEFAULT

/DEP = y

/METHOD ENTER xa

/METHOD ENTER xb1

/METHOD ENTER xc1.

XB1 and XC1 are entered separately so that the change in slope at the first knot can be tested without considering the second knot, by testing whether the coefficient for XB1 is 0 in the presence of XA. The XC1 coefficient is then tested in the presence of XA and XB1.
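To make the interpretation of the coefficients concrete, the following sketch evaluates a two-knot linear spline with made-up coefficients and reports the slope within each segment. The function names and coefficient values are hypothetical; only the knots (15 and 25) come from the example.

```python
def hinge(xa, knot):
    """Mirror COMPUTE + RECODE above: XA - knot, with values <= 0 set to 0."""
    return max(0, xa - knot)


def predict(xa, b0, b_xa, b_xb1, b_xc1):
    """Fitted value of the two-knot linear spline (knots at 15 and 25)."""
    return b0 + b_xa * xa + b_xb1 * hinge(xa, 15) + b_xc1 * hinge(xa, 25)


# Made-up coefficients, for illustration only.
coef = dict(b0=10.0, b_xa=1.0, b_xb1=2.0, b_xc1=-4.0)

# Slope within each segment, from two points inside the segment.
for lo, hi in ((0, 10), (16, 24), (26, 30)):
    slope = (predict(hi, **coef) - predict(lo, **coef)) / (hi - lo)
    print(f"XA in [{lo}, {hi}]: slope = {slope}")
```

The slope is the XA coefficient before the first knot, the sum of the XA and XB1 coefficients between the knots, and the sum of all three coefficients after the second knot.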

EXAMPLE 2

In the second example, there is one knot at XA=15. The spline is continuous at the knot, but the order of the polynomial changes. Before the knot, the relationship is linear; after the knot, quadratic. XB1 and XB2 are created to model this effect. Note that the quadratic effect for the XB piece is tested in the context of the linear term for that piece of the spline. Testing higher-order terms in the presence of all lower-order terms is standard practice in fitting polynomials. However, this practice may be contrasted with that in Example 4, where the higher-order terms for the XB piece are entered first. In the latter case, it is the continuity restrictions that are being tested sequentially, with each new XB term improving the fit of the regression function and reducing its smoothness.

COMPUTE xb1 = xa - 15.

RECODE xb1 (LO THRU 0 = 0).

COMPUTE xb2 = xb1*xb1.

REGRESSION VARIABLES = y xa xb1 xb2

/STATISTICS = DEFAULT CHA

/DEP = y

/METHOD ENTER xa

/METHOD ENTER xb1

/METHOD ENTER xb2.
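The model reached at the final step can be written piecewise. The coefficient symbols are notation assumed here for illustration:

```latex
\hat{Y} =
\begin{cases}
\beta_0 + \beta_1\,XA, & XA \le 15 \\
\beta_0 + \beta_1\,XA + \gamma_1 (XA - 15) + \gamma_2 (XA - 15)^2, & XA > 15
\end{cases}
```

Both pieces equal beta_0 + 15 beta_1 at XA = 15, so the function is continuous there; gamma_1 is the change in slope at the knot and gamma_2 is the curvature that appears after it.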

EXAMPLE 3

In the third example, the spline is discontinuous at the knot, XA=15. The model is that the function is quadratic both before and after the knot. The function is right-continuous at the knot, i.e. if XA=15 then XB0=1 (the function 'jumps' at 15, not after 15).

COMPUTE XA2 = XA*XA.

COMPUTE XB0 = (XA GE 15).

COMPUTE XB1 = XB0 * (XA - 15).

COMPUTE XB2 = XB1 * XB1.

REGRESSION VARIABLES = y xa xa2 xb0 xb1 xb2

/STATISTICS = DEFAULT CHA

/DEP = y

/METHOD ENTER xa xa2

/METHOD ENTER xb0 xb1 xb2.

Here, the model with a single quadratic function is tested first. The need for the spline is then assessed by testing the XB variables as a block. If the change in R-squared is not significant, a single function appears to describe the relationship of XA to Y along the full range of XA in the data.
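The CHA keyword asks REGRESSION to report the change in R-squared and its F test for each block. The arithmetic behind that test is the standard F statistic for comparing nested models, sketched below with made-up numbers. The function name `f_change` and all input values are assumptions for illustration; in practice SPSS reports these figures directly.

```python
def f_change(r2_reduced, r2_full, n_added, n_cases, n_predictors_full):
    """F statistic for the R-squared change when a block of predictors is added.

    Numerator df = n_added; denominator df = n_cases - n_predictors_full - 1.
    """
    df_resid = n_cases - n_predictors_full - 1
    f = ((r2_full - r2_reduced) / n_added) / ((1.0 - r2_full) / df_resid)
    return f, n_added, df_resid


# Made-up values: 106 cases, R-squared 0.50 with xa and xa2,
# 0.60 after adding xb0, xb1, xb2 (5 predictors in total).
f, df1, df2 = f_change(0.50, 0.60, n_added=3, n_cases=106, n_predictors_full=5)
print(round(f, 2), df1, df2)
```

A large F relative to its degrees of freedom indicates that the block of XB variables improves on the single quadratic function.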

EXAMPLE 4

In the fourth example, there is one knot at XA=15. A cubic spline is fitted with the constraint that all second-order and lower-order derivatives are equal on both sides of the knot, which is equivalent to saying that only the cubic term of the function is adjusted after the knot. The continuity constraints are gradually relaxed through a series of METHOD subcommands that introduce the lower-order XB terms into the model. Note that lower-order terms are present for the full range of X through the XA linear and quadratic terms, even when the lower-order XB terms are absent from the model.

COMPUTE XA2 = XA*XA.

COMPUTE XA3 = XA2*XA.

COMPUTE XB0 = (XA GE 15).

COMPUTE XB1 = XB0 * (XA - 15).

COMPUTE XB2 = XB1 * XB1.

COMPUTE XB3 = XB2 * XB1.

REGRESSION VARIABLES = y xa xa2 xa3 xb0 xb1 xb2 xb3

/STATISTICS = DEFAULT CHA

/DEP = y

/METHOD ENTER XA XA2 XA3

/METHOD ENTER XB3

/METHOD ENTER XB2

/METHOD ENTER XB1

/METHOD ENTER XB0.

## Historical Number

12732