C Statistic and SPSS Logistic Regression.
I have used the SPSS LOGISTIC REGRESSION command (Analyze>Regression>Binary Logistic... in the menus). I would like to have the c statistic, as output by SAS PROC LOGISTIC, included in my output. However, I don't see the c statistic there, nor do I see an option in the Binary Logistic Regression dialogs or SPSS Syntax Reference Guide to request it. Is there any way to get the c statistic from this procedure?
Resolving the problem
The LOGISTIC REGRESSION procedure in SPSS does not produce the c statistic as output by SAS PROC LOGISTIC. A feature enhancement request has been filed with SPSS Development to request that an option for Measures of Association, including the c statistic, be added to the Logistic Regression procedure. However, there are two methods to produce the c statistic while performing logistic regression in SPSS. The NOMREG command, which performs multinomial logistic regression, will print this statistic when the ASSOCIATION keyword is added to the /PRINT subcommand, as described below. If you prefer to use the Binary Logistic Regression procedure, the second method is to follow the LOGISTIC REGRESSION procedure with the ROC (Receiver Operating Characteristic) graph procedure to get the c statistic. The ROC procedure is in the SPSS Base.
Method A: NOMREG command with ASSOCIATION keyword.
A new keyword, ASSOCIATION, has been added to the NOMREG procedure as of Release 15.0. When the dependent variable is a binary response, this keyword will produce the Measures of Monotone Association table. Among the statistics in that table is the "Concordance Index C", which is the c statistic in SAS PROC LOGISTIC. NOMREG is available in the menus via Analyze>Regression>Multinomial Logistic. However, the ASSOCIATION option is not available from the Multinomial Logistic Regression dialog boxes. You must run NOMREG from a syntax window or include file, with ASSOCIATION added to the /PRINT subcommand. You can build the model in the Multinomial Logistic Regression dialog boxes and click the Paste button, rather than the OK button. This will paste the NOMREG command, with all of your choices from the dialog boxes, into a syntax window. You can then add ASSOCIATION (capitalization not required) to the /PRINT subcommand and run the command from the Run menu or the 'Run current' icon of the syntax window.
Here is an example NOMREG command with the ASSOCIATION keyword added:
NOMREG
 disease (BASE=FIRST ORDER=ASCENDING) WITH age
 /CRITERIA CIN(95) DELTA(0) MXITER(100) MXSTEP(5) CHKSEP(20) LCONVERGE(0)
 /STEPWISE = PIN(.05) POUT(0.1) MINEFFECT(0) RULE(SINGLE) ENTRYMETHOD(LR)
 /PRINT = PARAMETER SUMMARY LRT CPS STEP MFI association .
The specification "BASE=FIRST" near the top of the NOMREG command specifies that the lowest value of the dependent variable is the reference category. If the dependent variable was coded as 0 and 1, with 1 as the 'event' category to be predicted, then "BASE=FIRST" is the correct designation. By default, NOMREG uses the highest value of the dependent variable as the reference category.
Method B: Binary Logistic Regression followed by ROC graph.
1. When running LOGISTIC REGRESSION, save the predicted probabilities to the active data file, using the /SAVE subcommand or by clicking the Save button and checking Probabilities in the "Save New Variables" dialog that opens. By default, the predicted probability will be stored in a variable named pre_1 (the first time that you use the option in a session - then pre_2, pre_3, etc.). You can choose a different variable name if you run the LOGISTIC REGRESSION command from a syntax window.
2. Use the predicted probabilities variable (pre_1 by default, as noted above) as the Test variable in the ROC graph procedure. This procedure is available from the menu system under Graphs>ROC Curve in Releases 9.0-14.0, and from Analyze>ROC Curve beginning with Release 15.0. The State variable will be the dependent variable from the LOGISTIC REGRESSION and the 'Value of state variable' will be the target or event value of that dependent variable. LOGISTIC REGRESSION predicts the higher of the 2 values of the dependent variable (DV). If that DV was coded 0 and 1, LOGISTIC REGRESSION predicts the 1 value, so 1 will be the 'Value of state variable' in the ROC curve dialog. The ROC procedure prints "Area Under the Curve" as part of the default output, and this area statistic corresponds to the c statistic from SAS PROC LOGISTIC. Other options in the ROC procedure may also be of interest, such as the diagonal reference line or the "Coordinate points of the ROC curve" table, which is useful if you want to use the ROC curve to pick a cutoff value for the predicted probability.
The following SPSS command syntax reproduces the c statistic from a PROC LOGISTIC example in the manual:
SAS Institute Inc. "SAS/STAT Software: Changes and Enhancements Through Release 6.11". Cary NC: SAS Institute Inc., 1996.
This is example 16.5 on pages 465-467 of that manual. The dependent variable is DISEASE, with 0 for disease absent and 1 for disease present. The independent variable is AGE, with 6 values from 25 to 75, so there are 12 cases in the file to represent the combinations of DISEASE and AGE. The variable WT holds the count for each of these combinations. WT is assigned as a weight variable in SPSS, so the procedures will treat a WT value of 14 as representing 14 cases. You can assign a weight variable from Data>Weight Cases in the Data Editor or by using the WEIGHT command, as in
WEIGHT BY wt.
DISEASE-AGE combinations where the weight count equals 0 were included to make the SPSS representation of the SAS data clear. These cases will prompt a warning message in LOGISTIC REGRESSION and ROC output that there were cases with weights of 0 that were omitted from the analysis (as they should be). The warning can be ignored. You can delete cases with WT values of 0 to avoid the warning if you wish.
Here is the syntax to read the data, run the LOGISTIC REGRESSION procedure, and find the c statistic with the ROC procedure. See the "Area Under the Curve" statistic in the ROC output.
DATA LIST FREE / disease age wt .
BEGIN DATA
1 25 0
0 25 14
1 35 0
0 35 20
1 45 0
0 45 19
1 55 7
0 55 11
1 65 6
0 65 6
1 75 17
0 75 0
END DATA .
WEIGHT BY wt .
LOGISTIC REGRESSION VAR=disease
 /METHOD=ENTER age
 /SAVE PRED (dispred)
 /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
ROC
 dispred BY disease (1)
 /PLOT = CURVE(REFERENCE)
 /CRITERIA = CUTOFF(INCLUDE) TESTPOS(LARGE) DISTRIBUTION(FREE) CI(95)
 /MISSING = EXCLUDE .
Note that the new predicted probabilities variable was named DISPRED in the /SAVE subcommand of the LOGISTIC REGRESSION command. DISPRED was then used as the test variable in the ROC procedure. The area reported by ROC, .953, matches the c statistic reported in the SAS PROC LOGISTIC output on page 466 of the SAS manual.
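As a cross-check on the .953 figure, the c statistic can be sketched outside SPSS as the proportion of concordant event/non-event pairs, with tied pairs counted as one half. The Python sketch below is not part of the SPSS procedures. It assumes that the fitted probability rises monotonically with age (which holds for a single-predictor logistic model with a positive slope), so age can stand in for the predicted probability when ranking cases. The function name c_statistic is hypothetical.

```python
# Weighted cells from the example data: (disease, age, wt).
rows = [
    (1, 25, 0), (0, 25, 14),
    (1, 35, 0), (0, 35, 20),
    (1, 45, 0), (0, 45, 19),
    (1, 55, 7), (0, 55, 11),
    (1, 65, 6), (0, 65, 6),
    (1, 75, 17), (0, 75, 0),
]


def c_statistic(rows):
    """Concordance index: (concordant + 0.5 * tied) / all event/non-event pairs.

    Assumption: age is used as the ranking score in place of the predicted
    probability, which is valid only if the probability increases with age.
    """
    events = [(age, wt) for d, age, wt in rows if d == 1 and wt > 0]
    nonevents = [(age, wt) for d, age, wt in rows if d == 0 and wt > 0]
    concordant = tied = total = 0
    for age_e, wt_e in events:
        for age_n, wt_n in nonevents:
            pairs = wt_e * wt_n  # each cell stands for wt cases
            total += pairs
            if age_e > age_n:
                concordant += pairs
            elif age_e == age_n:
                tied += pairs
    return (concordant + 0.5 * tied) / total


c = c_statistic(rows)
print(round(c, 3))  # prints 0.953
```

With these counts there are 2100 event/non-event pairs, of which 1945 are concordant and 113 are tied, giving (1945 + 56.5) / 2100, or about .953.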
If you would like details on the algorithms used in the ROC graph procedure, please go to Help>Algorithms.