IBM Support

Influence variables and weights in SPSS Classification trees

Troubleshooting


Problem

In the SPSS Classification Tree dialog, I see a box for "Influence variable". Please define the influence variable and explain its role in the tree-growing methods. The "Syntax Reference Guide" states that the "INFLUENCE subcommand defines an optional influence variable that defines how much influence a case has on the tree-growing process. Cases with lower influence values have less influence, cases with higher values have more." This seems like a weight variable, but placing my weight variable in the Influence variable box has no apparent effect on the counts and percentages in the tree node statistics.

Resolving The Problem

Influence variables in TREE perform the function that "case weights" perform in AnswerTree: they are used to account for differences in variance across levels of the target variable (heteroscedasticity). See Technote 1592438 for a discussion of case weights and frequency variables in AnswerTree. Influence values are used in model estimation but DO NOT affect cell frequencies, which is why the counts and percentages in the tree node statistics do not change. Influence values should be positive, but they can be fractional. Cases with zero or negative influence are excluded from the analysis. Influence variables are ignored when using the QUEST growing method.

With a categorical dependent variable, cases that belong to the same dependent variable class and the same category of the predictor variable are grouped together as a cell, and their influence values are aggregated to form a weight for that cell. The analysis then uses a contingency table in which the classes of the dependent variable form the columns, the categories of the predictor variable being studied form the rows, and each entry is the corresponding cell weight.
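The grouping of influence values into cell weights can be illustrated with a short Python sketch. It is not the TREE procedure's internal code: it assumes that "aggregated" means summed, and the function names cell_weights and contingency_table are hypothetical. It shows that the cell weights change with the influence values while the raw case counts per cell do not.

```python
from collections import defaultdict


def cell_weights(cases):
    """Aggregate influence values into (predictor category, target class) cells.

    cases: iterable of (predictor_category, target_class, influence) tuples.
    Returns (weights, counts), both dicts keyed by (category, class).
    Assumption: "aggregated" is taken to mean summed.
    """
    weights = defaultdict(float)
    counts = defaultdict(int)
    for category, target, influence in cases:
        if influence <= 0:
            # Cases with zero or negative influence are excluded from the analysis.
            continue
        weights[(category, target)] += influence
        # The raw case count for the cell; the influence value does not change it.
        counts[(category, target)] += 1
    return dict(weights), dict(counts)


def contingency_table(weights, categories, classes):
    """Rows are predictor categories, columns are target classes, entries are cell weights."""
    return [[weights.get((c, k), 0.0) for k in classes] for c in categories]


cases = [
    ("A", "yes", 0.5),
    ("A", "yes", 1.5),
    ("A", "no", 1.0),
    ("B", "yes", 2.0),
    ("B", "no", 0.0),  # zero influence: excluded
]
weights, counts = cell_weights(cases)
table = contingency_table(weights, ["A", "B"], ["yes", "no"])
```

Here cell (A, yes) has a weight of 2.0 but still a count of 2 cases, which mirrors the behavior described above: influence enters the estimation, not the node frequencies.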

Frequency variables, as defined in the Data>Weight Cases dialog, specify frequencies for cases, if any. Use a frequency variable if records in your data set represent more than one unit, for example, if you are analyzing aggregated data. Values for a frequency variable should be positive integers. Cases with zero or negative frequency weights are excluded from the analysis, and non-integer frequency values are rounded to the nearest integer. Whereas AnswerTree has a "Frequency" box for specifying a frequency weight variable, the TREE procedure uses the variable defined in the Data>Weight Cases dialog or by the WEIGHT command in SPSS.
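For contrast with influence values, the frequency-weight rules above can be sketched in Python. This is an illustration under stated assumptions, not SPSS code: the function names are hypothetical, ties (such as 2.5) are assumed to round upward because the tie-breaking rule is not stated here, and rounding is assumed to happen before the exclusion check, so a weight such as 0.3 that rounds to 0 is treated as excluded.

```python
import math


def frequency(value):
    """Return the integer frequency for a case, or None if the case is excluded."""
    # Assumption: round half up (2.5 -> 3); SPSS's exact tie-breaking rule is not given here.
    n = math.floor(value + 0.5)
    # Cases with zero or negative frequency weights are excluded from the analysis.
    return n if n > 0 else None


def node_counts(cases):
    """Count units per target class, where each record stands for `freq` units.

    cases: iterable of (target_class, frequency_weight) tuples.
    """
    counts = {}
    for target, weight in cases:
        n = frequency(weight)
        if n is None:
            continue
        counts[target] = counts.get(target, 0) + n
    return counts


cases = [("yes", 2), ("yes", 1.6), ("no", 3), ("no", 0), ("no", -1)]
counts = node_counts(cases)
```

Unlike the influence sketch, each record here contributes its rounded frequency to the count, so the tallies (4 for "yes", 3 for "no") reflect the weighted number of units rather than the number of records.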

Product: IBM SPSS Statistics
Component: Decision Trees
Platform: Platform Independent
Version: Not Applicable

Historical Number

61138

Document Information

Modified date:
16 April 2020

UID

swg21478254