IBM Support

When Decision Tree Node Mean and Predicted Value Disagree, Effect of Influence Variables (Case Weights)

Technote (FAQ)


When I build a Decision Tree with a continuous target variable in SPSS Statistics and examine the node statistics, I notice that the 'Predicted value' for each node is almost always equal to the node's mean. However, this is not true when influence variables (called case weights in Modeler and AnswerTree) have been used in the tree. How do influence variables affect the calculation of the 'Predicted value'?


If an influence variable has been designated, its values (case weights) are used in the calculation of each node's 'Predicted value', because the predicted value is model-based and the case weights reflect the influence that each case should have in the model. Case weights are not used in the calculation of the node mean. Frequency weights, in contrast, are used in the calculation of both the mean and the 'Predicted value'. They capture any aggregation in the data, where one case may represent multiple respondents, and must be included for an accurate calculation of the observed mean. When no influence variable is designated, every case weight is 1 and the two formulas below give the same result.
The Mean for node j is:
Mean(j) = Sum{i=1 to Nj} (Yij*Fij) / Sum{i=1 to Nj} (Fij)

whereas the Predicted value is:
Pred(j) = Sum{i=1 to Nj} (Yij*Fij*Cij) / Sum{i=1 to Nj} (Fij*Cij)

where Nj is the number of cases in node j; Yij is the target variable value for case i in node j; Fij is the frequency weight for that case (or 1, if no frequency variable is designated); and Cij is the case weight for that case (or 1, if no case weight variable is designated).
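
The two formulas above can be checked by hand on a small node. The following is a minimal Python sketch, not SPSS Statistics code: the function names node_mean and node_predicted and the sample values are made up for illustration, and the only logic assumed is the pair of formulas given above.

```python
def _weights_or_ones(weights, n):
    """Return the given weights, or a list of 1s when no weight variable is designated."""
    if weights is None:
        return [1] * n
    if len(weights) != n:
        raise ValueError("weights must have one entry per case")
    return list(weights)


def node_mean(y, freq=None):
    """Mean(j): target values weighted by frequency weights only."""
    f = _weights_or_ones(freq, len(y))
    return sum(yi * fi for yi, fi in zip(y, f)) / sum(f)


def node_predicted(y, freq=None, case_w=None):
    """Pred(j): target values weighted by frequency weight times case weight."""
    f = _weights_or_ones(freq, len(y))
    c = _weights_or_ones(case_w, len(y))
    numerator = sum(yi * fi * ci for yi, fi, ci in zip(y, f, c))
    denominator = sum(fi * ci for fi, ci in zip(f, c))
    return numerator / denominator


# Hypothetical node with three cases (illustrative values only).
y = [10, 20, 30]           # target values Yij
freq = [1, 2, 1]           # frequency weights Fij
case_w = [1.0, 0.5, 2.0]   # case weights Cij (influence variable)

print(node_mean(y, freq))                # (10 + 40 + 30) / 4 = 20.0
print(node_predicted(y, freq, case_w))   # (10 + 20 + 60) / 4 = 22.5
print(node_predicted(y, freq))           # no case weights: 20.0, same as the mean
```

In this example the third case has the largest case weight, so the predicted value is pulled toward its target value of 30, while the mean is unaffected by the case weights.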

Document information

More support for: SPSS Statistics
Decision Trees

Software version: Not Applicable

Operating system(s): Platform Independent

Reference #: 1681824

Modified date: 07 September 2016