# When Decision Tree Node Mean and Predicted Value Disagree: Effect of Influence Variables (Case Weights)

## Question

When I build a Decision Tree with a continuous target variable in SPSS Statistics and examine the node statistics, I notice that the 'Predicted value' for each node is almost always equal to the node's mean. However, this is not true when influence variables (case weights in Modeler and AnswerTree) have been used in the tree. How do influence variables affect the calculation of the 'Predicted value'?

## Answer

If an influence variable has been designated, its values (case weights) are used in calculating each node's 'Predicted value' but not the node's mean. The 'Predicted value' is model-based, and the case weights reflect the influence that each case should have in the model. Frequency weights, in contrast, are used in calculating both the mean and the 'Predicted value'. They capture any aggregation in the data, where one case may represent multiple respondents, and so must be included for an accurate calculation of the observed mean.

The mean for node j is:
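$$
\mathrm{Mean}(j) = \frac{\sum_{i=1}^{N_j} Y_{ij} F_{ij}}{\sum_{i=1}^{N_j} F_{ij}}
$$

whereas the 'Predicted value' is:

$$
\mathrm{Pred}(j) = \frac{\sum_{i=1}^{N_j} Y_{ij} F_{ij} C_{ij}}{\sum_{i=1}^{N_j} F_{ij} C_{ij}}
$$

where $N_j$ is the number of cases in node j; $Y_{ij}$ is the target variable value for case i in node j; $F_{ij}$ is the frequency weight (or 1, if no frequency variable is designated); and $C_{ij}$ is the case weight (or 1, if no case weight variable is designated). When every case in a node has the same case weight, including when no influence variable is designated, the case weight cancels from the numerator and denominator, and the 'Predicted value' equals the mean.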
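The two formulas above can be checked by hand with a short script. The following is a minimal Python sketch, not SPSS code. The function names `node_mean` and `node_predicted` and the three-case node are hypothetical and chosen only for illustration.

```python
def node_mean(y, freq):
    """Observed node mean: uses frequency weights only."""
    numerator = sum(yi * fi for yi, fi in zip(y, freq))
    denominator = sum(freq)
    return numerator / denominator


def node_predicted(y, freq, case_w):
    """Model-based predicted value: uses frequency and case weights."""
    numerator = sum(yi * fi * ci for yi, fi, ci in zip(y, freq, case_w))
    denominator = sum(fi * ci for fi, ci in zip(freq, case_w))
    return numerator / denominator


# Hypothetical node with three cases (illustrative values, not SPSS output).
y = [10.0, 20.0, 30.0]      # target values Y
freq = [1, 2, 1]            # frequency weights F
case_w = [1.0, 1.0, 3.0]    # case weights C (influence variable)

mean = node_mean(y, freq)               # (10 + 40 + 30) / 4
pred = node_predicted(y, freq, case_w)  # (10 + 40 + 90) / 6

print(mean)            # 20.0
print(round(pred, 4))  # 23.3333

# With equal case weights the two statistics coincide.
print(node_predicted(y, freq, [1.0, 1.0, 1.0]) == mean)  # True
```

In this example the third case carries three times the influence of the others, so the predicted value is pulled toward its target value of 30 while the observed mean stays at 20.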

## Document information

### More support for:

SPSS Statistics
Decision Trees

### Software version:

Not Applicable

### Operating system(s):

Platform Independent

### Reference #:

1681824

### Modified date:

2014-08-14
