I'm running the K-Nearest Neighbor (KNN) procedure in SPSS Statistics with automatic selection of the number of neighbors (K). The sums of squared error values in the K Selection Error Log are about an order of magnitude smaller than the value shown in the Error Summary table. Why is this the case?
K is selected by V-fold cross-validation, with a default V of 10 folds. Each Sum of Squares Error value in the K Selection Error Log is the average error over the V folds for that value of K, whereas the value shown for the Training partition in the Error Summary table is computed over all cases in the Training partition. With the default V of 10, each sum of squared errors in the K Selection Error Log is therefore based on about a tenth as many cases as the value in the Error Summary table. To put the log values on the scale of the Error Summary table, multiply them by V (10 with the default setting).
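The scaling can be illustrated with a small Python sketch. This is not SPSS code or output: the squared errors are simulated, and the function name `mean_fold_sse` and the round-robin fold assignment are assumptions made only for illustration. It shows that a sum of squared errors averaged over V folds is about 1/V of the sum over all cases, so multiplying by V restores the full-partition scale.

```python
import random

def mean_fold_sse(squared_errors, v):
    """Average the per-fold sums of squared errors over v folds.

    Folds are assigned round-robin (a hypothetical scheme for illustration).
    """
    folds = [squared_errors[i::v] for i in range(v)]
    fold_sse = [sum(fold) for fold in folds]
    return sum(fold_sse) / v

random.seed(0)

# Simulated squared prediction errors for 1000 training cases (not real data).
squared_errors = [random.gauss(0, 1) ** 2 for _ in range(1000)]

V = 10  # default number of folds
total_sse = sum(squared_errors)                   # scale of the Error Summary table
avg_fold_sse = mean_fold_sse(squared_errors, V)   # scale of the K Selection Error Log
rescaled = avg_fold_sse * V                       # log value translated to full scale

print(f"total SSE:        {total_sse:.3f}")
print(f"average fold SSE: {avg_fold_sse:.3f}")
print(f"rescaled (x V):   {rescaled:.3f}")
```

In this toy example the rescaled value matches the total exactly, because the same squared errors are used in both sums. In the KNN procedure the match is only approximate, since the cross-validation errors come from predictions on held-out folds while the Error Summary value is computed for the whole Training partition.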