I'm running the K-Nearest Neighbor (KNN) procedure in SPSS Statistics with automatic selection of the number of neighbors (K). The sums of squared error values in the K Selection Error Log are about an order of magnitude smaller than the value shown in the Error Summary table. Why is this the case?
The selection of K is based on V-fold cross-validation, with a default of V = 10 folds. The Sum of Squares Error values shown in the K Selection Error Log are errors averaged over the V folds for each candidate value of K, whereas the value shown for the Training partition in the Error Summary table is computed over the entire set of cases in the Training partition. With the default V = 10, each sum of squared errors in the K Selection Error Log is therefore based on only about a tenth as many cases as the Error Summary value. To translate the log values to the scale of the Error Summary table, multiply them by V (10 under the default).
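The scaling relationship can be illustrated with a small sketch. The squared errors below are synthetic, generated only to show the arithmetic; in the actual procedure the per-fold errors come from models fit on the remaining folds, so multiplying by V gives an approximation rather than an exact match to the Error Summary value.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 10   # default number of cross-validation folds in the KNN procedure
n = 500  # hypothetical number of Training partition cases

# Hypothetical squared error for each training case (illustrative only)
sq_err = rng.chisquare(df=1, size=n)

# Total SSE, as reported for the Training partition in the Error Summary table
total_sse = sq_err.sum()

# Split the cases into V folds, compute the SSE within each fold, and
# average across folds, as in the K Selection Error Log
folds = np.array_split(sq_err, V)
avg_fold_sse = np.mean([fold.sum() for fold in folds])

# Each fold holds ~n/V cases, so the averaged per-fold SSE is about 1/V
# of the total; multiplying by V recovers the Error Summary scale
print(f"Error Summary scale:   {total_sse:.2f}")
print(f"Per-fold average x V:  {V * avg_fold_sse:.2f}")
```

Because the folds partition the cases, the fold sums add up to the total, so in this simplified sketch multiplying the per-fold average by V reproduces the total exactly; with real cross-validated predictions the two values only agree in rough magnitude.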