KNN Error Sums of Squares are very different in K Selection Error Log and Error Summary


Technote (FAQ)


Question

I'm running the K-Nearest Neighbor (KNN) procedure in SPSS Statistics with automatic selection of the number of neighbors (K). The sums of squared error values in the K Selection Error Log are about an order of magnitude smaller than the value shown in the Error Summary table. Why is this the case?

Answer

K is selected by V-fold cross-validation, with a default of V = 10 folds. The Sum of Squares Error values in the K Selection Error Log are averages over the V folds for each value of K, whereas the value shown for the Training partition in the Error Summary table is computed over all cases in the Training partition. With the default V of 10, each sum of squared errors in the K Selection Error Log is therefore based on about one-tenth as many cases as the value in the Error Summary table. To translate the log values to the scale of the Error Summary table, multiply them by V (10 with the default setting).
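The scaling argument can be illustrated with a small sketch in Python. This is not SPSS code and does not reproduce the KNN procedure. It assumes a simulated list of per-case squared errors and a hypothetical helper, fold_sse_summary, and shows only the arithmetic: the average of the per-fold sums of squares is about 1/V of the sum over all cases. In a real analysis the two tables come from different fits (hold-out folds versus the full Training partition), so the rescaled log value and the Error Summary value would be expected to be close in magnitude rather than identical.

```python
import random


def fold_sse_summary(squared_errors, v=10):
    """Return (mean per-fold SSE, SSE over all cases) for v folds.

    Hypothetical helper for illustration only; the folds are formed by
    taking every v-th case, which is an assumption, not SPSS's method.
    """
    folds = [squared_errors[i::v] for i in range(v)]
    fold_sse = [sum(fold) for fold in folds]
    mean_fold_sse = sum(fold_sse) / v
    total_sse = sum(squared_errors)
    return mean_fold_sse, total_sse


# Simulated squared prediction errors for 1000 training cases (made-up data).
random.seed(0)
squared_errors = [random.gauss(0, 1) ** 2 for _ in range(1000)]

v = 10
mean_fold_sse, total_sse = fold_sse_summary(squared_errors, v=v)
rescaled = mean_fold_sse * v  # put the per-fold average on the full-sample scale

print("mean per-fold SSE:", round(mean_fold_sse, 3))
print("SSE over all cases:", round(total_sse, 3))
print("mean per-fold SSE * V:", round(rescaled, 3))
```

Here the rescaled value matches the full-sample sum because the same errors are used in both calculations; the point is the factor of V between the two scales.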

Copyright and trademark information

IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Document information

SPSS Statistics

Software version: Not Applicable

Operating system(s): Platform Independent

Reference #: 1619444

Modified date: 2012-12-04
