Predictors with many categories omitted from Kohonen and K-Means models
I have been using the Kohonen and K-Means nodes in Modeler to cluster cases. Some of the predictors are being omitted from the solution in both the Kohonen and the K-Means nuggets, i.e., these predictors are not listed in the set of inputs in the Summary panel or in the Clusters table in the Model panel of either nugget. The omitted predictors are a mix of nominal and ordinal fields with set sizes (i.e., number of categories) ranging from 25 to 50. Some of these predictors are string variables and some are numeric.
Why were these predictors omitted from the Kohonen and K-Means analyses? Is there a limit to the number of categories in a Kohonen or K-Means input field?
There is a default set size limit, i.e., a limit on the number of categories in a set variable, for Kohonen and K-Means analysis in Modeler. This limit can be changed by the user.
In Modeler, open the File menu and click Stream Properties.
In the Stream Properties dialog, click the Options tab (if it is not already the forward panel).
On the Options tab, choose General from the list on the left side of the dialog.
About midway down the dialog there is a check box titled "Limit set size for Kohonen and K-Means Modeling". The default limit there is 20. Use the spinner to set this limit to a value that will accommodate each of your categorical predictors.
There is also a "Save as Default" button, so you can apply the increased limit to future streams that you build.
After you rerun the models, the predictors that were omitted should appear in the Summary list of predictors and in the Clusters table.
Note that you can also access the Stream Properties->Options dialog from the Tools menu in Modeler.
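To choose a limit that accommodates every categorical predictor, it can help to count the categories in each field first. The following is a minimal sketch in plain Python, run outside Modeler on exported data; it is not Modeler's own logic. The function name fields_over_limit and the sample fields "region" and "grade" are hypothetical, and the sketch assumes a field is dropped only when its number of distinct non-missing values is strictly greater than the limit.

```python
from collections import defaultdict

# Default taken from the Stream Properties setting described above.
DEFAULT_SET_SIZE_LIMIT = 20


def fields_over_limit(records, fields, limit=DEFAULT_SET_SIZE_LIMIT):
    """Return {field: category_count} for fields with more than `limit` categories.

    Assumption: missing values (None) do not count as a category, and a
    field is flagged only when its count strictly exceeds the limit.
    """
    distinct = defaultdict(set)
    for row in records:
        for field in fields:
            value = row.get(field)
            if value is not None:
                distinct[field].add(value)
    return {f: len(values) for f, values in distinct.items() if len(values) > limit}


# Hypothetical data: "region" has 30 categories, "grade" has 5.
records = [{"region": f"R{i % 30}", "grade": i % 5} for i in range(300)]
over = fields_over_limit(records, ["region", "grade"])
print(over)  # {'region': 30}
```

Here "region" would need a limit of at least 30 to be kept under the stated assumption, while "grade" already fits under the default of 20.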