Creating a stop word dictionary

After you create or update a list of user-defined stop words in an XML file, you must convert the XML file to a stop word dictionary.

About this task

To create a stop word dictionary, use the esstopworddictbuilder command-line tool, which is provided with Watson Content Analytics. The tool is in the ES_INSTALL_ROOT/bin directory.

The input to the tool is the XML file that lists the stop words, and the output is a case-sensitive stop word dictionary. The dictionary file name must end with the DIC suffix, for example, c:\mydictionaries\productstopwords.dic.

The default location for both files is the directory from which the script is invoked. If a dictionary with the same name already exists, the script produces an error.

The maximum size of a DIC file is 8 MB.
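
The constraints above (DIC suffix, no existing dictionary with the same name, 8 MB limit) can be checked before and after running the tool. The following Python sketch is not part of Watson Content Analytics. The function names are hypothetical, the suffix comparison is assumed to be case-insensitive, and "8 MB" is assumed to mean 8 * 1024 * 1024 bytes; the product documentation does not specify either detail.

```python
from pathlib import Path
from typing import List

# Assumption: "8 MB" is interpreted as 8 * 1024 * 1024 bytes.
MAX_DIC_BYTES = 8 * 1024 * 1024


def output_path_problems(dic_file: str) -> List[str]:
    """Return reasons why the output path conflicts with the documented constraints."""
    path = Path(dic_file)
    problems = []
    # Assumption: the suffix check ignores case (DIC and dic are both accepted).
    if path.suffix.lower() != ".dic":
        problems.append("output file must have the DIC suffix")
    # The tool produces an error if a dictionary with the same name exists.
    if path.exists():
        problems.append("a dictionary with this name already exists")
    return problems


def within_size_limit(dic_file: str) -> bool:
    """Check a generated dictionary against the documented maximum size."""
    return Path(dic_file).stat().st_size <= MAX_DIC_BYTES
```

Calling output_path_problems before you run the tool, and within_size_limit afterward, catches the most likely path and size problems without depending on the tool's own error messages.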

Procedure

To create a stop word dictionary:

  1. On the master server, log in as the Watson Content Analytics default administrator.
  2. Enter the following command, where XML_file is the fully qualified path to the XML file that contains the list of stop words and DIC_file is the fully qualified path to the stop word dictionary.
    AIX® or Linux
    esstopworddictbuilder.sh XML_file DIC_file
    Windows
    esstopworddictbuilder.bat XML_file DIC_file
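
If you script the procedure, the platform-specific command in step 2 can be assembled programmatically. The following Python sketch is illustrative only. The function names build_command and build_dictionary are hypothetical, and the sketch assumes that ES_INSTALL_ROOT is available as an environment variable in the administrator's session and that the script returns a nonzero exit code on failure; the documentation above confirms neither.

```python
import os
import platform
import subprocess
from pathlib import Path


def build_command(xml_file, dic_file, install_root, system=None):
    """Assemble the esstopworddictbuilder command line for the given platform."""
    system = system or platform.system()
    # Windows uses the .bat script; AIX and Linux use the .sh script.
    if system == "Windows":
        script = "esstopworddictbuilder.bat"
    else:
        script = "esstopworddictbuilder.sh"
    # The tool is in the ES_INSTALL_ROOT/bin directory.
    return [str(Path(install_root) / "bin" / script), str(xml_file), str(dic_file)]


def build_dictionary(xml_file, dic_file):
    """Run the tool; intended for the master server under the administrator account."""
    # Assumption: ES_INSTALL_ROOT is set as an environment variable.
    install_root = os.environ["ES_INSTALL_ROOT"]
    # Assumption: the script signals failure with a nonzero exit code.
    return subprocess.run(build_command(xml_file, dic_file, install_root), check=True)
```

Pass fully qualified paths for both files, as in the manual procedure, for example build_dictionary("/data/stopwords.xml", "/data/productstopwords.dic"), where both paths are placeholders.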

What to do next

After you create a stop word dictionary, use the administration console to add the dictionary to the system and associate it with one or more enterprise search or content analytics collections.

Only the generated DIC file is uploaded to the system. Keep the source XML file in an access-controlled environment and back it up regularly, because you need this XML file to update your stop word dictionary.