Creating an XML file for stop words

To remove frequently occurring terms, such as enterprise-specific vocabulary, from queries, you must list the words that qualify as stop words in an XML file.

About this task

The XML file that lists the stop words must conform to the stop word schema (stopWords.xsd). The following example shows an XML file for stop words:

<?xml version="1.0" encoding="UTF-8"?>
<stopWords xmlns="http://www.ibm.com/of/83/stopwordbuilder/xml">
  <stopWord>WebSphere Application Server</stopWord>
  <stopWord>WAS</stopWord>
  <stopWord>...</stopWord>
</stopWords>

A stop word can include white-space characters, but it cannot include punctuation characters, such as a comma (,) or vertical bar (|), because these characters might interfere with the query syntax.
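The punctuation rule can be checked before a term is added to the XML file. The following Python code is a minimal sketch, not part of the product: the function name find_forbidden_chars is hypothetical, and because the exact set of rejected characters is not listed here, the sketch assumes that all ASCII punctuation is forbidden, which is a conservative stand-in.

```python
import string

# Assumption: only the comma and vertical bar are named as examples of
# forbidden characters. ASCII punctuation is used as a conservative stand-in
# for the full set, which might be stricter than the product requires.
FORBIDDEN = set(string.punctuation)


def find_forbidden_chars(term: str) -> list[str]:
    """Return the forbidden characters in term, in order of first appearance."""
    found = []
    for ch in term:
        if ch in FORBIDDEN and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # White space is allowed, so this term has no forbidden characters.
    print(find_forbidden_chars("WebSphere Application Server"))
    # The comma and the vertical bar are both reported.
    print(find_forbidden_chars("WAS, WebSphere|Server"))
```

A term for which the function returns an empty list contains only characters that this sketch accepts. Any reported character should be removed from the term before it is written to a <stopWord> element.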

You do not need to enumerate normalizations of the term, such as the removal of accents or umlauts (normalization is handled automatically). For example, if you want to include the term météo as a stop word, you do not need to include the term meteo, too.

When you create the dictionary from your XML file, you can specify the lc parameter to control whether uppercase and lowercase variants of a term are ignored or respected. For example, if you create a case-insensitive dictionary and include the term météo, you do not need to include the term METEO, too.
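To find entries in a draft list that are redundant for these reasons, the terms can be folded and compared. The following Python code is an illustrative sketch only: it approximates accent removal with Unicode decomposition and case folding, which is an assumption and not necessarily the normalization that the system applies. The function names and the ignore_case argument are hypothetical and are not related to the lc parameter syntax.

```python
import unicodedata


def strip_accents(term: str) -> str:
    """Remove combining marks (accents, umlauts) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def redundant_variants(terms, ignore_case=True):
    """Return the terms that duplicate an earlier term after folding.

    ignore_case=True models a case-insensitive dictionary (assumption).
    """
    seen = set()
    redundant = []
    for term in terms:
        key = strip_accents(term)
        if ignore_case:
            key = key.casefold()
        if key in seen:
            redundant.append(term)
        else:
            seen.add(key)
    return redundant


if __name__ == "__main__":
    draft = ["météo", "meteo", "METEO"]
    print(redundant_variants(draft, ignore_case=True))
    print(redundant_variants(draft, ignore_case=False))
```

With case ignored, both meteo and METEO are reported as redundant after météo. With case respected, only meteo is reported, so METEO would still need its own entry.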

Procedure

To create a list of stop words:

  1. Create an XML file. To avoid XML syntax errors, use an XML editor or XML authoring tool that can validate the XML. The XSD schema for the XML file is stopWords.xsd, which is located in the ES_INSTALL_ROOT/configurations/parserservice/jediidata directory.
  2. Add a <stopWord> element for each word that is to be treated as a stop word.

    Be sure to enclose all <stopWord> elements in a <stopWords xmlns="http://www.ibm.com/of/83/stopwordbuilder/xml"> element. The namespace (specified in the xmlns attribute) must be exactly as shown.

  3. Repeat the preceding step until you have specified all of the stop words that you want to be removed from queries when users search collections.
  4. Save and exit the XML file.
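As an alternative to editing the file by hand, the steps above can be sketched as a short Python script that uses the standard xml.etree.ElementTree module. This is a hedged illustration, not a product tool: the function name build_stop_word_xml is hypothetical, the script requires Python 3.9 or later for ET.indent, and the output is not validated against stopWords.xsd.

```python
import xml.etree.ElementTree as ET

# The namespace must match the one shown in the example file exactly.
NAMESPACE = "http://www.ibm.com/of/83/stopwordbuilder/xml"


def build_stop_word_xml(terms) -> bytes:
    """Build a UTF-8 encoded stop word XML document from a list of terms."""
    # Register the namespace as the default so that elements are written
    # without a prefix. This setting is global to ElementTree.
    ET.register_namespace("", NAMESPACE)
    root = ET.Element(f"{{{NAMESPACE}}}stopWords")
    for term in terms:
        # One <stopWord> element for each term.
        child = ET.SubElement(root, f"{{{NAMESPACE}}}stopWord")
        child.text = term
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    data = build_stop_word_xml(["WebSphere Application Server", "WAS"])
    print(data.decode("utf-8"))
```

The returned bytes can be written to a file of your choice in binary mode. Validate the resulting file against the XSD schema with an XML tool before you convert it to a dictionary.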

What to do next

After you create the XML file, you must convert it to a stop word dictionary so that the dictionary can be added to the system and associated with enterprise search and content analytics collections.