Custom stop word dictionaries

You can improve search relevance by excluding frequently occurring terms, such as enterprise-specific vocabulary, from queries.

There are two kinds of support for stop words: user-defined stop words and compound terms in Germanic languages.

User-defined stop words typically include multiple-word terms, such as the product name WebSphere® Application Server. Multiple-word terms that are contained in the stop word dictionary are correctly identified in user queries and do not have to be enclosed in quotation marks.
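
The following Python sketch illustrates the idea of recognizing an unquoted multiple-word stop term in a query. It is not the product's implementation: the function name remove_stop_phrases, the whitespace tokenization, and the longest-match-first strategy are all assumptions made for illustration.

```python
def remove_stop_phrases(query, stop_phrases):
    """Drop every occurrence of a (possibly multi-word) stop phrase from a query.

    Hypothetical helper: tokenizes on whitespace and compares case-insensitively.
    """
    tokens = query.split()
    # Try longer phrases first so that a long phrase wins over a shorter
    # phrase that starts with the same words. Empty phrases are ignored.
    phrases = sorted(
        (p.lower().split() for p in stop_phrases if p.strip()),
        key=len,
        reverse=True,
    )
    kept = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if [t.lower() for t in tokens[i:i + n]] == phrase:
                i += n  # skip the whole stop phrase
                break
        else:
            kept.append(tokens[i])  # no stop phrase starts at this token
            i += 1
    return kept


if __name__ == "__main__":
    print(remove_stop_phrases(
        "install WebSphere Application Server fix pack",
        ["WebSphere Application Server"],
    ))
```

In this sketch the three-word product name is removed as a unit even though the query does not place it between quotation marks.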

Compound terms in Germanic languages are also correctly identified in queries. A compound term is the combination of two or more words that is used as a single word. Lexicalized compounds like Reisebüro (travel agency) are not considered to be compounds.

Compound terms in a query are broken up into the individual terms that make up the compound. Even if one or more of those individual terms are in the stop word dictionary, the compound term itself is not removed from the query.

For example, the query term Versicherungspolice (insurance policy) returns documents that contain the compound terms Lebensversicherungspolice (life insurance policy) and Haftpflichtversicherungspolice (third party insurance policy). Even if the word Police is listed in the stop word dictionary, the compound query term Versicherungspolice is not removed from the query.
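
The behavior in this example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the product's algorithm: the tiny LEXICON, the single linking element "s", the function names, and the rule that a document compound matches when it contains all parts of the query compound are hypothetical.

```python
# Hypothetical mini-lexicon of known word parts (lowercase).
LEXICON = {"leben", "haftpflicht", "versicherung", "police"}
# German compounds often join parts with a linking element such as "s".
LINKERS = ("", "s")


def split_compound(word):
    """Split a word into lexicon parts; return [word] if no full split exists."""
    word = word.lower()

    def helper(rest):
        if not rest:
            return []
        # Prefer the longest known head, then try to split what follows.
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in LEXICON:
                tail = rest[end:]
                for linker in LINKERS:
                    if tail.startswith(linker):
                        parts = helper(tail[len(linker):])
                        if parts is not None:
                            return [head] + parts
        return None

    parts = helper(word)
    return parts if parts else [word]


def filter_query(terms, stop_words):
    """Remove stop words, but keep compounds even if a part is a stop word."""
    kept = []
    for term in terms:
        is_compound = len(split_compound(term)) > 1
        if not is_compound and term.lower() in stop_words:
            continue  # a stand-alone stop word is dropped
        kept.append(term)
    return kept


def matches(query_term, document_term):
    """Assumed matching rule: all query parts occur in the document compound."""
    return set(split_compound(query_term)) <= set(split_compound(document_term))


if __name__ == "__main__":
    print(filter_query(["Versicherungspolice", "Police"], {"police"}))
    print(matches("Versicherungspolice", "Lebensversicherungspolice"))
```

With Police in the stop word set, the stand-alone term Police is dropped, while Versicherungspolice stays in the query and still matches Lebensversicherungspolice and Haftpflichtversicherungspolice.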

To create a custom stop word dictionary, list the enterprise-specific vocabulary in an XML file, and then convert that file to a stop word dictionary so that it can be added to the system and associated with a collection.

In the administration console, you can select which stop word dictionary a collection uses. Each collection can use one stop word dictionary, and a stop word dictionary can be shared by several enterprise search and content analytics collections.