Custom text processing

You can improve the quality and precision of search results by integrating custom text processing algorithms with collections.

Watson Explorer Content Analytics supports the Apache Unstructured Information Management Architecture (UIMA), which is a framework for creating, discovering, composing, and deploying text analysis functions. Application developers create and test analysis algorithms for the content to be searched, then create a processing engine archive (.pear file) that packages the text analysis engine together with all of the resources that the engine requires. Before you can query collections with your custom analysis algorithms, you must add the archive to the system.

In addition to the system text analysis engine, a collection that is based on a solution package can be associated with other text analysis engines, known as solution text analysis engines. These engines are either provided in the solution package or installed in the collection by exporting a UIMA pipeline from Content Analytics Studio.

The analysis logic component in a text analysis engine is called an annotator. Each annotator performs specific linguistic analysis tasks. A text analysis engine can contain any number of annotators, or it can be a composite of several text analysis engines, each of which contains its own custom annotators.
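
The relationship between annotators and composite engines can be illustrated with a small model. The following Python sketch is not the UIMA API (UIMA annotators are typically written in Java and described by XML descriptors). It only assumes that an annotator can be modeled as a function from document text to a list of annotations. The names Annotation, regex_annotator, and aggregate, and the annotation types PartNumber and Email, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """A typed span of text (hypothetical, simplified model)."""
    type: str
    begin: int
    end: int


# In this model, an annotator is any function from text to annotations.
Annotator = Callable[[str], List[Annotation]]


def regex_annotator(type_name: str, pattern: str) -> Annotator:
    """Build an annotator that marks every match of a regular expression."""
    compiled = re.compile(pattern)

    def annotate(text: str) -> List[Annotation]:
        return [Annotation(type_name, m.start(), m.end())
                for m in compiled.finditer(text)]

    return annotate


def aggregate(*annotators: Annotator) -> Annotator:
    """Compose annotators (or other aggregates) into one engine.

    The result is itself an annotator, so aggregates can be nested.
    """
    def annotate(text: str) -> List[Annotation]:
        results: List[Annotation] = []
        for annotator in annotators:
            results.extend(annotator(text))
        return results

    return annotate


part_number = regex_annotator("PartNumber", r"\b[A-Z]{2}-\d{4}\b")
email = regex_annotator("Email", r"[\w.]+@[\w.]+\.\w+")

# A composite engine that contains one annotator and one nested engine.
engine = aggregate(part_number, aggregate(email))

text = "Order AB-1234 from sales@example.com"
annotations = engine(text)
```

Because an aggregate has the same shape as a single annotator, a composite engine can be used wherever a single engine is expected, which mirrors the way that a text analysis engine can be built from other text analysis engines.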

The information produced by the annotators is referred to as the analysis results. Analysis results, which correspond to the information that you want to search for, are written to a data structure called a common analysis structure (CAS).
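
As a rough illustration of the idea, the following Python sketch models a common analysis structure as the document text plus an index of typed spans that annotators write to and that search components read from. It is a simplified stand-in under stated assumptions, not the UIMA CAS interface. The class SimpleCas, its methods, the city_annotator function, and the City type are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SimpleCas:
    """Hypothetical stand-in for a common analysis structure."""
    document_text: str
    # Maps an annotation type name to the (begin, end) spans of that type.
    index: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def add_annotation(self, type_name: str, begin: int, end: int) -> None:
        """Record one analysis result as a typed span over the text."""
        self.index.setdefault(type_name, []).append((begin, end))

    def covered_text(self, type_name: str) -> List[str]:
        """Return the text covered by every annotation of the given type."""
        return [self.document_text[b:e]
                for b, e in self.index.get(type_name, [])]


def city_annotator(cas: SimpleCas) -> None:
    """Toy annotator: writes a City annotation for each known city name."""
    for city in ("Paris", "Tokyo"):
        start = cas.document_text.find(city)
        while start != -1:
            cas.add_annotation("City", start, start + len(city))
            start = cas.document_text.find(city, start + 1)


cas = SimpleCas("Flights from Paris to Tokyo and back to Paris")
city_annotator(cas)

# A search component could now look up results by type.
cities = cas.covered_text("City")
```

In this sketch the annotator does not return anything. It records its results in the shared structure, so later processing steps can work from the structure without knowing which annotator produced a given result.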

When you configure text processing options for a collection, you do the following tasks: