Custom text processing

You can apply custom text processing algorithms to text by adding a custom annotator to Annotation Administration Console.

Annotation Administration Console supports the Unstructured Information Management Architecture (UIMA), which is a framework for creating, discovering, composing, and deploying text analysis functions. Application developers create and test analysis algorithms for the text to be analyzed, and then create a processing engine archive (.pear file) that includes all of the resources that are required to run the analysis. To analyze text with your custom text analysis algorithms, you must add the .pear file to the system.
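Because a .pear file is a ZIP-format archive, you can inspect its contents before you add it to the system. The following Python sketch is one possible way to do that, not part of the product. It lists the archive entries and checks for the installation descriptor that the UIMA PEAR layout places at metadata/install.xml. The function name inspect_pear is hypothetical.

```python
import zipfile

# Path of the installation descriptor inside a PEAR package.
INSTALL_DESCRIPTOR = "metadata/install.xml"


def inspect_pear(path_or_file):
    """Return (entry_names, has_install_descriptor) for a .pear archive.

    path_or_file can be a file system path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
    return names, INSTALL_DESCRIPTOR in names
```

An archive that does not contain the installation descriptor is unlikely to be a complete PEAR package, so a check like this can catch packaging mistakes early.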

The analysis logic component in a text analysis engine is called an annotator. Each annotator performs specific linguistic analysis tasks. A text analysis engine can contain any number of annotators, or it can be a composite of several text analysis engines, each of which contains its own custom annotators. The text analysis engine is included in the .pear file.

The information that the annotators produce is referred to as the analysis results. Analysis results are written to a data structure called a common analysis structure (CAS).
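The relationship between annotators, composite engines, and the common analysis structure can be illustrated with a small conceptual model. The following Python sketch is a simplification and does not use the UIMA API. All of the class names (Annotation, SimpleCas, RegexAnnotator, AggregateEngine) are hypothetical, and a regular expression stands in for real linguistic analysis.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One analysis result: a typed span of the document text."""
    type_name: str
    begin: int
    end: int


@dataclass
class SimpleCas:
    """Toy stand-in for a common analysis structure: text plus results."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]


class RegexAnnotator:
    """Annotator that marks every match of a pattern with a given type."""

    def __init__(self, type_name, pattern):
        self.type_name = type_name
        self.pattern = re.compile(pattern)

    def process(self, cas):
        for match in self.pattern.finditer(cas.text):
            cas.annotations.append(
                Annotation(self.type_name, match.start(), match.end())
            )


class AggregateEngine:
    """Composite engine that runs its components in order on the same CAS."""

    def __init__(self, components):
        self.components = list(components)

    def process(self, cas):
        for component in self.components:
            component.process(cas)
```

In this model, every component writes its results to the same structure, so a later annotator can build on the annotations that an earlier one added. An AggregateEngine can also be a component of another AggregateEngine, which mirrors the way a text analysis engine can be a composite of other engines.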

When you configure text processing options for a collection, you can do the following tasks:

Select the text analysis engine that you want to use for annotating text when it is analyzed.
If the text to be analyzed includes XML documents with meaningful markup, and you want to use this markup in your custom text analysis, you can associate mapping files with the collection and map the output of the XML mapping files to a common analysis structure.
Map the text analysis results from a common analysis structure, so that the results can be returned to Watson Explorer by the text analytics API.
Map a common analysis structure to a relational database. You can map data to IBM® DB2® tables or Oracle tables. This type of mapping enables the analysis results to be used in database applications or, for example, in content mining applications.
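To illustrate what mapping analysis results to a relational database can look like, the following Python sketch writes one row per annotation to a table. It is only an analogy: the product performs this mapping through configuration, not through code like this. The built-in sqlite3 module stands in for DB2 or Oracle, and the function name store_annotations, the table name, and the column names are all hypothetical.

```python
import sqlite3

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS annotation (
    doc_id       TEXT    NOT NULL,
    type_name    TEXT    NOT NULL,
    begin_offset INTEGER NOT NULL,
    end_offset   INTEGER NOT NULL,
    covered_text TEXT    NOT NULL
)
"""


def store_annotations(conn, doc_id, text, annotations):
    """Write one row per annotation and return the number of rows written.

    annotations is an iterable of (type_name, begin, end) tuples, where
    begin and end are character offsets into text.
    """
    conn.execute(CREATE_TABLE)
    rows = [
        (doc_id, type_name, begin, end, text[begin:end])
        for type_name, begin, end in annotations
    ]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

After the results are in a table, a database application can query them with ordinary SQL, for example to count how often each annotation type occurs across the documents in a collection.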