You can improve the quality and precision of search results by integrating custom text processing algorithms with collections.
Watson Explorer Content Analytics supports the Apache Unstructured Information Management Architecture (UIMA), which is a framework for creating, discovering, composing, and deploying text analysis functions. Application developers create and test analysis algorithms for the content to be searched, then create a processing engine archive (.pear file) that includes all of the resources required to run the text analysis engine. Before you can query collections with your custom analysis algorithms, you must add the archive (which contains the text analysis engine) to the system.
In addition to the system text analysis engine, a collection that is based on a solution package can be associated with other text analysis engines, known as solution text analysis engines, that are provided in the solution package or are installed in the collection by exporting a UIMA pipeline from Content Analytics Studio.
The analysis logic component in a text analysis engine is called an annotator. Each annotator performs specific linguistic analysis tasks. A text analysis engine can contain any number of annotators, or it can be a composite of several text analysis engines, each of which contains its own custom annotators.
The information produced by the annotators is referred to as the analysis results. Analysis results, which correspond to the information that you want to search for, are written to a data structure called a common analysis structure.
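The relationship between annotators, analysis results, and the common analysis structure can be illustrated with a minimal sketch. This is a toy model in plain Python, not the UIMA API: the names Annotation, AnalysisStructure, person_annotator, phone_annotator, and run_pipeline are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    type: str   # annotation type, for example "Person"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)


@dataclass
class AnalysisStructure:
    """Hypothetical stand-in for a common analysis structure."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        # Return the span of document text that an annotation covers.
        return self.text[ann.begin:ann.end]

    def select(self, type_name):
        # Return all annotations of the given type.
        return [a for a in self.annotations if a.type == type_name]


def person_annotator(cas):
    # Toy annotator: marks names from a fixed list as Person.
    for m in re.finditer(r"\b(?:Alex|Maria)\b", cas.text):
        cas.annotations.append(Annotation("Person", m.start(), m.end()))


def phone_annotator(cas):
    # Toy annotator: marks NNN-NNNN patterns as PhoneNumber.
    for m in re.finditer(r"\b\d{3}-\d{4}\b", cas.text):
        cas.annotations.append(Annotation("PhoneNumber", m.start(), m.end()))


def run_pipeline(text, annotators):
    # A composite engine runs each annotator in order over one shared structure.
    cas = AnalysisStructure(text)
    for annotate in annotators:
        annotate(cas)
    return cas


cas = run_pipeline("Call Alex at 555-0142.", [person_annotator, phone_annotator])
```

Each annotator reads the same document text and appends its results to the shared structure, so a later annotator, or the indexing step, can select annotations by type without knowing which annotator produced them.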
For example, you can map the content of <addressee> and <customer> elements to Person annotations in the common analysis structure. These annotations can then be accessed by your custom annotators, which might detect additional information (for example, they might detect the gender of the Person). You can also map Person annotations to the index, which allows users to search for Persons without having to know the original names of the XML elements.
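As a rough illustration of the mapping idea described above, the following sketch maps the content of <addressee> and <customer> elements to a single Person type. It uses only the Python standard library and does not reflect how the product configures or stores XML mappings. The ELEMENT_TO_TYPE table, the map_elements function, and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML element names to annotation types.
ELEMENT_TO_TYPE = {"addressee": "Person", "customer": "Person"}


def map_elements(xml_text, mapping):
    """Return (annotation_type, text) pairs for every mapped element."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        ann_type = mapping.get(element.tag)
        # Skip unmapped elements and mapped elements with no text content.
        if ann_type and element.text:
            results.append((ann_type, element.text.strip()))
    return results


doc = """<order>
  <addressee>Maria Lopez</addressee>
  <customer>Alex Chen</customer>
  <item>Printer</item>
</order>"""

annotations = map_elements(doc, ELEMENT_TO_TYPE)

# A lookup by type finds both values, regardless of the source element name.
persons = [text for ann_type, text in annotations if ann_type == "Person"]
```

Because both element names resolve to the same type, a search for Person does not need to mention addressee or customer at all.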
If you want to allow users to specify the original XML elements in queries, then you do not need to define any XML mappings. Instead, you can configure parsing options and enable native XML search for the collection.
For example, depending on the entities and relationships that are detected by the annotators, users can search for concepts that occur in the same sentence (such as a specific person and any competitor name), or a keyword and a concept (such as the name Alex and a phone number).
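The same-sentence searches described above can be sketched as follows. This is a simplified model, not the product's query syntax or index: the regular expressions stand in for real annotators, sentence boundaries are detected naively, and the names annotate, sentence_spans, and find_sentences are hypothetical.

```python
import re

TEXT = "Alex met a buyer from Acme. Maria called 555-0142. Reach Alex at 555-0199."


def annotate(text):
    # Toy annotators: each pattern produces (type, begin, end) records.
    patterns = {
        "Person": r"\b(?:Alex|Maria)\b",
        "Competitor": r"\bAcme\b",
        "PhoneNumber": r"\b\d{3}-\d{4}\b",
    }
    return [(name, m.start(), m.end())
            for name, pattern in patterns.items()
            for m in re.finditer(pattern, text)]


def sentence_spans(text):
    # Naive sentence detection: a run of text ending in ., !, or ?
    return [(m.start(), m.end()) for m in re.finditer(r"[^.!?]+[.!?]", text)]


def find_sentences(text, annotations, concepts, keyword=None):
    """Return sentences that contain every concept type and, if given, the keyword."""
    hits = []
    for begin, end in sentence_spans(text):
        types_here = {t for t, b, e in annotations if begin <= b and e <= end}
        sentence = text[begin:end].strip()
        has_keyword = keyword is None or re.search(
            r"\b" + re.escape(keyword) + r"\b", sentence) is not None
        if set(concepts) <= types_here and has_keyword:
            hits.append(sentence)
    return hits


annotations = annotate(TEXT)

# Two concepts in the same sentence: a person and any competitor name.
person_and_competitor = find_sentences(TEXT, annotations, ["Person", "Competitor"])

# A keyword and a concept in the same sentence: "Alex" and a phone number.
alex_and_phone = find_sentences(TEXT, annotations, ["PhoneNumber"], keyword="Alex")
```

The second query skips the sentence about Maria even though it contains a phone number, because the keyword and the concept must occur in the same sentence.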