Retrieving targeted XML elements

You can specify that a returned document must be accompanied by a result field that enumerates the occurrences of a targeted XML element within that document.

About this task

In the opaque term that specifies the semantic search, you can prepend a pound or hash sign (#) to one XML element (or annotation) in the xmlf2 query term. The returned document is then accompanied by a result field that enumerates all the occurrences of the Unstructured Information Management Architecture (UIMA) annotation that is designated in the XML query term. These enumerated occurrences are within the returned document, and each of them is part of an occurrence of the whole XML query term in the document.

The XML element that is marked with the sign is designated as the targeted XML element, whose occurrences are to be enumerated. When the semantic search is expressed in XPath, then by the definition of XPath, the deepest element that is not inside a bracketed predicate [..] is the target element.

For example, the query <book language=en> <#author> </#author> </book>, or the equivalent query <book language=en> <#author/> </book>, returns documents that include at least one occurrence of the annotation book that has the attribute language=en and includes within its span an occurrence of the annotation author. The query also returns the enumeration of all the occurrences of the tag <author> that appear within an occurrence of the tag <book> that has the attribute language=en.

Each occurrence is enumerated by its unique ID. The UIMA annotators assign a unique ID to each annotation that they generate. XML elements that are part of the raw document, rather than annotations that are generated by UIMA annotators, do not have unique IDs and are not enumerated in the result field. If the summary field of the retrieved document includes text that is covered in the document by an enumerated occurrence, that text is highlighted.

For the example query, the following occurrences of the tag <author> in the retrieved document are not enumerated:
  • An occurrence of the tag <author> within the span of the tag <journal>
  • An occurrence of the tag <author> within the span of the tag <book> that has the attribute language=ge
  • An occurrence of the tag <author> within the span of the tag <book> that does not have the attribute language
  • An occurrence of the tag <author> that is part of an XML document, that is, the tag <author> is part of the raw document rather than a generated annotation
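
The enumeration rules for the example query can be illustrated with a small Java sketch. It is not product code: the Occurrence class, its fields, and the parent-link model of "within the span" are all assumptions made for illustration (real annotation spans are offset ranges, and the search engine performs this matching internally). The sketch only shows which <author> occurrences would qualify under the rules described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TargetEnumerationSketch {

    /** Hypothetical model of one element occurrence; not a product or UIMA class. */
    static final class Occurrence {
        final String name;
        final Map<String, String> attributes;
        final Occurrence parent; // enclosing occurrence, or null at the top level
        final Integer id;        // null for raw-document elements, which have no unique ID

        Occurrence(String name, Map<String, String> attributes, Occurrence parent, Integer id) {
            this.name = name;
            this.attributes = attributes;
            this.parent = parent;
            this.id = id;
        }
    }

    /** Collects IDs of generated <author> annotations enclosed by a <book language=en>. */
    static List<Integer> enumerateAuthors(List<Occurrence> occurrences) {
        List<Integer> ids = new ArrayList<>();
        for (Occurrence o : occurrences) {
            if (!"author".equals(o.name) || o.id == null) {
                continue; // not an author, or a raw-document element without an ID
            }
            // Walk outward through the enclosing occurrences.
            for (Occurrence p = o.parent; p != null; p = p.parent) {
                if ("book".equals(p.name) && "en".equals(p.attributes.get("language"))) {
                    ids.add(o.id);
                    break;
                }
            }
        }
        return ids;
    }

    /** Builds one qualifying occurrence plus the four excluded cases from the list above. */
    static List<Integer> demo() {
        Occurrence bookEn = new Occurrence("book", Map.of("language", "en"), null, 1);
        Occurrence journal = new Occurrence("journal", Map.of(), null, 2);
        Occurrence bookGe = new Occurrence("book", Map.of("language", "ge"), null, 3);
        Occurrence bookNoLang = new Occurrence("book", Map.of(), null, 4);

        List<Occurrence> all = new ArrayList<>();
        all.add(new Occurrence("author", Map.of(), bookEn, 10));     // enumerated
        all.add(new Occurrence("author", Map.of(), journal, 11));    // inside <journal>
        all.add(new Occurrence("author", Map.of(), bookGe, 12));     // language=ge
        all.add(new Occurrence("author", Map.of(), bookNoLang, 13)); // no language attribute
        all.add(new Occurrence("author", Map.of(), bookEn, null));   // raw-document element
        return enumerateAuthors(all);
    }
}
```

In this sketch, TargetEnumerationSketch.demo() returns a list that contains only the ID 10, because the other four occurrences each fall under one of the exclusions listed above.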

The enterprise search application can access the enumeration of the occurrences of the target element through the TargetElement property of the Result object, for example, Result.getProperty("TargetElement"). The returned value of that property is a string of integers that are separated by spaces. Each integer is an ID of a single occurrence of the target element.
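
Because the property value is a space-separated string, an application typically converts it to a list of integers before it uses the IDs. The following Java sketch shows one way to do that. The class and method names are hypothetical, and the input string is assumed to have been obtained already, for example from Result.getProperty("TargetElement"); the sketch does not call the search API itself.

```java
import java.util.ArrayList;
import java.util.List;

final class TargetElementIds {

    /**
     * Splits a space-separated string of integer IDs into a list.
     * Returns an empty list when the value is null or blank, for example
     * when the result carries no target element enumeration.
     * Throws NumberFormatException if a token is not an integer.
     */
    static List<Integer> parse(String propertyValue) {
        List<Integer> ids = new ArrayList<>();
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return ids;
        }
        for (String token : propertyValue.trim().split("\\s+")) {
            ids.add(Integer.parseInt(token));
        }
        return ids;
    }
}
```

For example, TargetElementIds.parse("12 45 101") yields the list [12, 45, 101].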

The actual target elements that correspond to these integer values cannot be retrieved through the API. If an application must access those elements, it must create its own mapping table during parsing. For example, you can create a mapping from the common analysis structure (CAS) to a relational database.
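
As a minimal sketch of such a mapping table, the following Java class keeps an in-memory map from annotation ID to the annotation type, offsets, and covered text. Everything in it is an assumption for illustration: the class, its methods, and the fields that are stored are hypothetical, the table is assumed to be filled by the application's own code while documents are parsed, and a production mapping would more likely be persisted (for example, in a relational database) and also keyed by document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class AnnotationLookupTable {

    /** Hypothetical record of what the application wants to recover for an ID. */
    static final class Entry {
        final String type;
        final int begin;
        final int end;
        final String coveredText;

        Entry(String type, int begin, int end, String coveredText) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    private final Map<Integer, Entry> byId = new HashMap<>();

    /** Called by the application at parse time for each generated annotation. */
    void record(int id, String type, int begin, int end, String documentText) {
        byId.put(id, new Entry(type, begin, end, documentText.substring(begin, end)));
    }

    /** Resolves an ID taken from the TargetElement property, if it was recorded. */
    Optional<Entry> lookup(int id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```

At search time, the application would look up each integer from the TargetElement property in this table to recover the corresponding element.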