Creating Custom Annotations with Content Analytics

Custom annotations can be added to Watson Explorer Engine indexed documents using the IBM Content Analytics Annotation converter.

About this task

Custom annotations can be downloaded from the IBM website, or created by using the Unstructured Information Management Architecture (UIMA) framework or IBM® Watson Explorer Content Analytics Studio. Content Analytics Studio is a separately installable component that is provided with IBM Watson Explorer Advanced Edition. The following procedure describes how to add and configure the converter for custom annotations:

Procedure

  1. Install and start the Annotation Administration Console as described in Installing Annotation Administration Console.
  2. Create a collection in the Annotation Administration Console as described in Creating a collection, and configure it with the appropriate annotators as described in Custom text processing. You must create a separate Annotation Administration Console collection for each set of annotators that you want to apply to the content.
  3. In the Watson Explorer Engine administration tool, add the IBM Content Analytics Annotation converter to the converter list for the collection that you want to annotate, and configure the following options:
    • Annotation analysis URL - URL to the Annotation Administration Console. Both hostname and port are required. The default port is 8393
    • Annotation Collection ID - ID of the annotation collection that you configured in the Annotation Administration Console
    • Annotation Type - Set this option to Custom Annotations
    • User Name - The user name used to connect to the Annotation Administration Console
    • Password - The password used to connect to the Annotation Administration Console
    • Exclude Contents By Default - When enabled, the Content List field defines which Watson Explorer Engine input contents will be annotated. When disabled (the default), the Content List field defines which Watson Explorer Engine input contents will not be annotated
    • Content List - The list of Watson Explorer Engine contents that will or will not be annotated, depending on the setting of the Exclude Contents By Default field
    • Logging Configuration - Log4j configuration for the converter. By default, the logging level is set to OFF
  4. Define a custom converter that takes the custom annotations returned from the Annotation Administration Console and parses them into content that Watson Explorer Engine can use. By default, the XML returned by the Annotation Administration Console has the following format:
    <metadata>
     <facets>
      <facet>
       <path>
        <keyword></keyword>
       </path>
      </facet>
     </facets>
    </metadata>

    You can use this content to create whatever <content> nodes are appropriate for your collection. To add a custom converter:

    1. Click the Add a new converter link in the Converting subtab of your collection's Configuration options.
    2. Select Custom converter from the list and click Add.
    3. Set the Type-In and Type-Out options to application/vxml-unnormalized, and set the Action option to XSL.
    4. Enter appropriate XSL to convert the returned annotations into Watson Explorer Engine document contents.

      For example, if you want to create contents from the annotations that use the calculated path value as the content name, and the keyword node content as the content value, use the following XSL:

      <xsl:template match="/">
        <vce>
          <xsl:for-each select="//document">
            <document>
              <xsl:for-each select="./content">
                <xsl:text disable-output-escaping="yes"><![CDATA[<content name="]]></xsl:text>
                <xsl:value-of select="@name"/>
                <xsl:text disable-output-escaping="yes"><![CDATA[">]]></xsl:text>
                <xsl:value-of select="."/>
                <xsl:text disable-output-escaping="yes"><![CDATA[</content>]]></xsl:text>
              </xsl:for-each>
              <xsl:for-each select="./Metadata/Facets/Facet">
                <!-- combine value of Path nodes to form content name -->
                <xsl:text disable-output-escaping="yes"><![CDATA[<content name="]]></xsl:text>
                <xsl:value-of select="./Path[1]" />
                <xsl:text>/</xsl:text>
                <xsl:value-of select="./Path[2]" />
                <xsl:text>/</xsl:text>
                <xsl:value-of select="./Path[3]" />
                <xsl:text disable-output-escaping="yes"><![CDATA[">]]></xsl:text>
                <!-- select value of Keyword node as content value -->
                <xsl:value-of select="./Keyword" />
                <xsl:text disable-output-escaping="yes"><![CDATA[</content>]]></xsl:text>
              </xsl:for-each>
            </document>
          </xsl:for-each>
        </vce>
      </xsl:template>

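      In most cases, the XSL listed above will work for your application. XPath matching is case-sensitive, so verify that the element names in the select expressions match the element names in the XML that is returned for your collection.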

    5. Click OK to save the converter. Then move the converter below the IBM Content Analytics Annotation converter that you added in step 3 by clicking the number to the left of the converter name and dragging it to the new position.
  5. Configure any other appropriate search collection options and start indexing the collection.
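
Before indexing, it can help to check which content names and values the XSL in step 4 is expected to produce. The following Python sketch is not part of Watson Explorer. It mirrors the mapping that the XSL performs (existing contents are passed through, and each facet becomes one content whose name is the first three Path values joined by a slash and whose value is the Keyword text). The sample document, its values, and the function name contents_from_annotations are hypothetical, and the element names are assumed to match the XPath expressions used in the XSL. Incidental whitespace in the generated names is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated document. Element names are an assumption taken
# from the XPath expressions in the XSL (Metadata/Facets/Facet, Path, Keyword).
SAMPLE = """
<vce>
  <document>
    <content name="title">Quarterly report</content>
    <Metadata>
      <Facets>
        <Facet>
          <Path>products</Path>
          <Path>hardware</Path>
          <Path>printers</Path>
          <Keyword>LaserJet</Keyword>
        </Facet>
      </Facets>
    </Metadata>
  </document>
</vce>
"""


def text_of(element):
    """Return the full string value of an element, or '' if it is missing."""
    return "" if element is None else "".join(element.itertext())


def contents_from_annotations(xml_text):
    """Return (name, value) pairs mirroring the <content> nodes the XSL builds."""
    root = ET.fromstring(xml_text)
    pairs = []
    for doc in root.iter("document"):
        # Existing contents are copied through with the same name and value.
        for content in doc.findall("content"):
            pairs.append((content.get("name"), text_of(content)))
        # Each facet becomes one content named Path[1]/Path[2]/Path[3].
        for facet in doc.findall("Metadata/Facets/Facet"):
            paths = [text_of(p) for p in facet.findall("Path")[:3]]
            paths += [""] * (3 - len(paths))  # missing Path nodes yield empty segments
            pairs.append(("/".join(paths), text_of(facet.find("Keyword"))))
    return pairs


if __name__ == "__main__":
    for name, value in contents_from_annotations(SAMPLE):
        print(f"{name} = {value}")
```

If the names printed for your own sample data are not the content names that you expect, adjust the select expressions in the XSL before you start indexing.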