Indexing image alt text in HTML documents

The default parsing rules do not index the content of alt attributes in HTML elements, such as <img alt="content"/>. To ensure that the alt text is indexed, you must specify rules in a parser configuration file.
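
To make concrete which text is affected, the following Python sketch collects the alt values from img elements in an HTML string. It is illustrative only and does not reflect how the Watson Content Analytics parser is implemented. The names AltTextCollector and collect_alt_text are hypothetical and were chosen for this example.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects non-empty alt attribute values from img elements."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and routes
        # self-closing tags such as <img ... /> through this method too.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alt_texts.append(value)


def collect_alt_text(html):
    """Return the alt text of every img element in the given HTML string."""
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alt_texts
```

Without a custom rule, text of this kind is not added to the index, so a search for words that appear only in alt attributes would not match the document.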

Procedure

To configure the parser to add content from alt text to the index:

  1. Log in to the master server as the default Watson Content Analytics administrator. Edit the ES_NODE_ROOT/master_config/collection_ID.indexservice/parser_config.xml file, where collection_ID identifies the collection that you want to customize.
  2. Add the following rule to the file and save your changes:
    <Rule>
      <Conditions>
        <Element Name="img" />
      </Conditions>
      <Actions>
        <AttributeAction FieldName="alttext" FieldNameType="fixed" ValueFrom="alt" />
      </Actions>
    </Rule>
  3. In the administration console, create an index field named alttext. The name must match the FieldName value in the rule. Enable the full-text searchable attribute for the index field.
  4. Restart the parse and index services for the collection.
  5. Recrawl or re-import documents so that they can be parsed and indexed again.
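
Before you restart the services, it can help to confirm that the edited file is still well-formed XML and contains the rule. The following Python sketch is one way to check. It assumes that the Rule elements are not in an XML namespace and makes no assumption about the elements that enclose them. The function name has_alt_text_rule is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_alt_text_rule(xml_text, field_name="alttext"):
    """Return True if the XML contains a rule that maps img alt text to field_name.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so the location of <Rule> within
    # the file does not matter for this check.
    for rule in root.iter("Rule"):
        matches_img = any(
            element.get("Name") == "img" for element in rule.iter("Element")
        )
        maps_alt = any(
            action.get("ValueFrom") == "alt"
            and action.get("FieldName") == field_name
            for action in rule.iter("AttributeAction")
        )
        if matches_img and maps_alt:
            return True
    return False
```

To use the sketch, read the contents of parser_config.xml into a string and pass the string to has_alt_text_rule. A result of False, or a parse error, suggests that the rule was not saved as intended.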