The default parsing rules do not index the content of alt attributes
in HTML elements, such as <img alt="content"/>.
To ensure that the alt text is indexed, you must specify rules in
a parser configuration file.
Procedure
To configure the parser to add content from alt text
to the index:
- Log in as the default Watson Content Analytics administrator on the
master server and edit the ES_NODE_ROOT/master_config/collection_ID.indexservice/parser_config.xml file,
where collection_ID identifies
the collection that you want to customize.
- Add the following rule to the file and save your changes:
<Rule>
  <Conditions>
    <Element Name="img" />
  </Conditions>
  <Actions>
    <AttributeAction FieldName="alttext" FieldNameType="fixed" ValueFrom="alt" />
  </Actions>
</Rule>
- In the administration console, create an index field named alttext.
The name must match the FieldName value in the rule.
Enable the full-text searchable attribute for the index field.
- Restart the parse and index services for the collection.
- Recrawl or re-import documents so that they can be parsed
and indexed again.
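To see which values the rule in the procedure is meant to capture, the following Python sketch collects the alt attribute values of img elements from an HTML string. It is only an illustration built on the Python standard library, not the Watson Content Analytics parser. The names AltTextCollector and extract_alttext are hypothetical and do not appear in the product.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Illustration only: gathers alt values from img elements.

    This mimics the intent of the rule (Element Name="img",
    ValueFrom="alt"); it is not the product's parser.
    """

    def __init__(self):
        super().__init__()
        # Values that the rule would direct into the alttext field.
        self.alttext = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and routes
        # self-closing tags such as <img ... /> through this method too.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alttext.append(value)


def extract_alttext(html):
    """Return the non-empty alt values of all img elements in html."""
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alttext


if __name__ == "__main__":
    sample = '<p>Bike shop</p><img src="a.png" alt="A red bicycle"/><img src="b.png">'
    print(extract_alttext(sample))
```

Running the sketch against a few sample pages before you recrawl can help you anticipate what text will become searchable through the alttext field. Images without an alt attribute, or with an empty one, contribute nothing in this sketch.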