IBM Content Analytics with Enterprise Search does not index header information from Microsoft PowerPoint

Technote (troubleshooting)


In Microsoft PowerPoint, notes can be added to each slide. These notes can have a header and footer, as in a Word document.

IBM Content Analytics with Enterprise Search (ICAwES) text extraction does not extract this header and footer text.

In addition, Microsoft PowerPoint has a slide master feature in which text can also be stored. With the fix included in ICAwES 3.0 Fix Pack 1, text extraction extracts only the slide master footer text, not the header text.


Slide notes are ignored by default, but a workaround is available.

Resolving the problem

Use the following steps as a workaround to extract this information:

  1. Create the directory '$ES_INSTALL_ROOT/lib/com/ibm/es/oze/parser/outsidein/'
  2. Put the attached file in the directory that you created
  3. Modify the classpath parameter in the '$ES_INSTALL_ROOT/configurations/interfaces/stellent__interface.ini' file to include the 'lib' directory:
    • classpath=lib,es.indexservice.jar
  4. Restart Parse and Index, then re-crawl, re-parse, and re-index the documents
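
Steps 1 to 3 can also be scripted. The following Python sketch is an illustration only and is not part of the product. The function names ensure_lib_on_classpath and apply_workaround are hypothetical, and the sketch assumes that the classpath parameter appears in the .ini file as a single line of the form 'classpath=<comma-separated entries>' with no spaces around the equals sign. If no such line exists, the file is left unchanged. Step 4 must still be done through the usual administration tools.

```python
import shutil
from pathlib import Path


def ensure_lib_on_classpath(ini_text: str) -> str:
    """Return ini_text with 'lib' present in the classpath= line.

    Assumption: the parameter is written as 'classpath=a,b,c' on one line.
    """
    out = []
    for line in ini_text.splitlines():
        if line.strip().startswith("classpath="):
            key, _, value = line.partition("=")
            entries = [e.strip() for e in value.split(",") if e.strip()]
            if "lib" not in entries:
                # Put 'lib' first, matching the example in step 3.
                entries.insert(0, "lib")
            line = key + "=" + ",".join(entries)
        out.append(line)
    return "\n".join(out) + "\n"


def apply_workaround(install_root: Path, attached_file: Path) -> None:
    """Apply steps 1 to 3 under install_root (the value of $ES_INSTALL_ROOT)."""
    # Step 1: create the directory.
    target_dir = install_root / "lib/com/ibm/es/oze/parser/outsidein"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: copy the file attached to the technote into that directory.
    shutil.copy2(attached_file, target_dir / attached_file.name)

    # Step 3: add 'lib' to the classpath parameter, keeping a backup copy.
    ini = install_root / "configurations/interfaces/stellent__interface.ini"
    shutil.copy2(ini, ini.with_name(ini.name + ".bak"))
    ini.write_text(ensure_lib_on_classpath(ini.read_text()))
```

To use the sketch, call apply_workaround with the path that $ES_INSTALL_ROOT points to and the local path of the downloaded attachment. The backup file (stellent__interface.ini.bak) allows the original configuration to be restored if needed.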

Document information

More support for: Watson Content Analytics

Software version: 3.0

Operating system(s): AIX, Linux, Windows

Reference #: 1623435

Modified date: 04 April 2014

