IBM Content Analytics with Enterprise Search does not index header information from Microsoft PowerPoint

Technote (troubleshooting)


Problem (Abstract)

In Microsoft PowerPoint, notes can be added to each slide. These notes pages can have a header and footer, as in a Word document.

The text in these headers and footers is not extracted by the text extraction in IBM Content Analytics with Enterprise Search (ICAwES).

In addition, Microsoft PowerPoint has a slide master function where text can also be stored. ICAwES 3.0 Fix Pack 1 includes a fix for the slide master, but with that fix text extraction extracts only the slide master footer text, not the text in the header.

Cause

Slide notes are ignored by default, but a work-around is available.

Resolving the problem

Here are the steps for a work-around to extract this information:

  1. Create the directory '$ES_INSTALL_ROOT/lib/com/ibm/es/oze/parser/outsidein/'
  2. Put the attached tag_actions.properties file in the newly created directory
  3. Modify the classpath parameter in the '$ES_INSTALL_ROOT/configurations/interfaces/stellent__interface.ini' file to include the 'lib' directory
    • classpath=lib,es.indexservice.jar
  4. Restart the parse and index services, then re-crawl/re-parse/re-index the documents
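
Steps 1 to 3 above could be scripted roughly as follows. This is only a sketch under stated assumptions, not an IBM-supplied tool: the function name apply_workaround is hypothetical, the .ini file is assumed to contain a single 'classpath=' line with comma-separated entries, and the attached tag_actions.properties file is assumed to have been saved locally already. Step 4 (restart and re-crawl/re-parse/re-index) still has to be done by hand.

```python
import shutil
from pathlib import Path


def apply_workaround(es_install_root, tag_actions_file):
    """Apply steps 1-3 of the work-around (hypothetical helper, not an IBM tool)."""
    root = Path(es_install_root)

    # Step 1: create the directory that will hold the properties file.
    target_dir = root / "lib" / "com" / "ibm" / "es" / "oze" / "parser" / "outsidein"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: copy the attached tag_actions.properties into that directory.
    shutil.copy(tag_actions_file, target_dir / "tag_actions.properties")

    # Step 3: add 'lib' to the classpath parameter in the interface .ini file.
    # Assumption: the file has a 'classpath=' line with comma-separated entries.
    ini = root / "configurations" / "interfaces" / "stellent__interface.ini"
    shutil.copy(ini, ini.with_name(ini.name + ".bak"))  # keep a backup copy

    updated = []
    for line in ini.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "classpath":
            entries = [e.strip() for e in value.split(",") if e.strip()]
            if "lib" not in entries:
                entries.insert(0, "lib")
            line = "classpath=" + ",".join(entries)
        updated.append(line)
    ini.write_text("\n".join(updated) + "\n")

    return target_dir
```

It might be called with the value of the ES_INSTALL_ROOT environment variable and the path where the attachment was saved. The function leaves an existing 'lib' entry alone, so running it twice does not duplicate the entry.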

Document information

More support for: Content Analytics with Enterprise Search
Software version: 3.0
Operating system(s): AIX, Linux, Linux on System z, Windows
Reference #: 1623435
Modified date: 2013-09-18
