IBM Support

Indexing .dotx and .xltx files with IBM Content Analytics with Enterprise Search.

Technote (FAQ)


Question

How do I crawl and index .dotx and .xltx files with IBM Content Analytics with Enterprise Search?

Answer

(1) Create the following file, where <cid> is the collection ID:

<$ES_NODE_ROOT>/master_config/<cid>.stellent/stellenttypes.cfg

Add the following lines to the file:

accept DEFAULT
accept FI_WINWORDTEMPLATE2007 dotx
accept FI_WINWORDTEMPLATE2010 dotx
accept FI_EXCELTEMPLATE2007 xltx
accept FI_EXCELTEMPLATE2010 xltx
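
As an optional convenience, the file from step (1) could be generated with a short script. The sketch below is only an illustration: the function names render_stellent_types and write_stellent_types are hypothetical, and the directory passed in is assumed to be your resolved <$ES_NODE_ROOT>/master_config/<cid>.stellent path. The five lines written are exactly those listed above.

```python
from pathlib import Path

# Lines copied verbatim from step (1) of this technote.
STELLENT_TYPE_LINES = [
    "accept DEFAULT",
    "accept FI_WINWORDTEMPLATE2007 dotx",
    "accept FI_WINWORDTEMPLATE2010 dotx",
    "accept FI_EXCELTEMPLATE2007 xltx",
    "accept FI_EXCELTEMPLATE2010 xltx",
]


def render_stellent_types():
    """Return the full text of stellenttypes.cfg, one rule per line."""
    return "\n".join(STELLENT_TYPE_LINES) + "\n"


def write_stellent_types(config_dir):
    """Write stellenttypes.cfg into config_dir and return the file path.

    config_dir is an assumption: it should be the already-resolved
    <$ES_NODE_ROOT>/master_config/<cid>.stellent directory.
    """
    target = Path(config_dir) / "stellenttypes.cfg"
    target.write_text(render_stellent_types(), encoding="ascii")
    return target
```

Keeping the rendering separate from the write makes it easy to review the content before touching the configuration directory.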

(2) Modify <$ES_NODE_ROOT>/master_config/<cid>.indexservice/mimetypes.xml and add the lines below to the <ExtensionMapping> section:

<MappedMimetype Name="application/vnd.openxmlformats-officedocument.wordprocessingml.template">
<Extension>.dotx</Extension>
</MappedMimetype>
<MappedMimetype Name="application/vnd.openxmlformats-officedocument.spreadsheetml.template">
<Extension>.xltx</Extension>
</MappedMimetype>
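
Before restarting anything, it can be useful to confirm that the edit to mimetypes.xml took effect. The following is a minimal sketch, not an IBM-provided tool: the function name missing_extension_mappings is hypothetical, and it assumes the file is well-formed XML without namespaces, with <MappedMimetype> elements nested under <ExtensionMapping> as shown above.

```python
import xml.etree.ElementTree as ET

# Extension-to-MIME-type pairs taken from step (2) of this technote.
REQUIRED_MAPPINGS = {
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    ".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
}


def missing_extension_mappings(xml_text):
    """Return the sorted list of required extensions not mapped correctly.

    Assumes un-namespaced element names; adjust the tag names if your
    mimetypes.xml declares an XML namespace.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for section in root.iter("ExtensionMapping"):
        for mapped in section.iter("MappedMimetype"):
            for ext in mapped.iter("Extension"):
                found[(ext.text or "").strip()] = mapped.get("Name")
    return sorted(
        ext for ext, mime in REQUIRED_MAPPINGS.items() if found.get(ext) != mime
    )
```

An empty list means both mappings are present; otherwise the returned extensions still need to be added. To check a real file, read its contents and pass the text to the function.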


(3) Modify <$ES_NODE_ROOT>/master_config/<cid>.indexservice/parser_config.xml and add the lines below to the section under <ParserMapping><ParserName>stellent</ParserName>:

<Mimetype>application/vnd.openxmlformats-officedocument.wordprocessingml.template</Mimetype>
<Mimetype>application/vnd.openxmlformats-officedocument.spreadsheetml.template</Mimetype>
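
A similar check can be sketched for parser_config.xml. This is an illustration under stated assumptions: the function name stellent_missing_mimetypes is hypothetical, and the file is assumed to be un-namespaced XML in which each <ParserMapping> holds one <ParserName> child followed by its <Mimetype> entries, as the step above describes.

```python
import xml.etree.ElementTree as ET

# MIME types taken from step (3) of this technote.
REQUIRED_MIMETYPES = [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
]


def stellent_missing_mimetypes(xml_text):
    """Return required MIME types absent from the stellent parser mapping.

    If no <ParserMapping> named "stellent" is found, every required
    MIME type is reported as missing.
    """
    root = ET.fromstring(xml_text)
    for mapping in root.iter("ParserMapping"):
        name = mapping.findtext("ParserName", default="").strip()
        if name == "stellent":
            present = {(m.text or "").strip() for m in mapping.iter("Mimetype")}
            return [m for m in REQUIRED_MIMETYPES if m not in present]
    return list(REQUIRED_MIMETYPES)
```

An empty result indicates that both template MIME types are routed to the stellent parser.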


After making these changes, restart the crawler and the parser/indexer from the administration console so that the target files are crawled and indexed.

Related information

Associating document types with the text extractor

Document information

More support for: Watson Content Analytics

Software version: 3.0

Operating system(s): AIX, Linux, Windows

Reference #: 1643106

Modified date: 2014-04-08