Parsing and indexing video, audio and image files with IBM Content Analytics with Enterprise Search

Technote (troubleshooting)


Problem (Abstract)

How can I parse and index video, audio and image files with IBM Content Analytics with Enterprise Search?

Resolving the problem

Neither the content of a video file nor the metadata embedded inside it can be parsed and indexed. However, the metadata about the video file (the surrounding metadata) can be extracted and displayed in search results. This metadata can also be included in the dynamic summary if each field is configured to participate in the dynamic summary.

To crawl, parse, and index video files, follow these steps:

1. Ensure that the video file types are not in the list of file types that the crawler is forbidden to crawl.
2. Navigate to ES_NODE_ROOT\master_config\col_id.indexservice, where col_id is the ID of your collection.
3. Open parser_config.xml.
4. Navigate to the section <ParserName>terminator</ParserName>.
5. In that section, find the following entries:
   <Mimetype>video/mpeg</Mimetype>
   <Mimetype>video/quicktime</Mimetype>
   <Mimetype>video/x-msvideo</Mimetype>
6. Move these video MIME types from the terminator section to the empty section, <ParserName>empty</ParserName>.

With this change, the parser discards the content of the video files but still extracts the metadata about each file. It does not extract the metadata embedded inside the video files.

7. Save and close parser_config.xml.
8. Restart the parser and the indexer.
9. Restart the crawler and run a full recrawl.

The video files are then crawled, parsed, and indexed. Note that the content itself is not ingested into IBM Content Analytics (ICA). However, when users click a video file in the search results, they can still access its content through the document URL.
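The parser_config.xml edit described in the procedure can be sketched as a small Python script. This is a minimal sketch, not an IBM-provided tool. It assumes that each parser section in parser_config.xml is an element with a direct <ParserName> child and that the <Mimetype> entries sit inside that same element; the SAMPLE layout, the function names find_parser, move_mimetypes and update_config, and the .bak backup file are all hypothetical. Compare the assumed layout with your own parser_config.xml before relying on it. ElementTree may also change the file's whitespace or XML declaration, which is why the script writes a backup first.

```python
import shutil
import xml.etree.ElementTree as ET

# The three video MIME types named in this technote.
VIDEO_MIMETYPES = {"video/mpeg", "video/quicktime", "video/x-msvideo"}

# Hypothetical, simplified layout used only to demonstrate the script;
# the real parser_config.xml may be structured differently.
SAMPLE = """<ParserConfig>
  <Parser>
    <ParserName>terminator</ParserName>
    <Mimetype>video/mpeg</Mimetype>
    <Mimetype>video/quicktime</Mimetype>
    <Mimetype>video/x-msvideo</Mimetype>
    <Mimetype>application/x-example</Mimetype>
  </Parser>
  <Parser>
    <ParserName>empty</ParserName>
  </Parser>
</ParserConfig>"""


def find_parser(root: ET.Element, name: str) -> ET.Element:
    """Return the element whose direct <ParserName> child equals name.

    Assumption: each parser section is an element with a direct
    <ParserName> child, as in SAMPLE.
    """
    for elem in root.iter():
        pname = elem.find("ParserName")
        if pname is not None and (pname.text or "").strip() == name:
            return elem
    raise ValueError(f"parser section {name!r} not found")


def move_mimetypes(root, mimetypes, src="terminator", dst="empty"):
    """Move matching <Mimetype> entries from the src section to the dst section."""
    src_parser = find_parser(root, src)
    dst_parser = find_parser(root, dst)
    # ElementTree elements have no parent links, so build a child -> parent map.
    src_parents = {child: parent for parent in src_parser.iter() for child in parent}
    # Append next to any <Mimetype> entries dst already has; otherwise
    # append directly to the dst parser element.
    dst_container = dst_parser
    existing = dst_parser.find(".//Mimetype")
    if existing is not None:
        dst_parents = {child: parent for parent in dst_parser.iter() for child in parent}
        dst_container = dst_parents[existing]
    moved = []
    # Materialize the list first because the tree is modified inside the loop.
    for entry in list(src_parser.iter("Mimetype")):
        text = (entry.text or "").strip()
        if text in mimetypes:
            src_parents[entry].remove(entry)
            dst_container.append(entry)
            moved.append(text)
    return moved


def update_config(path: str) -> list:
    """Back up parser_config.xml, move the video MIME types, and save the file."""
    shutil.copy2(path, path + ".bak")  # hypothetical backup name
    # Keep XML comments when parsing (insert_comments requires Python 3.8+).
    xml_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=xml_parser)
    moved = move_mimetypes(tree.getroot(), VIDEO_MIMETYPES)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return moved


if __name__ == "__main__":
    # Demonstrate on the in-memory sample instead of a real configuration file.
    demo = ET.fromstring(SAMPLE)
    print(move_mimetypes(demo, VIDEO_MIMETYPES))
    print([m.text for m in find_parser(demo, "empty").iter("Mimetype")])
```

To apply it to a real collection, you would call update_config() with the full path to parser_config.xml under ES_NODE_ROOT\master_config\col_id.indexservice, then restart the parser, indexer, and crawler and run a full recrawl as described above. Moving the existing <Mimetype> elements, rather than deleting and recreating them, keeps any attributes they carry.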



Document information


More support for:

Watson Content Analytics

Software version:

3.0

Operating system(s):

AIX, Linux, Windows

Reference #:

1634711

Modified date:

2013-05-02
