How can I parse and index video, audio and image files with IBM Content Analytics with Enterprise Search?
Resolving the problem
Neither the content of a video file nor the metadata embedded inside it can be parsed and indexed. However, the metadata about the video file (the surrounding metadata) can be extracted and displayed in the search results. This metadata can also be included in the dynamic summary if each of its fields is configured to participate in the dynamic summary.
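To make the distinction concrete, the Python sketch below collects the kind of surrounding metadata that a file system crawler, for example, can obtain without reading the file's bytes: name, extension, size, modification time and URL. It is only an illustration and is not how IBM Content Analytics with Enterprise Search implements its crawler. The function name and dictionary keys are hypothetical and do not correspond to ICA field names, and this technote does not state exactly which attributes the crawler indexes.

```python
"""Illustration: "surrounding" metadata available from the file system
without parsing the file's content. ASSUMPTION: the keys below are
hypothetical and are not ICA index field names.
"""
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def surrounding_metadata(path):
    """Return file-system metadata only; the file's bytes are never read."""
    p = Path(path).resolve()
    st = p.stat()
    return {
        "filename": p.name,
        "extension": p.suffix.lower(),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "url": p.as_uri(),  # a file:// URL the user can follow to the content
    }


if __name__ == "__main__":
    # A throwaway file standing in for a video; its bytes are treated as opaque.
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "Demo.MP4"
        clip.write_bytes(b"\x00" * 16)
        for key, value in surrounding_metadata(clip).items():
            print(f"{key}: {value}")
```

Embedded metadata (for example, tags stored inside the video container) would require opening and decoding the file, which is exactly what this configuration does not do.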
Follow these steps to crawl, parse and index the multimedia files:
1. Ensure that the video file types are not in the crawler's forbidden list of file types.
2. Navigate to ES_NODE_ROOT\master_config\col_id.indexservice, where col_id is the ID of the collection.
3. Open parser_config.xml
4. Navigate to the section that contains <ParserName>terminator</ParserName>.
5. Under that section, locate the video MIME types.
6. Move these video MIME types from the terminator section to the empty section.
This discards the file content but still extracts the metadata about the video file. Note that metadata embedded inside the video file is not extracted.
7. Save and close parser_config.xml.
8. Restart the parser and the indexer.
9. Restart the crawler and perform a full recrawl.
The video files are now crawled, parsed and indexed. Note that their content is not ingested into ICA; however, when users click a video file in the search results, they can still open the content through its URL.
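The manual parser_config.xml edit described above (moving the video MIME types from the terminator section to the empty section) can also be scripted. The following is a minimal Python sketch, not an IBM-provided tool. Only the `<ParserName>` element and the terminator and empty section names come from this technote; the surrounding element names (`Parser`, `MimeTypes`, `MimeType`), the sample MIME types and the helper functions are hypothetical assumptions, so compare them against your own parser_config.xml and adjust them before use.

```python
"""Hypothetical sketch: move video MIME types from the terminator parser
section to the empty parser section of parser_config.xml.

ASSUMPTION: the <Parser>/<MimeTypes>/<MimeType> layout is illustrative only;
the technote confirms just the <ParserName> element.
"""
import xml.etree.ElementTree as ET

# Example values only; use the video MIME types listed in your own file.
VIDEO_TYPES = {"video/mp4", "video/mpeg", "video/quicktime", "video/x-msvideo"}

# Minimal stand-in for parser_config.xml, using the assumed layout.
SAMPLE = """<ParserConfig>
  <Parser>
    <ParserName>terminator</ParserName>
    <MimeTypes>
      <MimeType>video/mp4</MimeType>
      <MimeType>video/mpeg</MimeType>
      <MimeType>application/octet-stream</MimeType>
    </MimeTypes>
  </Parser>
  <Parser>
    <ParserName>empty</ParserName>
    <MimeTypes/>
  </Parser>
</ParserConfig>"""


def find_parser(root, name):
    """Return the (assumed) <Parser> element whose <ParserName> is `name`."""
    for parser in root.iter("Parser"):
        if (parser.findtext("ParserName") or "").strip() == name:
            return parser
    raise KeyError(f"no parser section named {name!r}")


def move_mime_types(root, mime_types, src="terminator", dst="empty"):
    """Move matching <MimeType> entries from the src section to the dst section."""
    src_list = find_parser(root, src).find("MimeTypes")
    dst_list = find_parser(root, dst).find("MimeTypes")
    if src_list is None or dst_list is None:
        raise ValueError("expected a <MimeTypes> child in both parser sections")
    moved = []
    for elem in list(src_list.findall("MimeType")):  # copy, since src_list is mutated
        value = (elem.text or "").strip()
        if value in mime_types:
            src_list.remove(elem)
            dst_list.append(elem)
            moved.append(value)
    return moved


def edit_file(path, mime_types=VIDEO_TYPES):
    """Apply the move to a real file in place. Back the file up first."""
    tree = ET.parse(path)
    moved = move_mime_types(tree.getroot(), mime_types)
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return moved


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(move_mime_types(root, VIDEO_TYPES))
```

Back up parser_config.xml before calling `edit_file`: Python's ElementTree discards XML comments when it parses a file, so they would be lost on write. The script only replaces the manual edit; the parser, indexer and crawler still need to be restarted and a full recrawl performed.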