You can create a crawler plug-in that enables users to view documents that are extracted from archive files, such as .zip, .tar, or .rar files.
Watson Content Analytics provides Java™ APIs for implementing a crawler plug-in that extracts archive entries from archive files that are crawled by type A data source crawlers. The fetch capabilities, however, do not allow users to view the extracted files. You can extend the archive plug-in so that users can fetch and view documents that are extracted from archive files. To implement the plug-in, you follow the same approach that you use for other type A data source crawler plug-ins.
The plug-in is declared with configuration properties of the following form:
es.ext.dirs.type=classpath
archive.plugin.type=classname;.extension
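where type names the archive type, classpath lists the directories and JAR files that contain the plug-in, classname is the name of the Java class that implements the plug-in, and .extension is the file extension of the archive files that the plug-in handles. For example, for .rar files:
# extension files and directories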
es.ext.dirs=C:\\Program Files\\IBM\\es\\lib\\es.repo.jar;C:\\Program Files\\IBM\\es\\lib\\rdsutil.jar;C:\\Program Files\\IBM\\es\\lib\\ESSearchServer.jar;C:\\Program Files\\IBM\\es\\lib\\trevi.tokenizer.jar;C:\\Program Files\\IBM\\es\\lib\\es.workmgr.jar;C:\\Program Files\\IBM\\es\\lib\\dscrawler.jar;
es.ext.dirs.rar=C:\\rarplugin;C:\\rarplugin\\rarplugin.jar;
archive.plugin.rar=RarFile;.rar
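The following Java sketch illustrates how property lines of this form can be read. It is not the product's own loader: the class name ArchivePluginConfig and the methods load and parse are hypothetical, and the sketch assumes that the properties use the standard java.util.Properties file format, in which a doubled backslash in a Windows path decodes to a single backslash.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of the Watson Content Analytics API.
public class ArchivePluginConfig {
    private static final String PREFIX = "archive.plugin.";

    // Reads property text in the standard java.util.Properties format.
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Maps each file extension (for example ".rar") to the class name
    // declared in an archive.plugin.<type>=classname;.extension entry.
    public static Map<String, String> parse(Properties props) {
        Map<String, String> classByExtension = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // skip es.ext.dirs.* and unrelated keys
            }
            String[] parts = props.getProperty(key).split(";");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected classname;.extension for key " + key);
            }
            classByExtension.put(parts[1].trim(), parts[0].trim());
        }
        return classByExtension;
    }
}
```

Passing the two .rar lines from the example to load and then to parse yields a single mapping from .rar to RarFile, and the es.ext.dirs.rar value decodes to paths with single backslashes.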