Extending the archive plug-in to view extracted files

You can create a crawler plug-in that enables users to view documents that are extracted from archive files, such as .zip, .tar, or .rar files.

Watson Content Analytics provides Java™ APIs for implementing a crawler plug-in that extracts archive entries from archive files that are crawled by type A data source crawlers. The fetch capabilities, however, do not allow users to view the extracted files. You can extend the archive plug-in so that users can fetch and view documents that are extracted from archive files. You implement the plug-in in the same way that you implement other type A data source crawler plug-ins.

Restriction: You cannot use this plug-in with the following type B data source crawlers:

Agent for Windows file systems crawler
BoardReader crawler
Case Manager crawler
Exchange Server crawler
FileNet P8 crawler
SharePoint crawler

To register the plug-in, add the following properties to the customcommunication.properties file:

es.ext.dirs.type=classpath
archive.plugin.type=classname;.extension

where:

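type: Specifies the identifier of the archive document type, such as rar or lzh. You can also choose your own identifier. The identifier is appended to the property names, as in es.ext.dirs.rar and archive.plugin.rar.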
classpath: Specifies the list of paths for the class path that is required to run your archive plug-in. Separate the paths with a semicolon (;) on Windows or a colon (:) on AIX® or Linux.
classname: Specifies the class name of your archive plug-in.
extension: Specifies the file extension. Your archive plug-in is invoked for the files that match this extension.
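
The property format described above can be checked before deployment with a small stand-alone program. The following Java sketch is illustrative only and is not part of the Watson Content Analytics API: the class name ArchivePluginRegistrations and its Registration type are hypothetical, and the sketch assumes that the file follows standard java.util.Properties syntax and that each archive.plugin.type value contains exactly one class name and one extension.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: reads archive plug-in registrations from properties text. */
public class ArchivePluginRegistrations {

    /** One archive.plugin.type entry together with its es.ext.dirs.type class path. */
    public static final class Registration {
        public final String type;
        public final String className;
        public final String extension;
        public final String classPath; // null when es.ext.dirs.type is missing

        Registration(String type, String className, String extension, String classPath) {
            this.type = type;
            this.className = className;
            this.extension = extension;
            this.classPath = classPath;
        }
    }

    private static final String PLUGIN_PREFIX = "archive.plugin.";
    private static final String CLASSPATH_PREFIX = "es.ext.dirs.";

    /** Returns the registrations keyed by type, sorted by type name. */
    public static Map<String, Registration> parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Registration> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PLUGIN_PREFIX)) {
                continue; // not a plug-in registration
            }
            String type = key.substring(PLUGIN_PREFIX.length());
            // Assumed value format: classname;.extension
            String[] parts = props.getProperty(key).split(";", 2);
            boolean valid = !type.isEmpty()
                    && parts.length == 2
                    && !parts[0].trim().isEmpty()
                    && parts[1].trim().startsWith(".");
            if (!valid) {
                throw new IllegalArgumentException(
                        "Expected classname;.extension for property " + key);
            }
            result.put(type, new Registration(type, parts[0].trim(), parts[1].trim(),
                    props.getProperty(CLASSPATH_PREFIX + type)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Backslashes are doubled twice here: once for Java, once for the properties format.
        String text = "es.ext.dirs.rar=C:\\\\rarplugin;C:\\\\rarplugin\\\\rarplugin.jar\n"
                + "archive.plugin.rar=RarFile;.rar\n";
        for (Registration r : parse(text).values()) {
            System.out.println(r.type + ": class=" + r.className
                    + ", extension=" + r.extension + ", classpath=" + r.classPath);
        }
    }
}
```

A registration whose class path is reported as null indicates that the matching es.ext.dirs.type property is missing or that the two property names use different type identifiers.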

The following example shows a sample customcommunication.properties file that registers an archive plug-in named RarFile to view documents extracted from .rar files:

# extension files and directories
es.ext.dirs=C:\\Program Files\\IBM\\es\\lib\\es.repo.jar;C:\\Program Files\\IBM\\es\\lib\\rdsutil.jar;C:\\Program Files\\IBM\\es\\lib\\ESSearchServer.jar;C:\\Program Files\\IBM\\es\\lib\\trevi.tokenizer.jar;C:\\Program Files\\IBM\\es\\lib\\es.workmgr.jar;C:\\Program Files\\IBM\\es\\lib\\dscrawler.jar;

es.ext.dirs.rar=C:\\rarplugin;C:\\rarplugin\\rarplugin.jar;
archive.plugin.rar=RarFile;.rar
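
The doubled backslashes in the Windows paths suggest that the file follows Java properties escaping rules. Assuming that it is read with java.util.Properties semantics, every backslash in a path must be doubled, because a single backslash followed by a letter such as r, t, or n is read as a control character. The following sketch, which uses a hypothetical class name PropertiesEscapingDemo, shows the difference:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo: shows how java.util.Properties interprets backslashes in values. */
public class PropertiesEscapingDemo {

    /** Loads properties text and returns the value for the given key. */
    public static String read(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File content: es.ext.dirs.rar=C:\\rarplugin\\rarplugin.jar
        String doubled = read("es.ext.dirs.rar=C:\\\\rarplugin\\\\rarplugin.jar\n",
                "es.ext.dirs.rar");
        // File content: es.ext.dirs.rar=C:\\rarplugin\rarplugin.jar
        String single = read("es.ext.dirs.rar=C:\\\\rarplugin\\rarplugin.jar\n",
                "es.ext.dirs.rar");

        // The doubled form yields a usable Windows path.
        System.out.println("doubled contains carriage return: "
                + (doubled.indexOf('\r') >= 0));
        // The single backslash before 'r' becomes a carriage return character.
        System.out.println("single contains carriage return: "
                + (single.indexOf('\r') >= 0));
    }
}
```

Running a check of this kind against the class path values helps to catch paths that look correct in an editor but resolve to a different string at load time.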