You can create a Java™ class
to programmatically apply your own logic for exporting crawled, analyzed,
or searched documents from collections. You can also create a custom
plug-in to export analysis results for each document that is included
in a deep inspection request.
Before you begin
The plug-in must be compatible with Java 6.
About this task
For name resolution, use the ES_INSTALL_ROOT/lib/es.indexservice.jar JAR
file.
Procedure
To create a Java class
and deploy a plug-in for exporting documents or deep inspection results:
- Create a Java class that extends the
com.ibm.es.oze.api.export.ExportDocumentPublisher abstract class.
The com.ibm.es.oze.api.export.ExportDocumentPublisher class has the
following methods:
- init()
- initPublish()
- publish()
- termPublish()
- term()
The init, initPublish, termPublish, and term methods are implemented to do nothing. The
publish method is an abstract method, so you must implement it.
If you plan to export content from an InfoSphere® BigInsights collection and export directly from Hadoop
MapReduce tasks, the plug-in class must have the annotation
com.ibm.es.oze.api.export.ExecuteOnHadoop. The plug-in can override the abortPublish
method to clean up the output of an aborted Hadoop task. The abortPublish method is
called when a Hadoop task is aborted, and by default it calls the termPublish method.
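A minimal sketch of a publisher subclass follows. The class name, the output destination, and the argument lists of the lifecycle methods are assumptions for illustration; in particular, the publish method is assumed to receive one com.ibm.es.oze.api.export.document.Content object per document. Verify the exact signatures against the Javadoc that is provided with the es.indexservice.jar file.

```java
import com.ibm.es.oze.api.export.ExportDocumentPublisher;
import com.ibm.es.oze.api.export.document.Content;

// Hypothetical example class; signatures are assumptions, not the
// documented API. Compile with es.indexservice.jar on the class path.
public class MyPublisher extends ExportDocumentPublisher {

    private java.io.PrintWriter out;

    @Override
    public void initPublish() {
        // Inherited as a no-op; override to open your output destination
        // before a batch of documents is published.
        try {
            out = new java.io.PrintWriter("exported-documents.txt");
        } catch (java.io.FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void publish(Content content) {
        // The only abstract method, so it must be implemented:
        // write each exported document to your own destination.
        out.println(content);
    }

    @Override
    public void termPublish() {
        // Inherited as a no-op; override to flush and close resources
        // after the batch completes.
        out.close();
    }
}
```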
- Optional: If you want to control which documents are exported, extend the
com.ibm.es.oze.api.export.ExportDocumentFilter abstract class and implement
its filter method to indicate whether each document is exported.
- Optional: If you want to export deep inspection
results, implement the following interfaces:
- interface: com.ibm.es.oze.api.export.document.InspectionContent
- Use this interface to export metadata about the deep inspection
request.
package com.ibm.es.oze.api.export.document;
public interface InspectionContent extends Content {
public InspectionRecord[] getInspectionRecords();
}
- interface: com.ibm.es.oze.api.export.document.InspectionRecord
- Use this interface to export analysis results for each document
that is included in a deep inspection request.
package com.ibm.es.oze.api.export.document;
public interface InspectionRecord {
public double getIndex();
public String[] getFacetNames();
public int getCount();
}
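The two interfaces above can be consumed from a publisher's publish method. The sketch below uses only the methods shown in the interface definitions; the publish signature and the tab-separated output format are assumptions for illustration.

```java
import com.ibm.es.oze.api.export.document.Content;
import com.ibm.es.oze.api.export.document.InspectionContent;
import com.ibm.es.oze.api.export.document.InspectionRecord;

// Inside a class that extends ExportDocumentPublisher; the publish
// signature shown here is an assumption -- check the Javadoc.
public void publish(Content content) {
    if (content instanceof InspectionContent) {
        InspectionContent inspection = (InspectionContent) content;
        for (InspectionRecord record : inspection.getInspectionRecords()) {
            // Emit one line per analysis record: the record index, the
            // facet names joined into a path, and the count.
            StringBuilder line = new StringBuilder();
            line.append(record.getIndex()).append('\t');
            line.append(String.join("/", record.getFacetNames())).append('\t');
            line.append(record.getCount());
            System.out.println(line);   // replace with your own output sink
        }
    }
}
```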
- Compile the implemented code and package it as a JAR file. You must provide
the plug-in as a JAR file to deploy it. Add the
ES_INSTALL_ROOT/lib/es.indexservice.jar file to the class path when you compile.
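For example, the compile and package step might look like the following commands. The source file name and the output JAR name are placeholders; substitute your own class and the actual value of ES_INSTALL_ROOT.

```shell
# Compile the plug-in class with the product JAR on the class path.
javac -cp "$ES_INSTALL_ROOT/lib/es.indexservice.jar" MyPublisher.java

# Package the compiled classes into a JAR file for deployment.
jar cf myexportplugin.jar MyPublisher.class
```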
If you plan to export content from an InfoSphere BigInsights collection and export directly from Hadoop
MapReduce tasks, all required resources for the plug-in, such as classes and resource
files, must be included in JAR files. All JAR files must be explicitly listed in the
class path.
- To integrate the custom plug-in for exporting documents,
configure export options for a collection in the administration console
and specify the class path of the JAR files, the class name, and the
properties that you want to pass to the plug-in. If no filter class
is specified, all documents are exported.
To integrate
the custom plug-in for exporting deep inspection results, configure
text analytics options for a collection in the administration console
and specify the class path of the JAR files, the class name, and the properties
that you want to pass to the plug-in.