Creating and deploying a plug-in for exporting documents or deep inspection results

You can create a Java™ class to programmatically apply your own logic for exporting crawled, analyzed, or searched documents from collections. You can also create a custom plug-in to export analysis results for each document that is included in a deep inspection request.

Before you begin

The plug-in must be compatible with Java 6.

About this task

To resolve the export API class names when you compile the plug-in, use the ES_INSTALL_ROOT/lib/es.indexservice.jar file.

Procedure

To create a Java class and deploy a plug-in for exporting documents or deep inspection results:

  1. Create a Java class that extends the com.ibm.es.oze.api.export.ExportDocumentPublisher abstract class. The com.ibm.es.oze.api.export.ExportDocumentPublisher class has the following methods:
    • init()
    • initPublish()
    • publish()
    • termPublish()
    • term()

    The default implementations of the init, initPublish, termPublish, and term methods do nothing, so override only the ones that you need. The publish method is abstract, so you must implement it.

    If you plan to export content from an InfoSphere® BigInsights collection and export directly from Hadoop MapReduce tasks, the plug-in class must have the annotation com.ibm.es.oze.api.export.ExecuteOnHadoop. The plug-in can override the abortPublish method to clean up the output of an aborted Hadoop task. The abortPublish method is called when a Hadoop task is aborted; by default, it calls the termPublish method.
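
The lifecycle described in this step can be sketched as follows. The method signatures of ExportDocumentPublisher are defined in es.indexservice.jar and are not reproduced in this topic, so the sketch uses a hypothetical stand-in base class, PublisherSketch, with simplified no-argument lifecycle methods and a publish(String) method. The call order in PublisherLifecycleDemo.run is also an assumption inferred from the method names. A real plug-in would extend com.ibm.es.oze.api.export.ExportDocumentPublisher and use the signatures from the product Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExportDocumentPublisher; the real class is in
// es.indexservice.jar and its method signatures are not shown in this topic.
abstract class PublisherSketch {
    public void init() {}                              // no-op by default
    public void initPublish() {}                       // no-op by default
    public abstract void publish(String documentId);   // must be implemented
    public void termPublish() {}                       // no-op by default
    public void term() {}                              // no-op by default

    // Documented default: an aborted Hadoop task calls termPublish().
    public void abortPublish() { termPublish(); }
}

// Example subclass that records each callback so the call order is visible.
class RecordingPublisher extends PublisherSketch {
    final List<String> log = new ArrayList<String>();

    @Override public void initPublish() { log.add("initPublish"); }
    @Override public void publish(String documentId) { log.add("publish:" + documentId); }
    @Override public void termPublish() { log.add("termPublish"); }
}

class PublisherLifecycleDemo {
    // Assumed call order for one export session; the product drives this in practice.
    static List<String> run(RecordingPublisher publisher, String... documentIds) {
        publisher.init();
        publisher.initPublish();
        for (String id : documentIds) {
            publisher.publish(id);
        }
        publisher.termPublish();
        publisher.term();
        return publisher.log;
    }

    public static void main(String[] args) {
        System.out.println(run(new RecordingPublisher(), "doc1", "doc2"));
    }
}
```

Because init and term are not overridden, they leave no trace in the log. Calling abortPublish on a RecordingPublisher records only termPublish, which mirrors the default behavior described above.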

  2. Optional: If you want to control which documents are exported, extend the com.ibm.es.oze.api.export.ExportDocumentFilter abstract class. The class has the following method:
    • accept()
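
A filter can be sketched in the same spirit. The real signature of accept() is defined in es.indexservice.jar and is not shown in this topic, so FilterSketch below is a hypothetical stand-in that assumes accept() takes a document identifier and returns a boolean. SuffixFilter is an illustrative name, not part of the product API.

```java
// Hypothetical stand-in for ExportDocumentFilter; the real accept() signature
// is defined in es.indexservice.jar and may differ from this simplified one.
abstract class FilterSketch {
    public abstract boolean accept(String documentId);
}

// Example filter: export only documents whose identifier ends with a given suffix.
class SuffixFilter extends FilterSketch {
    private final String suffix;

    SuffixFilter(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public boolean accept(String documentId) {
        // Reject null identifiers instead of throwing.
        return documentId != null && documentId.endsWith(suffix);
    }
}
```

With this sketch, new SuffixFilter(".pdf") accepts "report.pdf" and rejects "notes.txt".
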
  3. Optional: If you want to export deep inspection results, implement the following interfaces:

    interface: com.ibm.es.oze.api.export.document.InspectionContent
    Use this interface to export metadata about the deep inspection request.

        package com.ibm.es.oze.api.export.document;
        public interface InspectionContent extends Content {
           public InspectionRecord[] getInspectionRecords();
        }

    interface: com.ibm.es.oze.api.export.document.InspectionRecord
    Use this interface to export analysis results for each document that is included in a deep inspection request.

        package com.ibm.es.oze.api.export.document;
        public interface InspectionRecord {
           public double getIndex();
           public String[] getFacetNames();
           public int getCount();
        }

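
As a minimal sketch of how these two interfaces fit together, the following code walks the records of an InspectionContent and formats each one as a delimited line. The interfaces are redeclared locally so that the sketch compiles without es.indexservice.jar. Content is an empty stand-in because its members are not shown in this topic, and InspectionCsvSketch and its output format are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Local copies of the interfaces shown in this step. Content is an empty
// stand-in; the real interface is defined in es.indexservice.jar.
interface Content {}

interface InspectionRecord {
    double getIndex();
    String[] getFacetNames();
    int getCount();
}

interface InspectionContent extends Content {
    InspectionRecord[] getInspectionRecords();
}

class InspectionCsvSketch {
    // Formats each record as "facet1|facet2,count,index" (illustrative format).
    static List<String> toCsvLines(InspectionContent content) {
        List<String> lines = new ArrayList<String>();
        for (InspectionRecord record : content.getInspectionRecords()) {
            StringBuilder facets = new StringBuilder();
            for (String name : record.getFacetNames()) {
                if (facets.length() > 0) {
                    facets.append('|');
                }
                facets.append(name);
            }
            lines.add(facets + "," + record.getCount() + "," + record.getIndex());
        }
        return lines;
    }
}
```

A plug-in could write the returned lines to a file or another target of its choice.
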
  4. Compile the code with the ES_INSTALL_ROOT/lib/es.indexservice.jar file on the class path, and package the compiled classes in a JAR file. To deploy the plug-in, you must provide it as a JAR file.

    If you plan to export content from an InfoSphere BigInsights collection and export directly from Hadoop MapReduce tasks, all required resources for the plug-in, such as classes and resource files, must be included in JAR files. All JAR files must be explicitly listed in the class path.

  5. To integrate the custom plug-in for exporting documents, configure export options for a collection in the administration console and specify the class path of the JAR files, the class name, and the properties that you want to pass to the plug-in. If no filter class is specified, all documents are exported.

    To integrate the custom plug-in for exporting deep inspection results, configure text analytics options for a collection in the administration console and specify the class path of the JAR files, the class name, and the properties that you want to pass to the plug-in.