API overview

Watson Content Analytics provides several sets of application programming interfaces (APIs) so that you can create search and administration applications, modify crawled documents, filter search results, export documents, set up an identity management component to enforce document-level security, and perform ad-hoc text analysis on documents.

For information about how to use the Watson Content Analytics APIs, see the examples in the ES_INSTALL_ROOT/samples directory.

REST APIs

Use the REST APIs to create search, content mining, and administration applications. The search REST API is available on Watson Content Analytics search servers and is deployed on the search application port. If you use the embedded web application server, the default port is 8393. If you use WebSphere Application Server, the default port is 9081, or 80 if IBM HTTP Server is configured.

Where the administrative REST API is available depends on the web application server. If you use the embedded web application server, the API is available on the master server and uses the same port number as the administrative console, which by default is 8390. If you use WebSphere Application Server, the API is available on the search application port, which by default is 9081, or 80 if IBM HTTP Server is configured. You can change these port numbers when you install Watson Content Analytics.
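The default port rules above can be summarized as a small lookup table. The following Python sketch is illustrative only: the enum names, the default_base_url helper, the host names, and the use of the http scheme are assumptions made for this example and are not part of the product. If the ports were changed during installation, the table does not apply.

```python
from enum import Enum


class WebServer(Enum):
    """Web application server setups described in this section (names are hypothetical)."""
    EMBEDDED = "embedded"
    WEBSPHERE = "websphere"
    WEBSPHERE_IHS = "websphere_with_ibm_http_server"


class Api(Enum):
    SEARCH = "search"
    ADMIN = "admin"


# Default ports as stated in the text above.
DEFAULT_PORTS = {
    (Api.SEARCH, WebServer.EMBEDDED): 8393,
    (Api.ADMIN, WebServer.EMBEDDED): 8390,   # same port as the administrative console
    (Api.SEARCH, WebServer.WEBSPHERE): 9081,
    (Api.ADMIN, WebServer.WEBSPHERE): 9081,  # served on the search application port
    (Api.SEARCH, WebServer.WEBSPHERE_IHS): 80,
    (Api.ADMIN, WebServer.WEBSPHERE_IHS): 80,
}


def default_base_url(host: str, api: Api, server: WebServer) -> str:
    """Return scheme, host, and default port only; no API path is assumed."""
    port = DEFAULT_PORTS[(api, server)]
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical host names: a search server and the master server.
    print(default_base_url("search.example.com", Api.SEARCH, WebServer.EMBEDDED))
    print(default_base_url("master.example.com", Api.ADMIN, WebServer.EMBEDDED))
```

The helper deliberately stops at the port. Take the request paths and parameters from the API documentation in the ES_INSTALL_ROOT/docs/api/rest directory.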

For more information about using the REST APIs, see the API documentation in the ES_INSTALL_ROOT/docs/api/rest directory. Sample scenarios that demonstrate how to perform administrative and search tasks are available in the ES_INSTALL_ROOT/samples/rest directory.

IBM search and index APIs

You can use the search and index application programming interfaces to create custom enterprise search applications. The Watson Content Analytics implementation of the search and index API (SIAPI) allows the search server to be accessed remotely.

Restriction: The SIAPI administration APIs are deprecated and are no longer supported. The SIAPI search APIs are being deprecated and will not be supported in future releases. Use the REST APIs instead of the SIAPI APIs to create custom applications.

You can use applications that are provided with Watson Content Analytics as a base from which to develop your custom applications:
  • search: This application shows you how to do basic search and retrieval tasks, such as selecting collections for search, querying those collections, configuring the display of search results, and narrowing results through faceted browsing.
  • analytics: This application shows you how to use content mining capabilities to explore different facets of content analytics collections. For example, you can see how frequencies of facet values change over time and analyze deviations and trends in the data.

Important: If you customize a provided SIAPI application, you must rename it to ensure that your changes are not overwritten when you install a fix pack or upgrade to a new version of Watson Content Analytics.

Plug-in APIs

Plug-in APIs allow you to customize the Watson Content Analytics system in the following ways:
  • Use the crawler plug-ins to modify documents after they are crawled, but before they are parsed and indexed for search. You can add, change, or delete information in the document or the document metadata. You can also indicate that the document is to be ignored (skipped) and not indexed.
  • Use the post-filtering plug-in to apply your own security logic for post-filtering search results.
  • Use the export plug-in to apply your own logic for exporting crawled, analyzed, or searched documents and the output from deep inspection requests.
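To make the crawler plug-in behavior concrete, the following Python sketch models the decisions that such a plug-in makes between crawling and parsing: add, change, or delete document information, or skip the document. It is a conceptual illustration only. The CrawledDocument type, the update_document function, and the metadata field names are hypothetical and do not reflect the product's plug-in interfaces, which are described in the Javadoc documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CrawledDocument:
    """Hypothetical stand-in for a document handed over by a crawler."""
    uri: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def update_document(doc: CrawledDocument) -> Optional[CrawledDocument]:
    """Run after crawling and before parsing; return None to skip the document."""
    # Skip: drafts are ignored and never reach the index (assumed rule).
    if doc.metadata.get("status") == "draft":
        return None

    # Add: attach a metadata field that the repository did not supply.
    doc.metadata["department"] = "legal"

    # Delete: drop a field that should not be indexed.
    doc.metadata.pop("internal_note", None)

    # Change: normalize the content before it is parsed.
    doc.content = doc.content.strip()
    return doc


if __name__ == "__main__":
    final = CrawledDocument(
        uri="file:///contracts/a.txt",
        content="  Contract text  ",
        metadata={"status": "final", "internal_note": "do not index"},
    )
    print(update_document(final))
```

Returning None to signal a skipped document is a design choice made for this sketch. A real plug-in signals this in whatever way the plug-in API defines.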

Identity management component APIs

Access to sensitive information that is contained in multiple repositories is typically controlled and enforced by the software that manages each repository. You identify yourself to the host system with a user ID and password. After the system authenticates your user ID and password, the managing software controls which documents you are allowed to see based on your access rights. Unless a single sign-on policy is implemented, you need a separate user ID and password for each repository.

Watson Content Analytics provides an identity management component that enables users to search multiple repositories with a single query and see only the documents that they are allowed to see. You can build this component into your applications so that users can sign on with only one user ID and password when searching secure collections.

See the Javadoc documentation for details about the APIs that can be used to create your own identity management component or customize the provided solution.
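The idea behind the identity management component can be sketched as a profile that maps one sign-on identity to the user ID that each repository knows, and then uses those IDs to decide which documents appear in a result list. The following Python sketch is a conceptual model under stated assumptions: the class names, the visible_documents function, and the access control structure are hypothetical and are not the product's APIs. Passwords and authentication are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

DocKey = Tuple[str, str]  # (repository name, document ID)


@dataclass(frozen=True)
class RepositoryCredential:
    """Hypothetical record of the user ID that one repository knows."""
    repository: str
    user_id: str


class IdentityProfile:
    """Maps a single sign-on user to one credential per repository."""

    def __init__(self, sso_user: str) -> None:
        self.sso_user = sso_user
        self._credentials: Dict[str, RepositoryCredential] = {}

    def register(self, repository: str, user_id: str) -> None:
        self._credentials[repository] = RepositoryCredential(repository, user_id)

    def credential_for(self, repository: str) -> Optional[RepositoryCredential]:
        return self._credentials.get(repository)


def visible_documents(
    profile: IdentityProfile,
    documents: List[DocKey],
    acl: Dict[DocKey, Set[str]],
) -> List[DocKey]:
    """Keep only documents whose repository grants access to the user's ID there."""
    visible = []
    for repository, doc_id in documents:
        credential = profile.credential_for(repository)
        allowed = acl.get((repository, doc_id), set())
        if credential is not None and credential.user_id in allowed:
            visible.append((repository, doc_id))
    return visible


if __name__ == "__main__":
    profile = IdentityProfile("jsmith")
    profile.register("notes", "John Smith/Example")
    profile.register("filesystem", "EXAMPLE\\jsmith")
    acl = {
        ("notes", "n1"): {"John Smith/Example"},
        ("filesystem", "f1"): {"EXAMPLE\\admin"},
    }
    print(visible_documents(profile, [("notes", "n1"), ("filesystem", "f1")], acl))
```

In this model, a document from a repository with no registered credential is never shown, which is the conservative default for secure collections.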

Real-time natural language processing (NLP) API

Use this API to perform ad-hoc text analytics on documents without adding the documents to the index. Both SIAPI and REST API versions of the real-time NLP API are provided. The REST API version accepts both text and binary content, but the SIAPI version accepts only content in text format.

Restriction: The SIAPI version of the real-time NLP API is being deprecated and will not be supported in future releases. Use the REST API version instead of the SIAPI version to create custom applications.