IBM Support

Using Cloudera JDBC drivers to add content to a collection

Product Documentation


Abstract

Watson Explorer Version 11.0.0.1 extends support to Cloudera Distribution of Apache Hadoop (CDH) Version 5.4.8.

Content

With the Cloudera JDBC drivers, you can crawl data in Hive and Impala databases. After crawling finishes, the crawled documents are indexed.

The following implementations are supported through Cloudera JDBC drivers:

  • Apache Hive, through the Cloudera JDBC Driver for Apache Hive 2.5.15 or later. To use this driver to load files from a Hive database, you must install the driver on the Content Analytics crawler server. When you configure the JDBC crawler, select the Hive driver and the Hive database tables to be crawled.
  • Cloudera Impala, through the Cloudera JDBC Driver for Impala 2.5.28 or later. To use this driver to load files from an Impala database, you must install the driver on the Content Analytics crawler server. When you configure a JDBC crawler and select the Impala driver, the system lists all database tables, not just the tables that Impala supports. Take care to select only Impala database tables. If you select a Hive database table, an exception is thrown.

    To configure Watson Explorer Content Analytics to crawl a Hive or Impala database:
    1. Create a collection.
    2. Install the appropriate Hive or Impala JDBC driver on the Content Analytics crawler server.
    3. Create a JDBC database crawler. When you configure options for the crawler, be sure to select the correct driver.
    4. Specify information about the database that you want to crawl.
      Example fields for a Hive database:
      Database URL:
        jdbc:hive2://example.server.com:10000/default
      Driver Class Name:
        com.cloudera.hive.jdbc41.HS2Driver
      DriverPath:
        Specify the path of the directory that contains the HiveJDBC41.jar file and all other JAR files that are bundled with the driver. For example, if the JAR files are stored in C:\Software\jdbc\hive, specify C:\Software\jdbc\hive. Make sure that the directory contains no files that are unrelated to the JDBC driver.

      Example fields for an Impala database:
      Database URL:
        jdbc:impala://example.server.com:21050
      Driver Class Name:
        com.cloudera.impala.jdbc41.Driver
      DriverPath:
        Specify the path of the directory that contains the ImpalaJDBC41.jar file and all other JAR files that are bundled with the driver. For example, if the JAR files are stored in C:\Software\jdbc\impala, specify C:\Software\jdbc\impala. Make sure that the directory contains no files that are unrelated to the JDBC driver.
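
Before you enter these values in the crawler configuration, it can help to check them outside of Watson Explorer. The following Java sketch is one possible way to do that and is not part of the product or of the Cloudera drivers. The class name CrawlerSettingsCheck and all of its methods are hypothetical. The sketch builds a database URL in the form shown in the example fields, checks that the driver directory contains the expected JAR file, and lists files that are not JAR files so that you can review them. Treating every non-JAR file as a candidate for review is an assumption, because the driver package might legitimately include other files. The listTables method uses only standard JDBC calls and assumes that the driver JAR files are on the Java classpath and that the server accepts the connection without additional URL properties.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for checking crawler field values; not an IBM or Cloudera API.
class CrawlerSettingsCheck {

    // Builds a URL such as jdbc:hive2://host:10000/default.
    // Pass null or "" as the database to omit the trailing path segment.
    static String databaseUrl(String scheme, String host, int port, String database) {
        String url = "jdbc:" + scheme + "://" + host + ":" + port;
        return (database == null || database.isEmpty()) ? url : url + "/" + database;
    }

    // Returns true if the named driver JAR exists as a regular file in the directory.
    static boolean hasDriverJar(Path driverPath, String jarName) {
        return Files.isRegularFile(driverPath.resolve(jarName));
    }

    // Returns the sorted names of directory entries that do not end in .jar.
    // Assumption: these are the entries worth reviewing as possibly unrelated.
    static List<String> nonJarFiles(Path driverPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(driverPath)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (!name.toLowerCase().endsWith(".jar")) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Connects with standard JDBC and lists schema.table names.
    // Assumption: the driver JAR files are on the classpath.
    static List<String> listTables(String driverClass, String url)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverClass);
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             ResultSet rs = conn.getMetaData().getTables(null, null, "%", null)) {
            while (rs.next()) {
                tables.add(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
            }
        }
        return tables;
    }

    // Usage: java CrawlerSettingsCheck <driverPath> <jarName> [<driverClass> <url>]
    public static void main(String[] args) throws Exception {
        System.out.println(databaseUrl("hive2", "example.server.com", 10000, "default"));
        System.out.println(databaseUrl("impala", "example.server.com", 21050, null));
        if (args.length >= 2) {
            Path driverPath = Paths.get(args[0]);
            System.out.println("Driver JAR present: " + hasDriverJar(driverPath, args[1]));
            System.out.println("Files to review: " + nonJarFiles(driverPath));
        }
        if (args.length >= 4) {
            System.out.println("Tables: " + listTables(args[2], args[3]));
        }
    }
}
```

For example, running the class with the arguments C:\Software\jdbc\hive and HiveJDBC41.jar checks the Hive driver directory from the example above. Adding com.cloudera.hive.jdbc41.HS2Driver and the database URL as the third and fourth arguments also attempts a connection and lists the tables that the driver reports.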

Document Information

Modified date:
17 June 2018

UID

swg27047250