Using Cloudera JDBC drivers to add content to a collection

Product Documentation

Abstract

Watson Explorer Version 11.0.0.1 extends support to Cloudera Distribution of Apache Hadoop (CDH) Version 5.4.8.

Content

You can use Cloudera JDBC drivers to crawl data in Hive and Impala databases. After the crawl finishes, the crawled documents are indexed.

The following implementations are supported through Cloudera JDBC drivers:

Apache Hive, through the Cloudera JDBC Driver for Apache Hive 2.5.15 or later. To use this driver to load files from a Hive database, you must install the driver on the Content Analytics crawler server. When you configure the JDBC crawler, select the Hive driver and the Hive database tables to be crawled.
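To confirm that an installed driver meets the minimum versions listed here (2.5.15 for Hive, 2.5.28 for Impala), you could use a small check like the following Java sketch. It is hypothetical: the class and method names are illustrative. It also assumes that the driver reports a dotted numeric version string through the standard `DatabaseMetaData.getDriverVersion()` method. This topic does not document the exact format that the Cloudera drivers return.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Arrays;

/** Hypothetical helper: compare a driver version string with a required minimum. */
final class DriverVersionCheck {
    // Minimum versions stated in this topic.
    static final String MIN_HIVE = "2.5.15";
    static final String MIN_IMPALA = "2.5.28";

    /** True if actual >= required, comparing numeric components left to right. */
    static boolean meetsMinimum(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        for (int i = 0; i < Math.max(a.length, r.length); i++) {
            int ai = i < a.length ? a[i] : 0; // a missing component counts as 0
            int ri = i < r.length ? r[i] : 0;
            if (ai != ri) {
                return ai > ri;
            }
        }
        return true; // every component is equal
    }

    /** Assumes a dotted numeric version such as "2.5.15"; any non-digit run separates components. */
    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\D+"))
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Reads the version reported by the connected driver and checks it. */
    static boolean driverMeetsMinimum(Connection conn, String required) throws SQLException {
        return meetsMinimum(conn.getMetaData().getDriverVersion(), required);
    }
}
```

Comparing components as integers means that "2.5.9" correctly sorts below "2.5.15". A plain string comparison would get this wrong.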

Cloudera Impala, through the Cloudera JDBC Driver for Impala 2.5.28 or later. To use this driver to load files from an Impala database, you must install the driver on the Content Analytics crawler server. When you configure a JDBC crawler and select the Impala driver, the system lists all database tables, not only the tables that Impala supports. Take care to select the correct Impala database tables: if you select a Hive database table, an exception is thrown.
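Because the Impala table list can include tables that Impala cannot read, you might want to probe candidate tables before you select them. The following Java sketch is a hypothetical pre-check and is not part of the product. For each table, it runs `SELECT * FROM <table> LIMIT 1` through a plain JDBC connection and reports the tables that raise an `SQLException`. It assumes that a failing query is a reasonable proxy for the exception that the crawler throws, which this topic does not confirm.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check: find candidate tables that a trivial query cannot read. */
final class ImpalaTableCheck {

    /** Runs a test query against one table; throws SQLException if the table cannot be read. */
    @FunctionalInterface
    interface TableProbe {
        void probe(String table) throws SQLException;
    }

    /** Returns the tables whose probe failed, in input order. */
    static List<String> unreadableTables(List<String> tables, TableProbe probe) {
        List<String> failed = new ArrayList<>();
        for (String table : tables) {
            try {
                probe.probe(table);
            } catch (SQLException e) {
                failed.add(table);
            }
        }
        return failed;
    }

    /** JDBC-backed probe: fetches at most one row with SELECT * ... LIMIT 1. */
    static TableProbe jdbcProbe(Connection conn) {
        // The table name is concatenated into SQL, so pass only trusted names.
        return table -> {
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM " + table + " LIMIT 1")) {
                rs.next();
            }
        };
    }
}
```

With a live Impala connection, you would call `unreadableTables(candidates, jdbcProbe(conn))`. Keeping the probe as an interface lets you test the filtering logic without a server. Pass table names only from a trusted source, such as `DatabaseMetaData.getTables()`.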

To add content from a Hive or Impala database to a collection:

1. Create a collection.
2. Install the appropriate Hive or Impala JDBC driver on the Content Analytics crawler server.
3. Create a JDBC database crawler. When you configure options for the crawler, be sure to select the correct driver.
4. Specify information about the database that you want to crawl.

Example fields for a Hive database:

Database URL: jdbc:hive2://example.server.com:10000/default

JDBC driver name: com.cloudera.hive.jdbc41.HS2Driver

JDBC driver class path: Specify the directory that contains the HiveJDBC41.jar file and all other bundled JAR files. For example, if the JAR files are stored in C:\Software\jdbc\hive, specify C:\Software\jdbc\hive. Make sure that the directory contains no files that are unrelated to the JDBC driver.
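The Hive fields work together. The class path tells the crawler where to find the JAR files, the driver class name selects the class to load from those JAR files, and the URL is passed to that driver. The following Java sketch shows one way to do the same thing by hand, for example to test the values before you enter them. It is an illustration under stated assumptions, not the way the crawler is implemented. The class and method names are hypothetical, and the sketch reports every non-JAR file in the directory as potentially unrelated, which is stricter than this topic requires.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper: load a JDBC driver from a class path directory and connect. */
final class DriverDirectoryLoader {

    /** Regular files in dir whose names end in ".jar", sorted by path. */
    static List<Path> jarFiles(Path dir) throws IOException {
        return list(dir, true);
    }

    /** Regular files in dir that are not JARs; ideally this list is empty. */
    static List<Path> nonJarFiles(Path dir) throws IOException {
        return list(dir, false);
    }

    private static List<Path> list(Path dir, boolean wantJars) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                boolean isJar = p.getFileName().toString()
                        .toLowerCase(Locale.ROOT).endsWith(".jar");
                if (Files.isRegularFile(p) && isJar == wantJars) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Loads driverClass from the JARs in dir and opens a connection to url. */
    static Connection connect(Path dir, String driverClass, String url, Properties props)
            throws IOException, ReflectiveOperationException, SQLException {
        List<Path> jars = jarFiles(dir);
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        // Not closed here: the driver needs this loader while its connections are open.
        URLClassLoader loader =
                new URLClassLoader(urls, DriverDirectoryLoader.class.getClassLoader());
        Driver driver = (Driver) Class.forName(driverClass, true, loader)
                .getDeclaredConstructor().newInstance();
        // DriverManager only hands out drivers visible to the caller's class loader,
        // so call the driver directly instead of DriverManager.getConnection.
        Connection conn = driver.connect(url, props);
        if (conn == null) { // Driver.connect returns null for URLs it does not accept
            throw new SQLException(driverClass + " does not accept URL " + url);
        }
        return conn;
    }
}
```

For example, `connect(Paths.get("C:\\Software\\jdbc\\hive"), "com.cloudera.hive.jdbc41.HS2Driver", "jdbc:hive2://example.server.com:10000/default", new Properties())` would load the Hive driver from the example directory. The authentication properties that your server needs are not covered here.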

Example fields for an Impala database:

Database URL: jdbc:impala://example.server.com:21050

JDBC driver name: com.cloudera.impala.jdbc41.Driver

JDBC driver class path: Specify the directory that contains the ImpalaJDBC41.jar file and all other bundled JAR files. For example, if the JAR files are stored in C:\Software\jdbc\impala, specify C:\Software\jdbc\impala. Make sure that the directory contains no files that are unrelated to the JDBC driver.
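When both drivers are installed, it is easy to pair a URL with the wrong driver. The following hypothetical Java helper maps the URL prefixes from the Hive and Impala examples to the matching driver class names. It covers only the example values in this topic, and other driver versions might use different class names.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: pick the driver class that matches a JDBC URL prefix. */
final class DriverForUrl {
    // URL prefixes and class names taken from the Hive and Impala examples in this topic.
    private static final Map<String, String> DRIVER_BY_PREFIX = Map.of(
            "jdbc:hive2://", "com.cloudera.hive.jdbc41.HS2Driver",
            "jdbc:impala://", "com.cloudera.impala.jdbc41.Driver");

    /** Returns the matching driver class name, or empty if the URL uses neither prefix. */
    static Optional<String> driverClassFor(String url) {
        for (Map.Entry<String, String> e : DRIVER_BY_PREFIX.entrySet()) {
            if (url.startsWith(e.getKey())) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }

    /** True if driverClass is the class this helper expects for url. */
    static boolean matches(String url, String driverClass) {
        return driverClassFor(url).map(driverClass::equals).orElse(false);
    }
}
```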

Applies to: IBM Watson Explorer Advanced Edition, Version 11.0.0.1, on AIX, Linux, and Windows.
