IBM InfoSphere Streams Version 4.1.1

Developing and running applications that use the Database Toolkit

SPL standard and specialized toolkits > com.ibm.streams.db 2.0.0 > Developing and running applications that use the Database Toolkit

To create applications that use the Database toolkit, you must set the appropriate environment variables. You must also configure either Streams Studio or the SPL compiler to be aware of the location of the toolkit.

Before you begin

  1. Verify that you satisfy any additional RPM requirements for the toolkit. For more information, see the list of required RPMs for the SPL specialized toolkits in the product documentation.
  2. Install IBM InfoSphere Streams. Configure the product environment variables by entering the following command:
    source product-installation-root-directory/4.0.1.0/bin/streamsprofile.sh
  3. Install one of the following supported database products or drivers on the same machine as InfoSphere Streams:
    • Aster Database Version 6.0
      • Supported drivers: UnixODBC Version 2.3 or later, with an Aster driver
      • Note: Aster databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
      • When you use the ODBC operators provided by the Database Toolkit in your applications to connect to an Aster database, you must specify additional database configuration information.
    • HP Vertica Version 7.0.1
      • Supported drivers: UnixODBC Version 2.3 or later, with a Vertica driver
      • Note: HP Vertica databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
    • IBM InfoSphere BigInsights V3.0 and earlier (Big SQL 1.0)
      • Supported drivers: Big SQL ODBC
      • Note: IBM Big SQL 1.0 databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
      • IBM Big SQL 1.0 is a data warehouse system for Hadoop that you use to summarize, query, and analyze data. Big SQL provides access to data that is stored in IBM InfoSphere BigInsights. The following restrictions apply in this scenario:
        • Transactions are not supported.
        • Setting isolation levels is not supported.
        • Hive tables do not support INSERT, UPDATE and DELETE statements. However, you can use LOAD statements to insert data into a Hive table.
        • For more information, see the Big SQL reference in the InfoSphere BigInsights documentation.
    • IBM InfoSphere BigInsights V3.0 (Big SQL)
      • Supported drivers: DB2 ODBC
    • IBM DB2 Data Server Driver Package Version 9.7
      • Supported drivers: DB2 ODBC
    • IBM DB2 Data Server Driver Package Version 10.5
      • Supported drivers: DB2 ODBC
    • IBM Informix Dynamic Server Versions 11.50, 11.70, or 12.10
      • Supported drivers: Informix ODBC
    • IBM Netezza Versions 7.0
      • Supported drivers: UnixODBC Version 2.3.0, with a Netezza driver
    • IBM solidDB Version 7.0
      • Supported drivers: UnixODBC Version 2.3.0, with a solidDB client driver
      • Note: IBM solidDB databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
    • Microsoft SQL Server Versions 2008 or 2012
      • Supported drivers: UnixODBC Version 2.3 or later, with the Microsoft SQL Server ODBC driver 1.0 for Linux
    • MySQL Database Versions 5.6
      • Supported drivers: UnixODBC Version 2.3.1 or later, with a MySQL driver. If you are using the MySQL ODBC Connector driver, Version 5.3.2 or later is required.
      • Note: MySQL databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
    • Oracle Database 12c Release 1
      • Supported drivers: UnixODBC Version 2.3.1 or later, with an Oracle driver
    • Teradata Database Versions 14.0 or 14.10
      • Supported drivers: DataDirect ODBC driver for Teradata, which is packaged with the Teradata ODBC driver
      • Note: Teradata databases are not supported on IBM POWER systems. These database products do not provide client versions for IBM POWER systems.
      • When you use the ODBC operators provided by the Database Toolkit in your applications to connect to a Teradata database, you must specify additional database configuration information. For more information,

Procedure

  1. Verify that the appropriate environment variables are set for the operators that you use.
    • Note: You can use both DB2 and ODBC operators in your application, provided that the ODBC operators are configured to connect to the same DB2 data source. To do this, you must set both sets of _LIBPATH and _INCPATH environment variables. Otherwise, you cannot connect to two different types of databases in the same application. If you set environment variables for more than one type of database product, the application will not compile.
      • For DB2SplitDB and DB2PartitionedAppend operators:
        • The environment variables STREAMS_ADAPTERS_DB2_LIBPATH and STREAMS_ADAPTERS_DB2_INCPATH must be defined to be the path names of the directories where the external libraries and header files for the DB2 database are installed.
      • For NetezzaLoad, NetezzaPrepareLoad, ODBCAppend, ODBCEnrich, ODBCRun, and OBCSource operators:
        • These operators support access to databases that implement the ODBC specification. The following list contains the specific databases that are supported by these operators, and the environment variable that must be defined. Exactly one of the environment variables from the list must be defined, according to which database you are choosing to be targeted by the operator at compile time. The value that the environment variable is assigned is not important; it is the presence of the environment variable that is important.
          • Aster Database: STREAMS_ADAPTERS_ODBC_UNIX_OTHER
          • HP Vertica: STREAMS_ADAPTERS_ODBC_UNIX_OTHER
          • Big SQL 1.0: STREAMS_ADAPTERS_ODBC_BIGSQL
          • Big SQL in IBM InfoSphere BigInsights V3.0: STREAMS_ADAPTERS_ODBC_DB2
          • IBM Data Server Client and IBM Data Server Runtime Client: STREAMS_ADAPTERS_ODBC_DB2
          • IBM DB2 Runtime Client and DB2 Client: STREAMS_ADAPTERS_ODBC_DB2
          • IBM Informix Dynamic Server: STREAMS_ADAPTERS_ODBC_IDS
          • IBM Netezza: STREAMS_ADAPTERS_ODBC_NETEZZA
          • IBM solidDB: STREAMS_ADAPTERS_ODBC_SOLID
          • Microsoft SQL Server: STREAMS_ADAPTERS_ODBC_SQLSERVER
          • MySQL Database: STREAMS_ADAPTERS_ODBC_MYSQL
          • Oracle Database: STREAMS_ADAPTERS_ODBC_ORACLE
          • Teradata Database: STREAMS_ADAPTERS_ODBC_UNIX_OTHER
        • In addition to assigning some value to exactly one of the environment variables in the list, you must set the environment variables STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH. Set the environment variables to the locations of the include and library directories for the ODBC driver that you are using.
        • The operators also allow for more databases that support ODBC by using the UnixODBC driver. To use this capability, define the environment variable STREAMS_ADAPTERS_ODBC_UNIX_OTHER.
  2. Develop your application. InfoSphere Streams Studio can help you create and debug SPL and SPL mixed-mode applications. To use the operators from the Database Toolkit, you must add the toolkit location. To avoid the need to fully qualify the operators, add a use directive in your application.
    • For example, you can add the following clause in your SPL source file:
       use com.ibm.streams.db::*;
    • You can also specify a use clause for individual operators by replacing the asterisk (*) with the operator name. For example:
       use com.ibm.streams.db::ODBCEnrich;
  3. Build your application. You can use the sc command or Streams Studio. To build the application in Streams Studio, you must add the toolkit location if you did not already do so. To build the application from the command line, you must configure the SPL compiler to find the root directory of the toolkit. Use one of the following methods:
    • Set the STREAMS_SPLPATH environment variable to the root directory of a toolkit or multiple toolkits (using a colon (:) as a separator). For example:
        export STREAMS_SPLPATH=$STREAMS_INSTALL/toolkits/com.ibm.streams.db
    • Specify the -t or --spl-path command parameter when you run the sc command. Note: These command parameters override the STREAMS_SPLPATH environment variable. For example:
        sc -t $STREAMS_INSTALL/toolkits/com.ibm.streams.db -M MyMain
  4. Start the InfoSphere Streams instance.
  5. Set the appropriate run time environment variables. You can use one of the following methods to ensure that the necessary environment variables are set when your application starts:
    • Set the environment variables in the .bashrc file for the instance owner. For example:
        export DB2INSTANCE=sdb2
        export ODBCINI=/mydir/odbc.ini
    • Set the environment variables by using the streamtool setproperty command. For example:
        streamtool setproperty -i myInstance --application-ev DB2INSTANCE=sdb2
  6. Run the application. You can submit the application as a job by using the streamtool submitjob command or by using Streams Studio.