IBM InfoSphere Streams Version 4.1.1

Toolkit com.ibm.streams.db 2.0.0


General Information

The Database Toolkit provides a set of operators that enable your streams processing applications to integrate with external data systems.

Streams processing applications process streams of data that flow from external sources and convert result streams to external formats that can be used by components that are not part of InfoSphere Streams. These applications can also enrich internal streams by merging them with data from external repositories.

The IBM Streams Processing Language (SPL) standard toolkit includes source and sink operators, which provide generic adapters for files and network sockets. However, much of the world's data is stored in and made available by products with higher-level interfaces than files and sockets, such as database management systems (DBMS). You can use the Database Toolkit operators to access data from one of many supported external data systems.

For example, the Database Toolkit provides operators that write data to a partitioned DB2 database by issuing a separate, parallel write operation for each partition. Streams processing applications that handle large volumes of data can use these operators to improve performance when they write to partitioned tables.
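The per-partition parallelism described above can be sketched in Python. This is a hypothetical illustration of the idea, not the toolkit's implementation: `partition_for` uses CRC-32 as a stand-in for DB2's own distribution-key hashing, and `write_batch` is a placeholder for a real insert into one partition.

```python
"""Hypothetical sketch: route rows to database partitions and write each
partition's batch in parallel. NOT the DB2 toolkit's implementation; the
hash function and write_batch() are stand-ins for illustration only."""
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for DB2 distribution-key hashing: a deterministic hash of
    # the key, reduced modulo the number of partitions.
    return crc32(key.encode("utf-8")) % num_partitions


def split_by_partition(rows, key_field, num_partitions):
    # Group rows into one batch per partition.
    batches = [[] for _ in range(num_partitions)]
    for row in rows:
        batches[partition_for(row[key_field], num_partitions)].append(row)
    return batches


def write_batch(partition: int, batch):
    # Hypothetical placeholder for an insert against a single partition;
    # it only reports how many rows it would have written.
    return partition, len(batch)


def parallel_write(rows, key_field, num_partitions):
    batches = split_by_partition(rows, key_field, num_partitions)
    # One concurrent writer per partition, mirroring the idea of issuing
    # a separate write operation for each partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(write_batch, range(num_partitions), batches)
        return dict(results)


if __name__ == "__main__":
    rows = [{"id": str(i), "value": i * 10} for i in range(100)]
    written = parallel_write(rows, "id", num_partitions=4)
    print(written)
    assert sum(written.values()) == len(rows)
```

Because every row with the same key hashes to the same partition, each writer only touches its own partition, which is what lets the writes proceed independently.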

Although Database Toolkit operators can access data from external data systems, they do not define entities in those systems or otherwise manage the data or the system. External data systems are managed by tools and processes that are supplied by their vendors independently from the toolkit operators and the applications that use those operators. For example, the ODBCAppend operator inserts rows into a DBMS table, but the operator does not attempt to create the table. If the table does not exist, the ODBCAppend operator issues an error.
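The same division of responsibility can be illustrated with Python's built-in `sqlite3` module, which is not part of the Database Toolkit and serves only as an analogy. The hypothetical `append_rows` helper inserts rows but never creates the table, so the insert fails until the table has been created separately.

```python
"""Analogy using Python's built-in sqlite3 module (not the Database
Toolkit): an append-only writer inserts rows but never creates the table,
so creating it is the job of whoever manages the database."""
import sqlite3


def append_rows(conn, table, rows):
    # Hypothetical helper: insert rows into an existing table.
    # Deliberately no CREATE TABLE here. The table name is interpolated,
    # so it must come from trusted configuration, not from user input.
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


def try_append(conn, table, rows):
    # Return "ok" on success, or the database error text on failure.
    try:
        append_rows(conn, table, rows)
        return "ok"
    except sqlite3.OperationalError as err:
        return f"error: {err}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [(1, "alice"), (2, "bob")]
    print(try_append(conn, "person", rows))  # fails: the table does not exist
    # The table is created out of band, for example by a setup script.
    conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
    print(try_append(conn, "person", rows))  # succeeds once the table exists
```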

The Database Toolkit operators must be configured to connect to an external data system and to access specific data from that system. This configuration information is specified in an XML document that is called a connection specifications document. The connection specifications document is separate from the streams processing application for two main reasons:

  • This configuration information is often complex, detailed, and specific to a particular vendor or vendor product. Consolidating the configuration information makes it easier to maintain both the information itself and the applications that access it. The same configuration information is often shared by many operator declarations, either in a single application or across multiple applications. Repeating the same information in several operator declarations increases the opportunity for errors and makes the applications harder to maintain.
  • The people who understand how to configure the external data systems are often not the same people who develop the applications. Separating the configuration information from the application allows people in the two roles to work independently of each other with less need for coordination.
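The benefit of sharing one document can be sketched in Python: several operator declarations refer to the same named entry instead of each repeating the connection details. The XML layout, element names, and the `load_access` helper below are simplified and hypothetical, chosen for illustration; the toolkit's actual connection specifications schema is defined in its own documentation.

```python
"""Hypothetical sketch: operators look up a named access specification in
one shared XML document instead of repeating connection details. The XML
layout is simplified and illustrative, not the toolkit's real schema."""
import xml.etree.ElementTree as ET

# Simplified, hypothetical connection specifications document.
CONNECTIONS_XML = """
<connections>
  <connection_specification name="SalesDB">
    <ODBC database="SALES" user="streams"/>
  </connection_specification>
  <access_specification name="OrdersTable">
    <table tablename="ORDERS"/>
    <uses_connection connection="SalesDB"/>
  </access_specification>
</connections>
"""


def load_access(xml_text, access_name):
    """Resolve a named access specification to its table and connection."""
    root = ET.fromstring(xml_text.strip())
    access = root.find(f"access_specification[@name='{access_name}']")
    if access is None:
        raise KeyError(f"unknown access specification: {access_name}")
    # Follow the reference from the access specification to its connection.
    conn_name = access.find("uses_connection").get("connection")
    conn = root.find(f"connection_specification[@name='{conn_name}']")
    if conn is None:
        raise KeyError(f"unknown connection specification: {conn_name}")
    odbc = conn.find("ODBC")
    return {
        "table": access.find("table").get("tablename"),
        "database": odbc.get("database"),
        "user": odbc.get("user"),
    }


if __name__ == "__main__":
    # Two hypothetical operator declarations reuse the same named entry;
    # changing the database means editing only the XML document.
    for operator in ("AppendOrders", "EnrichOrders"):
        print(operator, load_access(CONNECTIONS_XML, "OrdersTable"))
```

Keeping the details behind a name means the person who administers the database can change them without touching the application code, which is the second reason listed above.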
Developing and running applications that use the Database Toolkit
To create applications that use the Database Toolkit, you must set the appropriate environment variables.

Operator control input port
Many operators in the Database Toolkit can be configured with an optional control input port.

Operator error output port
Many operators in the Database Toolkit can be configured with an optional error output port.

Connection specifications document for the Database Toolkit
A connection specifications document is an XML document that describes how operators in the Database Toolkit connect to and access specific external data services.

Database Toolkit operator runtime error conditions
When runtime errors occur in Database Toolkit operators, an error message is written to the processing element (PE) log.
Version: 2.0.0
Required Product Version: 4.1.0.0

Namespaces

com.ibm.streams.db
Testing ODBC connections: You can use the odbchelper program in the Database Toolkit to help you find setup and configuration issues in ODBC connections.
com.ibm.streams.db.db2
Testing connections to DB2 databases: You can use the db2helper program in the Database Toolkit to test the connection to a DB2 database and determine the number of partitions in the database.
com.ibm.streams.db.netezza