IBM InfoSphere Streams Version 4.1.0

Lookup Manager application

The Lookup Manager application loads the external enrichment data in memory and controls the operation of the ITE applications.

Use the Lookup Manager if enrichment from external data sources is required. Data enrichment is the process of enhancing, refining or otherwise improving raw data. The Lookup Manager application loads and updates this data in memory, and distributes it across hosts. The ITE applications can query this data to enrich the processed records.

Enrichment data

Internally, the enrichment data is stored in lookup repositories. The Lookup Manager inserts, updates, and deletes entries in the lookup repositories on all configured hosts. It can manage multiple lookup data repositories, called segments, and each segment can contain multiple lookup repositories, called stores. Each store contains key-value pairs. The value is a set of attributes that are returned as Streams attributes when a lookup on the corresponding key is performed. The data layout of the lookup segments and stores is defined in an XML file. The maximum memory size for each segment is also configured in that XML file. To adapt the Lookup Manager application to your requirements, customize this file and rebuild the application.
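The segment and store hierarchy described above can be pictured with a small in-memory model. This is only an illustrative sketch: the class names, the field names, and the sample segment, store, and attribute names are assumptions and are not part of the Lookup Manager or its XML layout file.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Store:
    """One lookup repository: maps a key to a set of named attributes."""
    name: str
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[Dict[str, str]]:
        # Return the attribute set for the key, or None if the key is unknown.
        return self.entries.get(key)


@dataclass
class Segment:
    """A group of stores with a configured memory limit (hypothetical field)."""
    name: str
    max_memory_bytes: int
    stores: Dict[str, Store] = field(default_factory=dict)


# Hypothetical example data, used for illustration only.
segment = Segment(name="Subscribers", max_memory_bytes=64 * 1024 * 1024)
store = Store(name="ByImsi")
segment.stores[store.name] = store
store.entries["262010000000001"] = {"tariff": "gold", "region": "north"}

hit = store.lookup("262010000000001")
miss = store.lookup("unknown")
```

In this model, a lookup returns the whole attribute set for a key, which corresponds to the attributes that are returned as Streams attributes in the real application.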

Data sources

The source data for each segment can be retrieved either from CSV files or from external databases by using SQL statements. A store can use any subset of the columns that are present in the source data of the segment in which it is defined. The key for store entries can be constructed from multiple columns. For example, by defining two stores within one segment, you can create two lookup repositories from the same input data using different keys.
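The idea of deriving two stores with different keys from the same source data can be sketched as follows. The column names, the sample rows, the `build_store` helper, and the use of "|" to join the columns of a composite key are all assumptions made for illustration; the actual key construction is defined by the Lookup Manager configuration.

```python
import csv
import io

# Hypothetical source data for one segment.
CSV_DATA = """msisdn,imsi,tariff,region
491700000001,262010000000001,gold,north
491700000002,262010000000002,basic,south
"""


def build_store(rows, key_columns, value_columns):
    """Build a key-value store from rows, using a subset of the columns."""
    store = {}
    for row in rows:
        # A key can be constructed from one or several columns.
        key = "|".join(row[column] for column in key_columns)
        store[key] = {column: row[column] for column in value_columns}
    return store


rows = list(csv.DictReader(io.StringIO(CSV_DATA)))

# Two stores over the same input data, with different keys and attributes.
by_msisdn = build_store(rows, ["msisdn"], ["tariff", "region"])
by_imsi_region = build_store(rows, ["imsi", "region"], ["tariff"])
```

Both stores are filled from the same rows, so a single load of the segment keeps them consistent with each other.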

If CSV files are used as input, the files must be copied to a configurable directory, where the Lookup Manager application picks them up. The Lookup Manager application uses the name of the input file to derive the name of the segment that the source data is destined for.

To query database sources as input, the Lookup Manager application uses the com.ibm.streams.db toolkit. Currently, DB2® and Oracle databases are supported. The SQL statement that retrieves the data for a segment is configured in the database toolkit (in the connections.xml file) as usual. The Lookup Manager application uses the name of the access specification entry in the connections.xml file to derive the name of the segment that the source data is destined for. When a database is used as the source, the Lookup Manager periodically checks the availability of the database. If the database is not reachable or errors occur while the data is loading, the problem is logged and the Lookup Manager application stops.

Command interface

Enrichment data is loaded on demand. You initiate the load manually by using the Lookup Manager's command interface, so you can update the lookup repositories whenever necessary. The update process can be automated with external shell scripts. To send a command to the Lookup Manager application, prepare a command file and copy it to a configurable directory. The Lookup Manager application then picks up the file and processes the commands in it. These commands are supported:

  • init

    The given segment is deleted and re-created. The stores for this segment are created and filled with the data from the data source that is defined for the segment.

  • update

    Existing entries in the stores are updated with the entries retrieved from the data source. New entries will be inserted. All stores in the segment are affected.

  • delete

    Entries retrieved from the data source are deleted in all stores of the given segment. This operation is only supported for CSV files as data source.
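The effect of the three commands on the stores of one segment can be sketched with plain dictionaries. This is a simplified model under stated assumptions, not the Lookup Manager implementation: the `STORE_KEYS` layout, the `apply_command` function, and the sample rows are hypothetical, and the sketch does not model the command file format or the CSV-only restriction of delete.

```python
# Hypothetical layout: store name -> columns that form the key.
STORE_KEYS = {"by_id": ("id",), "by_name": ("name",)}


def apply_command(segment, command, rows):
    """Apply init, update, or delete to all stores of a segment (a dict)."""
    if command not in ("init", "update", "delete"):
        raise ValueError("unsupported command: " + command)
    if command == "init":
        # The segment is deleted and re-created before it is filled.
        segment.clear()
    for store_name, key_columns in STORE_KEYS.items():
        store = segment.setdefault(store_name, {})
        for row in rows:
            key = "|".join(row[column] for column in key_columns)
            if command == "delete":
                # Entries found in the source data are removed.
                store.pop(key, None)
            else:
                # init and update: insert new entries, overwrite existing ones.
                store[key] = dict(row)
    return segment


segment = {}
apply_command(segment, "init", [
    {"id": "1", "name": "alpha", "city": "Bonn"},
    {"id": "2", "name": "beta", "city": "Kiel"},
])
apply_command(segment, "update", [
    {"id": "2", "name": "beta", "city": "Ulm"},
    {"id": "3", "name": "gamma", "city": "Jena"},
])
apply_command(segment, "delete", [
    {"id": "1", "name": "alpha", "city": "Bonn"},
])
```

After these three calls, both stores contain the entries for "beta" (with the updated city) and "gamma", and the entry for "alpha" is gone from both.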

Synchronization with ITE applications

A Lookup Manager application can serve multiple ITE applications. ITE applications only read data from the stores; all write operations are performed by the Lookup Manager application. To ensure a consistent state of the stores, the Lookup Manager stops the file processing of all ITE applications that it serves while it updates the stores. When the ITE applications receive the request to stop processing, they continue to process all pending records until the current input files are drained, and then notify the Lookup Manager application that processing has stopped. Only after all ITE applications have stopped processing does the Lookup Manager application start to update the stores. This ensures that any given input file in an ITE application is always processed with a consistent set of enrichment data, and that the data is updated on file boundaries.
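The ordering of this handshake can be illustrated with a small single-process simulation. It is a sketch under assumptions: the class and function names are hypothetical, the stores are reduced to a single version number, and the real applications coordinate across processes and hosts rather than through direct method calls. The resume step at the end is also an assumption; the text above only states that processing is stopped during the update.

```python
class IteApplication:
    """Hypothetical stand-in for one ITE application."""

    def __init__(self, name, pending_files):
        self.name = name
        self.pending_files = list(pending_files)
        self.stopped = False
        self.processed = []  # (file name, store version used)

    def request_stop(self, stores):
        # Drain the current input files with the unchanged enrichment data.
        for file_name in self.pending_files:
            self.processed.append((file_name, stores["version"]))
        self.pending_files.clear()
        self.stopped = True
        # Notify the caller that processing has stopped.
        return self.name

    def resume(self):
        self.stopped = False


def update_stores(stores, ite_apps, new_version):
    """Update the stores only after every application has stopped."""
    acknowledged = {app.request_stop(stores) for app in ite_apps}
    if acknowledged != {app.name for app in ite_apps}:
        raise RuntimeError("not all ITE applications have stopped")
    stores["version"] = new_version  # the update happens on file boundaries
    for app in ite_apps:
        app.resume()


stores = {"version": 1}
apps = [
    IteApplication("ite1", ["a.csv"]),
    IteApplication("ite2", ["b.csv", "c.csv"]),
]
update_stores(stores, apps, new_version=2)
```

In the simulation, every file that was pending when the stop request arrived is processed with version 1 of the data, and version 2 becomes visible only for files that are processed afterward.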

To allow for this communication, the same application control directory must be configured in the Lookup Manager application and in the ITE applications, and this directory must be accessible from all hosts. In addition, the Lookup Manager application needs to know which ITE applications participate in the solution. For that purpose, a list of ITE applications needs to be configured in the Lookup Manager customization file.

Related links:

  • Reference > Toolkits > Specialized toolkits > com.ibm.streams.teda 1.0.2 > Developing applications > Customizing applications > Customizing the Lookup Manager application

  • Component overview: This page gives a high-level overview of the internal structure of the Lookup Manager application.