IBM InfoSphere Streams Version 4.1.0

Ingesting data

When you operate your Telecommunications Event Data Analytics applications, your main task is to read in and process business data. This topic shows where to put your input data and where to find and check the output data.

Before you begin

Your system has been prepared and your applications have been started and initialized as described in Starting the Lookup Manager and ITE applications.

The following procedure assumes that these parameters and values have been specified:

  • global.applicationControlDirectory = <project-base>/control
  • lm.commandsDirectory = <project-base>/<lm-application-folder>/data/in/cmd
  • lm.file.directory = <project-base>/<lm-application-folder>/data
  • lm.statisticsDirectory = <project-base>/<lm-application-folder>/data/out/statistics
  • ite.ingest.directory.input = <project-base>/<ite-application-folder>/data/in
  • ite.ingest.directory.inputListFile =
  • ite.storage.directory.outputs = <project-base>/<ite-application-folder>/data/out
  • ite.storage.directory.statistics = <project-base>/<ite-application-folder>/data/out/statistics

If you use other values, adapt the procedure to your settings.
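
As a quick reference, the directory layout implied by these parameter values can be sketched in Python. The helper name default_layout and the folder names passed to it are hypothetical; only the parameter names and relative paths come from the list above. The ite.ingest.directory.inputListFile parameter is left out because no value is assumed for it.

```python
from pathlib import PurePosixPath


def default_layout(project_base, lm_folder, ite_folder):
    """Map each parameter name to the path assumed in this procedure.

    Hypothetical helper: it only mirrors the values listed above and does
    not read the real application configuration.
    """
    base = PurePosixPath(project_base)
    lm = base / lm_folder / "data"
    ite = base / ite_folder / "data"
    return {
        "global.applicationControlDirectory": base / "control",
        "lm.commandsDirectory": lm / "in" / "cmd",
        "lm.file.directory": lm,
        "lm.statisticsDirectory": lm / "out" / "statistics",
        "ite.ingest.directory.input": ite / "in",
        "ite.storage.directory.outputs": ite / "out",
        "ite.storage.directory.statistics": ite / "out" / "statistics",
    }


if __name__ == "__main__":
    # The base path and folder names below are placeholders, not product defaults.
    for name, path in default_layout("/opt/project", "lm-app", "ite-app").items():
        print(f"{name} = {path}")
```

If your settings differ, a table like this makes it easier to see which directories in the following steps need to be replaced.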

Procedure

  • Ingest business data

    The steps describe the procedure for a single ITE application. If your project contains several ITE applications, repeat the steps for each of them.

    1. Move your input data to the <project-base>/<ite-application-folder>/data/in directory. This step is only an example: the directory is meant to be the landing zone for your input data, and in a production system input files are typically moved there automatically.
    2. The ITE job starts processing files if it is in the run state. To check the state, examine the contents of the <project-base>/control/<ite-namespace> state file. It must contain the string run.
    3. When the ITE application finishes processing an input file, it moves the file into the <project-base>/<ite-application-folder>/data/in/archive directory. Check the contents of this directory to see whether the application has processed the file.
    4. The ITE application writes output data for correct input data to the <project-base>/<ite-application-folder>/data/out/load directory. To see whether your input data has been processed correctly, check the contents of this directory. Typically, another application consumes the outputs.
    5. The ITE application writes output data for erroneous input data to the <project-base>/<ite-application-folder>/data/out/reject directory. To see which input data has been rejected by your application, check the contents of the files in this directory. The files tell you why a record was rejected. Each entry contains an error code (for example, format error or duplicate), some detailed text, and the number of the original record in the input file.
    6. Finally, check the contents of the <project-base>/<ite-application-folder>/data/out/statistics directory. The <date>_<namespace>Statistics.txt file provides metadata for every processed file. This gives you a quick overview of all the data that has been processed so far.
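
The manual checks in the ingest steps can be approximated with a small Python sketch. The function names are hypothetical, and the path of the state file is passed in by the caller because its exact name depends on your installation. The sketch assumes that the state file holds the word run as a whitespace-separated token; it does not interpret the contents of the reject or statistics files.

```python
from pathlib import Path


def is_running(state_file):
    """Return True if the state file exists and contains the word 'run'.

    Assumption: the state is stored as a plain-text token in the file.
    """
    path = Path(state_file)
    if not path.is_file():
        return False
    return "run" in path.read_text().split()


def list_files(directory):
    """Return the sorted names of regular files in a directory (empty if absent)."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


def ingest_overview(ite_data_dir):
    """Collect the file names found in each directory named in the steps above."""
    data = Path(ite_data_dir)
    return {
        "waiting": list_files(data / "in"),
        "archived": list_files(data / "in" / "archive"),
        "load": list_files(data / "out" / "load"),
        "reject": list_files(data / "out" / "reject"),
        "statistics": list_files(data / "out" / "statistics"),
    }
```

An input file that is listed under waiting has not been picked up yet; once it is listed under archived, look at load and reject to see how its records were handled.
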
  • Alter enrichment data

    From time to time, it may become necessary to alter the enrichment data. You use the same procedure that you followed when initializing the lookup repository, but your command file uses different commands. Remember to check the LookupMgrCustomizing.xml file to determine which commands are allowed for the repositories that you want to modify.

    • Update enrichment data
      1. Put your input data files in the <project-base>/<lm-application-folder>/data/in directory. The names of the input files have the <repository>.csv format.
      2. Create a command file called, for example, update_all.cmd and add the update; text to it.
      3. Move the command file into the <project-base>/<lm-application-folder>/data/in/cmd directory.
      4. The Lookup Manager application begins the update of the lookup repository or repositories.
      5. Upon finishing the update, the Lookup Manager application moves the command file to the <project-base>/<lm-application-folder>/data/in/cmd/archive directory. Check whether your command file appears in this directory to determine the end of the update process.
      6. Check the state of the Lookup Manager application by examining the contents of the <project-base>/control/appl.ctl state file. It must now contain the string run.
      7. Check the <date>_LookupManagerStatistics.txt file in the <project-base>/<lm-application-folder>/data/out/statistics directory. The first line stands for the internal initialization process. The second line contains the statistics for the processed command. Finally, the data provides a list of segments for verification.
    • Delete enrichment data

      To delete data from the repository, follow the same steps, but name the input file <repository>.del.csv and use the delete command string in the command file.

    You may combine update and delete commands in one command file and you may also restrict the commands to certain repositories.
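
The command file handling described above (create the file, move it into the cmd directory, wait for it to appear in the archive directory) can be sketched in Python. The function names and the staging directory are hypothetical. The sketch assumes one command per line, that the archived file keeps its original name, and that the staging directory is on the same file system as the cmd directory so that the move is atomic.

```python
import os
import time
from pathlib import Path


def submit_command_file(staging_dir, cmd_dir, name, commands):
    """Write the command file in staging_dir, then move it into cmd_dir.

    Creating the file elsewhere and moving it afterwards keeps the
    Lookup Manager from seeing a half-written file.
    """
    staged = Path(staging_dir) / name
    staged.write_text("\n".join(commands) + "\n")
    target = Path(cmd_dir) / name
    os.replace(staged, target)  # atomic when both paths share a file system
    return target


def wait_until_archived(cmd_dir, name, timeout=60.0, poll=1.0):
    """Poll cmd_dir/archive until the command file shows up or time runs out.

    Assumption: the archived file keeps the name of the submitted file.
    """
    archived = Path(cmd_dir) / "archive" / name
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if archived.exists():
            return True
        time.sleep(poll)
    return archived.exists()
```

For the update example, a call such as submit_command_file(staging, cmd_dir, "update_all.cmd", ["update;"]) followed by wait_until_archived(cmd_dir, "update_all.cmd") covers steps 2 to 5. Only the update; text is confirmed by this topic; take the exact strings for delete commands and for restricting commands to certain repositories from the command file reference.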

Lookup Manager command file

The command file, which has the .cmd extension, has one or more command lines, each of which includes one command type. By default, it is stored in the cmd subdirectory in the input folder of the Lookup Manager job. The path to the input folder can be passed to the application when you submit the job by using the lm.commandsDirectory parameter. The Lookup Manager supports the following types of commands in the command file:

  • init: Creates new shared memory segments.