IBM InfoSphere Streams Version 4.1.0

Parameter reference

SPL standard and specialized toolkits > com.ibm.streams.teda 1.0.2 > Parameter reference

The following Lookup Manager and ITE application parameters enable, disable, and configure application features to suit your needs.

These parameters have the following properties:

Type
Parameters can be integer, float, string, or enum type. The integer and float types are for numeric values.
Default

An optional parameter can have a default value that is used if the parameter is omitted from the configuration file.
Cardinality

Specifies how many values are allowed for a parameter at compile time and whether the parameter is mandatory.

0..1 means that the parameter is optional and can take only one value.

0..n means that the parameter is optional and can take multiple comma-separated values.

1 means that the parameter is mandatory and can take only one value.

1..n means that the parameter is mandatory and can take multiple comma-separated values.
Application scope

Specifies which application (Lookup Manager, ITE application, or both) evaluates the parameter.
Provisioning time

Specifies whether the application evaluates the parameter during compile time, submission time, or both.

Normally, if the compile-time parameter is not provided or if its default is overridden with an empty value, the submission-time parameter is mandatory. Otherwise, the compile-time value is used as the default for the submission-time parameter.
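The precedence rule between compile-time and submission-time values can be sketched as follows. This is a hypothetical helper, not part of the toolkit; it assumes that a parameter that was not provided is represented as `None` and a default overridden with an empty value as an empty string.

```python
from typing import Optional


def resolve_parameter(name: str, compile_time: Optional[str],
                      submission_time: Optional[str]) -> str:
    """Sketch: resolve the effective value of a parameter at submission time.

    Assumption: compile_time is None if the parameter was not provided at
    compile time, or "" if its default was overridden with an empty value.
    """
    if submission_time not in (None, ""):
        # An explicit submission-time value overrides the compile-time default.
        return submission_time
    if compile_time in (None, ""):
        # No usable compile-time value: the submission-time value is mandatory.
        raise ValueError(f"{name} must be provided at submission time")
    # Otherwise the compile-time value acts as the submission-time default.
    return compile_time


print(resolve_parameter("ite.checkpointing.directory", "./checkpoint", None))
```

The call prints `./checkpoint` because no submission-time value is given.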
Valid values

For enumerations, the list of supported named values is provided. The named values are case-insensitive, which means that you can specify, for example, ite.embeddedSampleCode=off or ite.embeddedSampleCode=OFF.

For numeric values, you must provide a value that satisfies the constraint. For example, the constraint for global.multiHost.numberOfHosts is >=1. If you provide a value that is greater than or equal to 1, your value is accepted. If you provide 0 or a negative value, you see an error message.

For string values, you can provide a value that matches the Perl regular expression. For example, the description of ite.cleanup.schedule.minute shows the (([1-5]?[0-9]-)?[1-5]?[0-9]) regular expression. You must provide a value that matches this regular expression, for example, 9-59 or 10.
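The three kinds of validity checks described above can be illustrated with a small Python sketch. The function names are hypothetical, and the sketch assumes that the whole value must match the regular expression (Python's `re.fullmatch`); Python regular expressions are not Perl regular expressions, but they behave the same way for simple patterns such as the minute pattern shown here.

```python
import re
from typing import Sequence


def check_enum(value: str, allowed: Sequence[str]) -> str:
    """Enum names are case-insensitive: 'OFF' and 'off' are both accepted."""
    for name in allowed:
        if value.lower() == name.lower():
            return name
    raise ValueError(f"{value!r} is not one of {list(allowed)}")


def check_min_int(value: str, minimum: int) -> int:
    """Numeric constraint such as >=1 for global.multiHost.numberOfHosts."""
    number = int(value)
    if number < minimum:
        raise ValueError(f"{number} must be >= {minimum}")
    return number


def check_pattern(value: str, pattern: str) -> str:
    """String values must match the documented regular expression in full."""
    if re.fullmatch(pattern, value) is None:
        raise ValueError(f"{value!r} does not match {pattern}")
    return value


# Pattern from the ite.cleanup.schedule.minute description.
MINUTE = r"(([1-5]?[0-9]-)?[1-5]?[0-9])"

print(check_enum("OFF", ["off", "on"]))
print(check_min_int("3", 1))
print(check_pattern("9-59", MINUTE))
```

With these checks, a minute value such as `60` or `9-` is rejected, just as `0` is rejected for a parameter with the >=1 constraint.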
Related parameters

Some parameters are related to others. For example, a parameter that can be switched on and off may have sub-parameters. If the parameter is switched off, sub-parameters are inactive. Or, if a parameter has a certain value, it may require that another parameter is either not present or also has a certain value.
Details

For some parameters, technical details are provided, for example, the composite operator that contains the customized code that the parameter enables, or administrative actions that are required.

Note: In this topic, <namespace> is the namespace of the application. This namespace was specified when you created the application project with the wizard or the teda-create-project script.

global.applicationControlDirectory

Specifies the path of the directory that is used by the applications to store and exchange status information. The same path must be used for the Lookup Manager application and its controlled ITE applications.

If the applications are running on multiple hosts, the directory must be located in a shared file system.

A relative path is relative to the data directory.

Properties

Type: string

Cardinality: 1

Application scope: ITE, Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

global.multiHost

Specifies whether the application bundle runs on a single host or on multiple hosts. An application bundle consists either of a single ITE application or of a single Lookup Manager application with multiple ITE applications.

If you want to run the application bundle on multiple hosts, turn the parameter on. If you want to run the application bundle on a single host only, turn it off.

If the parameter is turned off, the child parameters are inactive.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE, Lookup Manager

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: global.multiHost.customHostTags, global.multiHost.numberOfHosts

Details

If the parameter is turned on, the application uses host tags to ensure, for example, that the enrichment data is updated on every host. The required host tags are stored in the hosttags.txt file, which is located in the application config directory. The host tags must be created and assigned to hosts using either the Streams Studio Streams Explorer or the streamtool command, for example, streamtool mktag or streamtool chhost.

global.multiHost.customHostTags

Specifies host tags that you want to use in your customized code to place operators on specific hosts.

The parameter is active only if the parent parameter is turned on.

Properties

Type: string

Default: empty list

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of values that match the \w+ regular expression

Related parameters:

Parent: global.multiHost
Other: global.multiHost.numberOfHosts

Details

The provided host tags are stored in the hosttags.txt file, which is located in the application config directory. The host tags must be created and assigned to hosts using either the Streams Studio Streams Explorer or the streamtool command, for example, streamtool mktag or streamtool chhost.

global.multiHost.numberOfHosts

Specifies the number of hosts that will hold enrichment data. This number must be identical to the number of hosts that are assigned the <namespace>_lookup_host_writer host tag.

The parameter is active only if the parent parameter is turned on.

Properties

Type: integer

Default: 1

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any integer value that is equal to or greater than 1

Related parameters:

Parent: global.multiHost
Other: global.multiHost.customHostTags

Details

The application uses the user-defined parallelism (UDP) feature to create as many operator instances as needed to initialize and update the enrichment data on every host. Each host has its own operator instance; in other words, the operator instances use host exlocation.

If the number of hosts that have the <namespace>_lookup_host_writer host tag assigned is less than this parameter value, the job submission fails. If it is greater, it is not predictable which hosts hold the enrichment data.
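The three possible outcomes of comparing the host count with this parameter can be sketched as follows. The `check_writer_hosts` helper and the host-to-tags mapping are hypothetical; in practice, you would read the tag assignments from your Streams instance, for example with streamtool.

```python
from typing import Dict, Set


def check_writer_hosts(host_tags: Dict[str, Set[str]], namespace: str,
                       number_of_hosts: int) -> str:
    """Compare tagged hosts with global.multiHost.numberOfHosts (sketch)."""
    tag = f"{namespace}_lookup_host_writer"
    tagged = [host for host, tags in host_tags.items() if tag in tags]
    if len(tagged) < number_of_hosts:
        # Too few tagged hosts: not every operator instance can be placed.
        return "job submission fails"
    if len(tagged) > number_of_hosts:
        # Too many tagged hosts: it is not predictable which ones hold data.
        return "unpredictable placement"
    return "ok"


hosts = {
    "host1": {"myapp_lookup_host_writer"},
    "host2": {"myapp_lookup_host_writer"},
    "host3": set(),
}
print(check_writer_hosts(hosts, "myapp", 2))
```

Here `myapp` stands in for your application namespace, and the call prints `ok` because exactly two hosts carry the tag.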

ite.archive.inputFilesIntoDateDirectory

Specifies whether the ITE application archives processed input files in a per-day directory or in a directory that receives all files.

If the parameter is off, the archive directory receives all files. The archive directory is relative to the data directory.

If the parameter is on, the ITE application creates a directory for every day that receives the processed input files for that day. The directory path is archive/YYYYMMDD with YYYY as year, MM as the month and DD as the day. The archive/YYYYMMDD directory is relative to the data directory.
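As a minimal sketch, the archive location for both settings could be computed like this. The `archive_dir` helper is hypothetical; it assumes that the directory is named archive in both cases and that the date is the day on which the file was processed.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_dir(data_dir: str, processed_on: date, into_date_directory: bool) -> str:
    """Return the archive directory for a processed input file (sketch)."""
    base = PurePosixPath(data_dir) / "archive"
    if into_date_directory:
        # ite.archive.inputFilesIntoDateDirectory=on: archive/YYYYMMDD
        return str(base / processed_on.strftime("%Y%m%d"))
    # ite.archive.inputFilesIntoDateDirectory=off: one directory for all files
    return str(base)


print(archive_dir("/data", date(2016, 3, 7), True))
```

With the parameter on, the call prints `/data/archive/20160307`; with it off, all files go to `/data/archive`.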

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.businessLogic.group

Specifies whether tuples are grouped.

If the parameter is off, the ITE application does not group tuples.

If the parameter is on, the ITE application groups tuples, and at least one of the built-in correlations must be enabled. This means that either the tuple deduplication, the custom correlation, or both must be enabled.

CAUTION: If checkpointing is enabled for the group logic, the ITE application regularly runs internal maintenance tasks that pause file processing for anywhere from a few seconds to several minutes.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

ite.businessLogic.group.custom

Specifies whether the ITE application groups tuples by using the custom correlation logic.

If you want to group tuples using your correlation logic, set this and the parent parameter to on and implement your correlation logic in the <namespace>.context.custom::ContextDataProcessor composite operator. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.context::SampleContextDataProcessor composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Children: ite.businessLogic.group.custom.checkpointing
Other: ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.embeddedSampleCode, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

ite.businessLogic.group.custom.checkpointing

Specifies whether checkpoint files for the custom logic of group processing are stored. If this parameter is off, the state of the custom logic cannot be recovered if the application is restarted. For example, if your custom logic aggregates data across file boundaries, data that has been collected is lost.

Committed checkpoint files are named custom/<groupId>/committed/<input-filename>.bin and are located in the output directory that is specified in the ite.checkpointing.directory parameter.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group.custom
Children: ite.businessLogic.group.custom.daysToKeep, ite.businessLogic.group.custom.timeToKeep

Details

The ite.businessLogic.group.custom.daysToKeep and ite.businessLogic.group.custom.timeToKeep parameters are active only if the parent parameter and this parameter are set to on.

ite.businessLogic.group.custom.daysToKeep

Deprecated. Specifies the number of days after which tuples are removed from the stateful custom group.

This parameter is active only if the parent and the ite.businessLogic.group.custom.checkpointing parameters are set to on.

The parameters ite.businessLogic.group.custom.daysToKeep and ite.businessLogic.group.custom.timeToKeep are mutually exclusive.

Properties

Type: integer

Default: 1

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any integer value from 1 to 365, inclusive

Related parameters:

Parent: ite.businessLogic.group.custom.checkpointing
Other: ite.businessLogic.group.custom.timeToKeep

Details

If the ite.businessLogic.group.custom.checkpointing parameter is on, the ITE application automatically saves all tuples that are received by the custom correlation logic to the hard disk. If the application restarts, for example because of maintenance or an automatic data refreshment and eviction cycle, the ITE application removes old tuples from the saved tuple set and processes only the valid tuples to rebuild an updated state of the custom correlation logic.

ite.businessLogic.group.custom.timeToKeep

Specifies the time after which tuples are removed from the stateful custom group.

This parameter is active only if the parent and the ite.businessLogic.group.custom.checkpointing parameters are set to on.

The parameters ite.businessLogic.group.custom.timeToKeep and ite.businessLogic.group.custom.daysToKeep are mutually exclusive.

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the (\d+d)?\s*(\d+h)?\s*(\d+m)? regular expression

Related parameters:

Parent: ite.businessLogic.group.custom.checkpointing
Other: ite.businessLogic.group.custom.daysToKeep
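A timeToKeep value combines optional day, hour, and minute parts, for example, 1d 12h or 90m. The following Python sketch converts such a value into a duration. The `parse_time_to_keep` function is hypothetical, and it assumes that the whole value must match the documented pattern. Because that pattern also matches an empty string, the sketch rejects empty values explicitly.

```python
import re
from datetime import timedelta

# Pattern from the timeToKeep "Valid values" entry.
TIME_TO_KEEP = re.compile(r"(\d+d)?\s*(\d+h)?\s*(\d+m)?")


def parse_time_to_keep(value: str) -> timedelta:
    """Convert a timeToKeep string such as '1d 12h' into a timedelta (sketch)."""
    match = TIME_TO_KEEP.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid timeToKeep value: {value!r}")
    if not any(match.groups()):
        # The pattern accepts "", but a zero retention time is not meaningful.
        raise ValueError("timeToKeep must specify at least one of d, h, m")
    # Each group looks like "12h"; drop the unit letter and convert to int.
    days, hours, minutes = (int(g[:-1]) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


print(parse_time_to_keep("1d 12h"))  # 1 day, 12:00:00
print(parse_time_to_keep("90m"))     # 1:30:00
```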

ite.businessLogic.group.debug

Enables additional file outputs that help you troubleshoot your ITE application. The files are located in the debug directory, which is a subdirectory of the configured data directory.

When this parameter is on, you receive information about the commands and data that are processed in the group logic.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Other: ite.businessLogic.group.custom, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

Details

The following files are created only if the ite.businessLogic.group and ite.businessLogic.group.debug parameters are turned on:

CONTEXT_CMD_<GROUP_ID>.txt: Receives log entries for internal checkpoint commands (clear, read, write) that are received by the group logic.
CONTEXT_CMD_RESP_<GROUP_ID>.txt: Receives log entries for start and stop responses that leave the group logic.
CONTEXT_DATA_IN_<GROUP_ID>.txt: Receives log entries for data tuples that are received by the group logic.
CONTEXT_DATA_OUT_<GROUP_ID>.txt: Receives log entries for valid data tuples that leave the group logic.
DEDUP_CMD_<GROUP_ID>.txt: Receives log entries for refresh and shutdown signals that are received by the deduplication.
DEDUP_CMD_RESP_<GROUP_ID>.txt: Receives log entries for refresh and shutdown responses that leave the deduplication.
DEDUP_IN_<GROUP_ID>.txt: Receives log entries for data tuples that are received by the deduplication.
DEDUP_OUT_<GROUP_ID>.txt: Receives log entries for data tuples that leave the deduplication, indicating whether each tuple is unique or a duplicate.
BLOOM_OUT_<GROUP_ID>.txt: Receives log entries for data tuples that leave the deduplication during the training phase that starts during the initialization phase or after receiving a refresh signal.

ite.businessLogic.group.deduplication

Specifies whether the ITE application groups tuples according to the built-in deduplication logic.

To enable the tuple deduplication, set this and the parent parameter to on.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Children: ite.businessLogic.group.deduplication.checkpointing, ite.businessLogic.group.deduplication.probability
Other: ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

Details

The deduplication uses a memory-efficient algorithm that can produce false positives, which means that some unique tuples are marked as duplicates. For more information, see the child parameters or the BloomFilter operator.
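The following generic Bloom filter sketch illustrates why false positives, but no false negatives, can occur. It is not the implementation of the BloomFilter operator: the `BloomFilter` class, the `deduplicate` helper, and the SHA-256 double-hashing scheme are assumptions chosen for illustration, and the string keys stand in for the tuple hash codes.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a seen key is never missed, but an unseen key
    can collide with bits set by other keys (a false positive)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def deduplicate(keys, bloom: BloomFilter):
    """Yield (key, is_duplicate); a key is a duplicate if it was (probably) seen."""
    for key in keys:
        duplicate = key in bloom
        if not duplicate:
            bloom.add(key)
        yield key, duplicate


print(list(deduplicate(["a", "b", "a"], BloomFilter(1024, 7))))
```

The repeated key `a` is always reported as a duplicate. A unique key such as `b` is usually reported as unique, but it is marked as a duplicate whenever all of its bit positions happen to be set already.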

ite.businessLogic.group.deduplication.checkpointing

Specifies whether to store checkpoint files for the deduplication of the group processing. If this parameter is off, the state of the deduplication cannot be recovered if the application is restarted. For example, the tuples that were already seen are not restored in the deduplication logic, so later duplicates of these tuples are detected as unique tuples.

The committed checkpoint files are named <groupId>/committed/<input-filename>.chk and are located in the output directory that is specified in the ite.checkpointing.directory parameter.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group.deduplication
Children: ite.businessLogic.group.deduplication.daysToKeep, ite.businessLogic.group.deduplication.timeToKeep
Other: ite.businessLogic.group.deduplication.probability

Details

The ite.businessLogic.group.deduplication.daysToKeep and ite.businessLogic.group.deduplication.timeToKeep parameters are active only if the parent parameter and this parameter are set to on.

ite.businessLogic.group.deduplication.daysToKeep

Deprecated. Specifies the number of days after which tuples are removed from the stateful deduplication.

The parameter is active only if the parent and the ite.businessLogic.group.deduplication.checkpointing parameters are set to on.

The parameters ite.businessLogic.group.deduplication.daysToKeep and ite.businessLogic.group.deduplication.timeToKeep are mutually exclusive.

Properties

Type: integer

Default: 1

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any integer value from 1 to 365, inclusive

Related parameters:

Parent: ite.businessLogic.group.deduplication.checkpointing
Other: ite.businessLogic.group.deduplication.timeToKeep

Details

If the ite.businessLogic.group.deduplication.checkpointing parameter is on, the ITE application automatically saves all tuples that are received by the deduplication logic to the hard disk. If the application restarts, for example because of maintenance or an automatic data refreshment and eviction cycle, the ITE application removes old tuples from the saved tuple set and processes valid tuples to rebuild an updated state of the deduplication logic.
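The eviction step during a restart can be pictured with the following sketch. The `rebuild_state` function and the representation of saved tuples as (key, timestamp) pairs are hypothetical. The sketch also assumes that a tuple whose timestamp falls exactly on the cutoff is kept.

```python
from datetime import datetime, timedelta


def rebuild_state(saved, now: datetime, days_to_keep: int) -> set:
    """Keep only checkpointed keys that are inside the retention window (sketch)."""
    cutoff = now - timedelta(days=days_to_keep)
    # Old tuples are evicted; the remaining keys are replayed to rebuild state.
    return {key for key, seen_at in saved if seen_at >= cutoff}


saved = [("a", datetime(2016, 1, 1)), ("b", datetime(2016, 1, 9))]
print(rebuild_state(saved, now=datetime(2016, 1, 10), days_to_keep=1))  # {'b'}
```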

ite.businessLogic.group.deduplication.probability

Specifies the probability of false positives that are allowed for duplicate detection.

A false positive occurs when a tuple is marked as a duplicate even though it is unique.

The expected number of unique tuples for which this probability is ensured is defined in the file that the ite.ingest.loadDistribution.groupConfigFile parameter specifies.

Properties

Type: float

Default: 0.001

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any float value from 0 to 0.1, inclusive

Related parameters:

Parent: ite.businessLogic.group.deduplication
Other: ite.businessLogic.group.deduplication.checkpointing, ite.ingest.loadDistribution.groupConfigFile

Details

For more details about the probability and the number of expected unique tuples, see the BloomFilter operator.
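To estimate how the probability and the expected number of unique tuples affect memory use, you can apply the textbook formulas for an optimally sized Bloom filter, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. Whether the BloomFilter operator sizes its filter exactly this way is an assumption, so treat the results as a rough estimate only. The `bloom_filter_size` function is hypothetical. It excludes p = 0 because the formula diverges at that value.

```python
import math
from typing import Tuple


def bloom_filter_size(expected_unique: int, probability: float) -> Tuple[int, int]:
    """Textbook optimal Bloom filter size (bits) and number of hash functions."""
    if not 0 < probability <= 0.1:
        raise ValueError("probability must be greater than 0 and at most 0.1")
    bits = math.ceil(-expected_unique * math.log(probability) / math.log(2) ** 2)
    hashes = max(1, round(bits / expected_unique * math.log(2)))
    return bits, hashes


bits, hashes = bloom_filter_size(1_000_000_000, 0.001)
print(f"{bits / 8 / 2**30:.2f} GiB, {hashes} hash functions")
```

With the default probability of 0.001, the formula yields about 14.4 bits per expected unique tuple. One billion unique tuples therefore need roughly 1.7 GiB and 10 hash functions.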

ite.businessLogic.group.deduplication.timeToKeep

Specifies the time after which tuples are removed from the stateful deduplication.

The parameter is active only if the parent and the ite.businessLogic.group.deduplication.checkpointing parameters are set to on.

The parameters ite.businessLogic.group.deduplication.timeToKeep and ite.businessLogic.group.deduplication.daysToKeep are mutually exclusive.

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the (\d+d)?\s*(\d+h)?\s*(\d+m)? regular expression

Related parameters:

Parent: ite.businessLogic.group.deduplication.checkpointing
Other: ite.businessLogic.group.deduplication.daysToKeep

ite.businessLogic.group.startupControlFile

Specifies the name of the text file that delays the initialization of the ITE application. As soon as the file exists and contains the done value in the first row, the initialization begins.

You use this file to indicate completed external activities that are required before the ITE application starts its initialization, for example, creating files that are needed for the custom or deduplication initialization from a database.

The specified file is expected in the control directory that is identified by the global.applicationControlDirectory parameter.
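The following sketch shows the handshake from the perspective of a polling process. Both helper functions and the file name startup.ctl are hypothetical. The sketch assumes that whitespace around done in the first line is ignored, which might differ from the toolkit's behavior. The plain-file-name check mirrors the [^\/]+ pattern in the Valid values entry.

```python
import re
import tempfile
import time
from pathlib import Path


def startup_released(control_dir: str, file_name: str) -> bool:
    """True once the control file exists and its first line is 'done' (sketch)."""
    if re.fullmatch(r"[^/]+", file_name) is None:
        raise ValueError("expected a plain file name without '/'")
    path = Path(control_dir) / file_name
    if not path.is_file():
        return False
    with path.open(encoding="utf-8") as handle:
        first_line = handle.readline()
    # Assumption: surrounding whitespace and the newline are ignored.
    return first_line.strip() == "done"


def wait_for_startup(control_dir: str, file_name: str, poll_seconds: float = 5.0) -> None:
    """Block until the external activities signal completion."""
    while not startup_released(control_dir, file_name):
        time.sleep(poll_seconds)


# Demo: an external job releases the initialization by writing "done".
with tempfile.TemporaryDirectory() as demo_dir:
    print(startup_released(demo_dir, "startup.ctl"))  # False
    (Path(demo_dir) / "startup.ctl").write_text("done\n", encoding="utf-8")
    print(startup_released(demo_dir, "startup.ctl"))  # True
```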

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the [^\/]+ regular expression

Related parameters:

Parent: ite.businessLogic.group
Other: global.applicationControlDirectory, ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.tap, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

ite.businessLogic.group.tap

Turns the post-group data processor tap on or off.

If this tap is turned on, another stream that contains the tuples that passed the business logic, including the group logic (for example, deduplication), is activated. You may use these tuples to implement features that do not alter the data stored in the files by the main business logic. For example, the tap logic filters for tuples and sends an event to another application or another system if the filter condition is met. The spl.adapter::Export operator or any sink operator like the spl.adapter::TCPSink operator may be used with the tap data tuples.

Implement your tap logic in the <namespace>.tap.custom::PostContextDataProcessorTap composite operator. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.tap::SamplePostContextDataProcessorTap composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Other: ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.transformation.tap, ite.embeddedSampleCode, ite.fuse.group.operators, ite.fuse.groupWithChain.operators

Details

The ITE application supports two taps.

The first tap is turned on with the ite.businessLogic.transformation.tap parameter and is normally used only if the ite.businessLogic.group parameter is turned off.
The second tap is turned on with the ite.businessLogic.group.tap parameter and is normally used only if the ite.businessLogic.group parameter is turned on.

ite.businessLogic.sink.debug

Specifies whether to enable additional file outputs that are used to troubleshoot your ITE application. The files are located in the debug directory, which is a subdirectory of the configured data directory.

When this parameter is set to on, you receive information about the storage stage.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Details

The following files are created:

CHAIN_TRANSFORMER_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for data tuples that are received by the <namespace>.chainprocessor.transformer::ChainprocessorTransformerCore composite operator.
CHAIN_TRANSFORMER_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for data tuples that are sent by the <namespace>.chainprocessor.transformer::ChainprocessorTransformerCore composite operator.
CHAIN_TRANSFORMER_STAT_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for statistics tuples that are received by the <namespace>.chainprocessor.transformer::ChainprocessorTransformerCore composite operator at the end of each file.
CHAIN_TRANSFORMER_STAT_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for enriched statistics tuples that are sent by the <namespace>.chainprocessor.transformer::ChainprocessorTransformerCore composite operator at the end of each file.

ite.businessLogic.transformation.debug

When this parameter is set to on, you receive information about the transformation stage.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Details

The following files are created:

SINK_FILE_WRITER_STAT_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for statistics tuples that are sent by the FileReader at the end of each file.
SINK_FILE_WRITER_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for data tuples that are written to file by the RecordFileWriter or TableFileWriter.
CHAIN_POSTCONTEXT_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for data tuples that are received from the context logic.
CHAIN_POSTCONTEXT_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for data tuples that are sent to the FileWriter sink.
CHAIN_POSTCONTEXT_STAT_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for statistics tuples that are sent by the FileReader at the end of each file.
CHAIN_POSTCONTEXT_STAT_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for statistics tuples that are sent by the FileReader at the end of each file.

ite.businessLogic.transformation.lookup

Specifies whether the ITE application performs data enrichment using the lookup functionality.

If you want to use the lookup functionality, set the parameter to on. If you set the parameter to off, the ITE application runs independently of the Lookup Manager application.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: off, on

Related parameters:

Other: ite.ingest.reader.schemaExtensionForLookup

Details

The ite.ingest.reader.schemaExtensionForLookup parameter is related because, when it is switched on, all attributes that the lookup introduces are already added to the stream definition of the parser output. This setting creates a single stream schema that is used throughout the application, from beginning to end.

The Lookup Manager application controls the initialization and updates of the enrichment data. During the initialization and the updates, the ITE application is paused.

ite.businessLogic.transformation.outputType

Specifies the output schema of the <namespace>.chainprocessor.transformer::ChainprocessorTransformerCore composite operator. This schema is handled by the <namespace>.streams::TypesCommon.TransformerOutType type, which takes the value of the ite.storage.type parameter into account. The stream types are defined in TypesCommon and TypesCustom and are used in the DataProcessor composites.

If tuple deduplication is enabled, the hash code must be part of the defined tuple.

Valid values of this parameter are:

tableStream: This output stream becomes the input of the TableRowGenerator. One tuple contains a single table row and one hash code for deduplication. If an input record results in multiple table rows or in rows for different tables, the Transformer must send several tuples.
extendedTableStream: Extends the table schema, for example, if lookup data is evaluated in a custom PostDedupProcessor or in CustomContext. This option is the same as tableStream, except that the stream is extended with the <namespace>.streams::TypesCustom.ExtendedTableStream or <namespace>.streams.custom::TypesCustom.ExtendedTableStream type.
recordStream: Enables the RecordStreamType that contains the TransformedRecord tuple. It is used when ite.storage.type is set to 'recordFile' or 'custom'. The PostContextDataProcessor composite creates the row tuples.

Properties

Type: enum

Default: recordStream

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: extendedTableStream, recordStream, tableStream

Related parameters:

Other: ite.storage.type

ite.businessLogic.transformation.postprocessing.custom

Enables the custom logic that runs after the group processing but before the storage stage.

If you want to implement this custom logic, set this parameter to on and adapt the <namespace>.chainsink.custom::PostContextDataProcessor composite. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.chainsink::SamplePostContextDataProcessor composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.embeddedSampleCode

ite.businessLogic.transformation.tap

Turns the post-transformation data processor tap on or off.

If this tap is turned on, another stream that contains the tuples that passed the business logic, excluding the group logic (for example, deduplication), is activated. You may use these tuples to implement features that do not alter the data stored in the files by the main business logic. For example, the tap logic filters for tuples and sends an event to another application or another system if the filter condition is met. The spl.adapter::Export operator or any sink operator like the spl.adapter::TCPSink operator may be used with the tap data tuples.

Implement your tap logic in the <namespace>.tap.custom::TransformerTap composite operator. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.tap::SampleTransformerTap composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.businessLogic.group.tap, ite.embeddedSampleCode

Details

The ITE application supports two taps.

The first tap is turned on with the ite.businessLogic.transformation.tap parameter and is normally used only if the ite.businessLogic.group parameter is turned off.
The second tap is turned on with the ite.businessLogic.group.tap parameter and is normally used only if the ite.businessLogic.group parameter is turned on.

ite.businessLogic.transformation.tupleGroupSplit

Enables tuple grouping based on tuple attributes to increase parallelization, improve throughput, or overcome memory limitations.

For example, suppose you want to run deduplication on several billion unique records. Even with memory-efficient deduplication, you exceed the available memory. Tuple grouping allows you to build smaller record subsets that are distributed to different instances of the deduplication logic on different hosts. The tuple grouping also ensures that tuples with the same identification, also called group ID, are routed to the same instance. Deduplication that runs on a subset of the records requires less memory than deduplication that runs on the complete record set.

If this parameter is set to on, tuple grouping based on tuple attributes is enabled.

As a developer, you implement your custom business logic in the <namespace>.chainprocessor.transformer.custom::DataProcessor composite. As part of this implementation, you provide the destination group ID in the groupID SPL output attribute. The groupID is a 2-digit rstring attribute that supports a range from 00 to 99. The default groupID value is 00. Tuples that have the same identification must result in the same groupID value. For example, suppose that a key attribute of the tuple ranges from 0 to 255 and you want to divide this range into two subranges, 0 to 127 and 128 to 255. If the key attribute is in the first range, you provide the 00 groupID. If it is in the second range, you provide the 01 groupID.
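The key-to-groupID mapping from the example above can be expressed as a small function. In the ITE application, this logic belongs in your SPL DataProcessor composite; the following Python function, `group_id_for_key`, is a hypothetical illustration of the mapping rule only. It splits the key range into equally sized contiguous subranges, which is one possible choice among many.

```python
def group_id_for_key(key: int, num_groups: int = 2, key_max: int = 255) -> str:
    """Map a key in 0..key_max to a 2-digit groupID, '00' to '99' (sketch)."""
    if not 1 <= num_groups <= 100:
        raise ValueError("groupID supports only the values 00 to 99")
    if not 0 <= key <= key_max:
        raise ValueError(f"key {key} is outside 0..{key_max}")
    # Integer arithmetic: equal keys always map to the same group.
    group = key * num_groups // (key_max + 1)
    return f"{group:02d}"


print([group_id_for_key(k) for k in (0, 127, 128, 255)])  # ['00', '00', '01', '01']
```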

If this parameter is set to on, the ite.businessLogic.group parameter must be set to on, and the ite.ingest.fileGroupSplit parameter must be set to off. In other words, this parameter can only be set to on for an ITE application that uses variant B. For ITE applications that use variant A or C, this parameter must be set to off.

Properties

Type: enum

Cardinality: 1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.businessLogic.group, ite.ingest.fileGroupSplit

Details

When you created an SPL project by using the IBM® InfoSphere® Streams Studio wizard or the teda-create-project command-line tool, you selected a variant for your ITE application. The wizard or command-line tool set this parameter to the value that is appropriate for the selected variant. Typically, you do not change this value.

ite.checkpointing.directory

Specifies the directory that receives checkpoint files.

A relative path is relative to the data directory.

For more information about the checkpoint files, see the related parameters.

Properties

Type: string

Default: "./checkpoint"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Other: ite.businessLogic.group.custom.checkpointing, ite.businessLogic.group.deduplication.checkpointing

ite.cleanup.schedule.dayOfMonth

Specifies the day or days of the month on which automated cleanup operations run. To enable automated cleanup operations, the other schedule parameters must also be specified.

Automated cleanup operations are required, for example, to remove old information from the file or tuple deduplication.

See the ScheduledBeacon operator for more information about the schedule.

Properties

Type: string

Default: empty list

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: a comma-separated list of values that match the (([1-2]?[0-9]|3[01])-)?([1-2]?[0-9]|3[01]) regular expression

Details

Before the automated cleanup runs, the ITE application suspends file processing. The cleanup operations can take from a few seconds to several hours, depending on your configuration and, for example, the number of active records in your deduplication logic.
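
To catch a malformed schedule before submission, you might check each list element against the documented regular expression. The following Python sketch assumes that list elements are separated by commas without surrounding whitespace; the function name is hypothetical. Note that the regular expression alone also accepts 0, so this check covers syntax rather than calendar validity.

```python
import re

# Regular expression from the "Valid values" entry of
# ite.cleanup.schedule.dayOfMonth. Each list element must match it completely.
DAY_OF_MONTH_RE = re.compile(r"(([1-2]?[0-9]|3[01])-)?([1-2]?[0-9]|3[01])")


def is_valid_day_of_month(value: str) -> bool:
    """Return True if every comma-separated element, such as "1" or "1-7", matches."""
    # Assumption: no whitespace around the commas.
    return all(DAY_OF_MONTH_RE.fullmatch(item) for item in value.split(","))


print(is_valid_day_of_month("1,15"))    # True
print(is_valid_day_of_month("1-7,28"))  # True
print(is_valid_day_of_month("32"))      # False
```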

ite.cleanup.schedule.dayOfWeek

Specifies the day or days of the week on which automated cleanup operations run. To enable automated cleanup operations, the other schedule parameters must also be specified.

Automated cleanup operations are required, for example, to remove old information from the file or tuple deduplication.

See the ScheduledBeacon operator for more information about the schedule.

Properties

Type: enum

Default: *

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: a comma-separated list of the following values: *, 0, 1, 2, 3, 4, 5, 6, Fri, Friday, Mon, Monday, Sat, Saturday, Sun, Sunday, Thu, Thursday, Tue, Tuesday, Wed, Wednesday

ite.cleanup.schedule.hour

Specifies the hour or hours of the day during which automated cleanup operations run. To enable automated cleanup operations, the other schedule parameters must also be specified.

Automated cleanup operations are required, for example, to remove old information from the file or tuple deduplication.

See the ScheduledBeacon operator for more information about the schedule.

Properties

Type: string

Default: "0"

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: a comma-separated list of values that match the (([0-9]|1[0-9]|2[0-3])-)?([0-9]|1[0-9]|2[0-3]) regular expression

ite.cleanup.schedule.minute

Specifies the minute or minutes of the hour at which automated cleanup operations run. To enable automated cleanup operations, the other schedule parameters must also be specified.

Automated cleanup operations are required, for example, to remove old information from the file or tuple deduplication.

See the ScheduledBeacon operator for more information about the schedule.

Properties

Type: string

Default: "0"

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: a comma-separated list of values that match the ([1-5]?[0-9]-)?[1-5]?[0-9] regular expression
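
The following Python sketch expands a minute list into concrete minutes. It assumes that an element such as 9-59 denotes an inclusive range, as in crontab syntax; check the ScheduledBeacon operator documentation for the exact semantics. The function name is hypothetical.

```python
import re

# Regular expression from the "Valid values" entry of ite.cleanup.schedule.minute.
MINUTE_RE = re.compile(r"([1-5]?[0-9]-)?[1-5]?[0-9]")


def expand_minutes(value: str) -> list[int]:
    """Expand a minute list such as "0,30" or "9-59" into sorted minute numbers.

    Assumption: "a-b" is an inclusive range, as in crontab syntax.
    """
    minutes: set[int] = set()
    for item in value.split(","):
        if not MINUTE_RE.fullmatch(item):
            raise ValueError(f"invalid minute entry: {item!r}")
        start, _, end = item.partition("-")
        # A single value such as "10" has no end, so the range is just [10].
        minutes.update(range(int(start), int(end or start) + 1))
    return sorted(minutes)


print(expand_minutes("0,30"))       # [0, 30]
print(len(expand_minutes("9-59")))  # 51
```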

ite.control.debug

Enables additional file outputs that are used to troubleshoot your ITE application. The files are located in the debug directory, which is a subdirectory of the configured data directory.

If this parameter is set to on, you get information about the status and status changes of the ITE application.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.businessLogic.group, ite.cleanup.schedule.dayOfMonth, ite.cleanup.schedule.dayOfWeek, ite.cleanup.schedule.hour, ite.cleanup.schedule.minute

Details

If this parameter is enabled, the following files are created:

CONTROLLER_APPL_CTRL_OUT.txt: Receives log entries for each start or stop command that is sent to the chains.
CONTROLLER_APPL_CTRL_RESP_IN.txt: Receives log entries for each start or stop response.
CONTROLLER_CONTEXT_CTRL_OUT.txt: Receives log entries for each shutdown or refresh signal that is sent to the group logic. This file is created only if the ite.businessLogic.group parameter is enabled.
CONTROLLER_CONTEXT_READY_IN.txt: Receives log entries for each shutdown or refresh response. This file is created only if the ite.businessLogic.group parameter is enabled.
CONTROLLER_FILE_INGEST_CLEANUP_OUT.txt: Receives log entries for the initialization phase and for the automated cleanup operations that are scheduled with, for example, the ite.cleanup.schedule.dayOfMonth parameter.
CONTROLLER_FILE_INGEST_CTRL_OUT.txt: Receives log entries for the start of the file ingestion.

ite.embeddedSampleCode

Activates sample code in created ITE projects. By default, this parameter is enabled (on), so that new projects contain a ready-to-run implementation. When you start writing custom code for the composites in the custom namespaces, you must disable this parameter. If you disable the parameter, you must also assign your parsers to the ite.ingest.reader.parserList parameter.

If this parameter is set to on, all customized code is disabled.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.businessLogic.group.custom, ite.businessLogic.group.tap, ite.businessLogic.transformation.postprocessing.custom, ite.ingest.customFileTypeValidator, ite.ingest.reader.preprocessing, ite.ingest.reader.schemaExtensionForLookup, ite.storage.auditOutputs, ite.storage.rejectWriter.custom

ite.fuse.chain.operators

This parameter controls the fusing of all operators from the following namespaces:

<namespace>.chainprocessor.reader
<namespace>.chainprocessor.reader.custom
<namespace>.chainprocessor.transformer
<namespace>.chainprocessor.transformer.custom
<namespace>.chainsink
<namespace>.chainsink.custom

Set this parameter to on to fuse all operators into a single Processing Element for better performance. Set it to off if you want to analyze the congestion factor or problems in individual operators more easily.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.fuse.group.operators, ite.fuse.groupWithChain.operators

ite.fuse.group.operators

This parameter controls the fusing of all operators from the following namespaces:

<namespace>.context
<namespace>.context.custom
<namespace>.housekeeping.context.custom

Set this parameter to on to fuse all operators of one group into a single Processing Element for better performance. In this case, each group runs in its own Processing Element. Set it to off if you want to analyze the congestion factor or problems in individual operators more easily.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Other: ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.fuse.chain.operators, ite.fuse.groupWithChain.operators

ite.fuse.groupWithChain.operators

This parameter controls the fusing of all operators from the following namespaces:

<namespace>.chainprocessor.reader
<namespace>.chainprocessor.reader.custom
<namespace>.chainprocessor.transformer
<namespace>.chainprocessor.transformer.custom
<namespace>.chainsink
<namespace>.chainsink.custom
<namespace>.context
<namespace>.context.custom
<namespace>.housekeeping.context.custom

If this parameter is turned on, the operators are fused and tuples are not sent across Processing Elements. In variant B, all chains and all group operators are in a single Processing Element. As a consequence, if this parameter is turned on, you cannot scale a variant B application across hosts. In variant C, all chains of one group are fused into the same Processing Element.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.businessLogic.group
Other: ite.businessLogic.group.custom, ite.businessLogic.group.debug, ite.businessLogic.group.deduplication, ite.businessLogic.group.startupControlFile, ite.businessLogic.group.tap, ite.fuse.chain.operators, ite.fuse.group.operators

ite.ingest.archiveMode

Specifies the base directory that is used for the following subdirectories:

archive: Receives successfully processed input files.
duplicate: Receives duplicate input files (files that are already processed).
invalid: Receives files that do not match the allowed file types and formats.
failed: Receives files with which unexpected problems occurred and that are not automatically resolved.

If you set this parameter to single, the ite.ingest.directory.input parameter is used as the base directory.

If the file that is specified with the ite.ingest.directory.inputListFile parameter contains multiple directories and this parameter is set to multiple, the subdirectories are created in each corresponding input directory.
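
The resulting directory layout can be sketched in Python as follows. The function name and example paths are hypothetical, and the sketch covers only the two documented modes.

```python
from pathlib import Path

# Subdirectories that the ITE application maintains under each base directory.
SUBDIRECTORIES = ("archive", "duplicate", "invalid", "failed")


def archive_directories(mode: str, input_dir: str,
                        listed_dirs: list[str]) -> list[Path]:
    """Return the archive, duplicate, invalid, and failed paths for a mode."""
    if mode == "single":
        # Only ite.ingest.directory.input is used as the base directory.
        bases = [Path(input_dir)]
    elif mode == "multiple":
        # One set of subdirectories per directory in the input list file.
        bases = [Path(d) for d in listed_dirs]
    else:
        raise ValueError(f"unknown archive mode: {mode}")
    return [base / sub for base in bases for sub in SUBDIRECTORIES]
```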

Properties

Type: enum

Default: single

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: multiple, single

Related parameters:

Other: ite.ingest.directory.inputListFile

ite.ingest.customFileTypeValidator

Enables file-type validation. File-type validation distinguishes between different file types and data formats, for example CSV or ASN.1. Depending on the determined file type, the ITE application sends the file name to the appropriate parse logic.

If file-type validation is turned off, every file is processed. Only one parse logic exists that processes all files.

If the file-type validation is turned on, file names are determined to be valid or invalid. If a file is invalid, it is not processed but logged as invalid and moved to the invalid directory, which is a subdirectory of the input directory that is specified with the ite.ingest.directory.input parameter.

If the file name is valid, a unique file type ID is stored in the fileType SPL output attribute of the <namespace>.fileingestion.custom::FileTypeValidator composite operator. As a developer, you implement the algorithm that validates the file name and determines the file type in this composite operator. To activate your algorithm, set this parameter to on. You must also set the ite.embeddedSampleCode parameter to off, so that the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.fileingestion::SampleFileTypeValidator composite operator.

The unique file type IDs that can occur as a result of your algorithm must be consistent with the types that are specified with the ite.ingest.reader.parserList parameter. Any inconsistency is reported as soon as it occurs, either leading to an unhealthy processing element or a log message for this file, depending on the ite.resilienceOptimization parameter.

The simplest algorithm checks the file name against a pattern. A more complex algorithm might read and analyze the file contents.
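
The pattern-based approach can be sketched in Python as follows. In a real project, this logic is written in SPL inside the FileTypeValidator composite. The patterns, the file type IDs "CSV" and "ASN1", and the function name are hypothetical; your IDs must match the types that you configure in ite.ingest.reader.parserList.

```python
import re
from typing import Optional

# Hypothetical mapping from file name patterns to file type IDs.
FILE_TYPE_PATTERNS = [
    (re.compile(r".*\.csv$", re.IGNORECASE), "CSV"),
    (re.compile(r".*\.asn1$", re.IGNORECASE), "ASN1"),
]


def determine_file_type(file_name: str) -> Optional[str]:
    """Return the file type ID for a valid file name, or None for an invalid one."""
    for pattern, file_type in FILE_TYPE_PATTERNS:
        if pattern.match(file_name):
            return file_type
    # No pattern matched: the ITE application treats such a file as invalid
    # and moves it to the invalid directory.
    return None
```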

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.embeddedSampleCode, ite.ingest.reader.parserList, ite.resilienceOptimization

ite.ingest.debug

Enables additional file outputs that are used to troubleshoot your ITE application. The files are located in the debug directory, which is a subdirectory of the configured data directory.

When this parameter is on, you get information about file detection.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Details

If this parameter is enabled, the following files are created:

FILEINGESTION_DROPPED_FILES.txt: Receives log entries for files that are dropped because their names are either invalid or duplicates.
FILEINGESTION_FILES.txt: Receives log entries for files that have valid and unique file names.
FILEINGESTION_IN_ACK_FILES.txt: Receives log entries for files that are processed and committed to the file name deduplication logic.
FILEINGESTION_IN_CTRL.txt: Receives log entries for start and stop commands that enable or disable the directory scan.
FILEINGESTION_OUT_FILES.txt: Receives log entries for files that must be processed. For an ITE application in variant C, the groupID SPL attribute is set. This attribute distributes this tuple to the correct group logic instance.
RawFiles_<sequence>.txt: Receives log entries for each detected input file. After 100,000 entries, a new log file is created with an incremented sequence number.

ite.ingest.deduplication

Enables file name deduplication.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: ite.ingest.deduplication.daysToKeep, ite.ingest.deduplication.reprocessFilePattern, ite.ingest.deduplication.timeToKeep
Other: ite.ingest.directoryScan.processFilePattern

ite.ingest.deduplication.daysToKeep

Deprecated. Specifies the number of days after which a file name is removed from the set of unique file names in the file name deduplication logic. This parameter can be applied at submission time only if the ite.ingest.deduplication.timeToKeep parameter is not set. The ite.ingest.deduplication.daysToKeep and ite.ingest.deduplication.timeToKeep parameters are mutually exclusive.

Properties

Type: integer

Default: 1

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any integer value from 1 to 365, inclusive

Related parameters:

Parent: ite.ingest.deduplication
Other: ite.ingest.deduplication.reprocessFilePattern, ite.ingest.deduplication.timeToKeep

Details

This parameter is active only if the parent parameter, ite.ingest.deduplication, is set to on.

ite.ingest.deduplication.reprocessFilePattern

Defines the file name pattern for files to reprocess. Matching file names bypass the duplicate check of the file ingestion logic, and the files are processed again. This pattern should not match the same set of files as the ite.ingest.directoryScan.processFilePattern parameter, because that would let all processed files bypass the duplicate check.

Properties

Type: string

Default: ""

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any string

Related parameters:

Parent: ite.ingest.deduplication
Other: ite.ingest.deduplication.daysToKeep, ite.ingest.deduplication.timeToKeep

Details

This parameter is active only if the parent parameter, ite.ingest.deduplication, is set to on.

ite.ingest.deduplication.timeToKeep

Specifies the time after which a file name is removed from the set of unique file names in the file name deduplication logic. The parameters ite.ingest.deduplication.timeToKeep and ite.ingest.deduplication.daysToKeep are mutually exclusive.

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the (\d+d)?\s*(\d+h)?\s*(\d+m)? regular expression

Related parameters:

Parent: ite.ingest.deduplication
Other: ite.ingest.deduplication.daysToKeep, ite.ingest.deduplication.reprocessFilePattern

Details

This parameter is active only if the parent parameter, ite.ingest.deduplication, is set to on.
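
The following Python sketch converts a timeToKeep value into a duration. It assumes that the d, h, and m suffixes stand for days, hours, and minutes, and it treats an empty value (which the documented regular expression also matches) as an error. The function name is hypothetical.

```python
import re
from datetime import timedelta

# Regular expression from the "Valid values" entry of
# ite.ingest.deduplication.timeToKeep.
TIME_TO_KEEP_RE = re.compile(r"(\d+d)?\s*(\d+h)?\s*(\d+m)?")


def parse_time_to_keep(value: str) -> timedelta:
    """Convert a value such as "2d 12h" or "90m" into a timedelta.

    Assumption: d, h, and m stand for days, hours, and minutes.
    """
    match = TIME_TO_KEEP_RE.fullmatch(value)
    if match is None or not any(match.groups()):
        # The expression also matches an empty string; reject it here.
        raise ValueError(f"invalid timeToKeep value: {value!r}")
    # Strip the unit suffix from each present group; absent groups count as 0.
    days, hours, minutes = (int(g[:-1]) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


print(parse_time_to_keep("2d 12h"))  # 2 days, 12:00:00
```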

ite.ingest.directory.input

Specifies the path of the directory that receives the input files. A relative path is relative to the data directory.

The input files must appear in this directory as the result of an atomic action. It is recommended that you move input files into this directory instead of copying or creating them there, because copying or creating input files directly in this directory might result in incompletely processed or failed files.
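
A producer can meet the atomicity requirement by writing each file completely in a staging directory and then renaming it into the input directory. The following Python sketch assumes that the staging directory is on the same file system as the input directory but is not itself scanned by the ITE application; os.replace performs an atomic rename only under that condition. The function and directory names are hypothetical.

```python
import os


def deliver_input_file(data: bytes, file_name: str,
                       staging_dir: str, input_dir: str) -> str:
    """Write data to staging_dir, then atomically move it into input_dir."""
    staging_path = os.path.join(staging_dir, file_name)
    with open(staging_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the content is on disk before the move
    target_path = os.path.join(input_dir, file_name)
    # A rename within one file system is atomic, so the directory scanner sees
    # either no file or the complete file, never a partially written one.
    os.replace(staging_path, target_path)
    return target_path
```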

Properties

Type: string

Default: "./in"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Details

The ITE application creates the following subdirectories during the startup phase:

archive: Receives successfully processed input files.
duplicate: Receives duplicate input files (files that are already processed).
invalid: Receives files that do not match the allowed file types and formats.
failed: Receives files with which unexpected problems occurred and that are not automatically resolved.
reprocess: Contains files that will be reprocessed, for example after a correction. Move the necessary files into this directory.

ite.ingest.directory.inputListFile

Configures the path to the file that contains a list of several input directories. This file is a text file that contains one absolute or relative directory path per line. Comment lines start with a pound symbol ('#') in column 1. The list must not contain duplicates. This parameter is optional.

If this parameter is used, all files from the first directory in the list are considered urgent files. Urgent files are queued in a separate file queue, which has precedence over the normal file queue.
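
The following Python sketch reads such a list file. Resolving relative paths against the data directory and skipping blank lines are assumptions, because this page does not specify them; the function name is hypothetical.

```python
from pathlib import Path


def read_input_list(list_file: str, data_dir: str) -> list[Path]:
    """Return the input directories listed in list_file, in file order.

    The first returned directory is the one whose files are treated as urgent.
    """
    directories: list[Path] = []
    for line in Path(list_file).read_text().splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comment line ('#' in column 1) or blank line (assumption)
        path = Path(line.strip())
        if not path.is_absolute():
            path = Path(data_dir) / path  # assumption: relative to the data directory
        if path in directories:
            raise ValueError(f"duplicate directory in list: {line!r}")
        directories.append(path)
    return directories
```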

Properties

Type: string

Default: ""

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any string

ite.ingest.directoryScan.nanoSecondsPrecision

Enables scanning of files with nanosecond precision. When this parameter is turned off, the directory scanner sets all nanosecond fields to zero. If your file system does not support nanosecond precision, you can turn off this parameter.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.ingest.directoryScan.processFilePattern

Defines a file name pattern. The directory scanner reports matching file names to the downstream ingestion logic. If file name deduplication is turned on, these files are checked to determine whether they were already processed. If so, the files are moved to the duplicate directory.
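
The following Python sketch shows the order of the checks for one detected file name, using the documented default pattern. The set of processed names stands in for the file name deduplication logic, and the function name is hypothetical. Whether the scanner anchors the pattern at the start of the name is not stated here; with the default pattern it makes no difference.

```python
import re

# Documented default of ite.ingest.directoryScan.processFilePattern.
PROCESS_FILE_PATTERN = re.compile(r".*\.DAT$")


def classify(file_name: str, processed: set[str], deduplication: bool = True) -> str:
    """Sketch of the scan-then-deduplicate flow for one detected file name."""
    if not PROCESS_FILE_PATTERN.match(file_name):
        return "ignored"      # not reported by the directory scanner
    if deduplication and file_name in processed:
        return "duplicate"    # moved to the duplicate directory
    return "process"          # handed to the ingestion logic
```

Note that the default pattern is case-sensitive, so a file named cdr_1.dat is not picked up.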

Properties

Type: string

Default: ".*\.DAT$"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the .+ regular expression

Related parameters:

Other: ite.ingest.deduplication

ite.ingest.directoryScan.sleepTime

Specifies the time (in seconds) that the directory scanner waits after each scan. Use this parameter to reduce the scan load. For example, there is no need to scan the input directories every second if new files arrive only once per hour.

Properties

Type: float

Default: 5.0

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any float value from 1 to 3600, inclusive

ite.ingest.directoryScan.sort

Specifies the sort mode for file name tuples.

If this parameter is set to off, sorting is disabled, in contrast to the spl.adapter::DirectoryScan operator, which always sorts by file time.

If this parameter is set to ascending, file name tuples are sorted in ascending order. The sort attribute must be provided in the ite.ingest.directoryScan.sort.attribute parameter. The sort window is one scan cycle of the directory scanner.

If the parameter is set to descending, file name tuples are sorted in descending order. The sort attribute must be provided in the ite.ingest.directoryScan.sort.attribute parameter. The sort window is one scan cycle of the directory scanner.

If this parameter is set to custom, you must provide the sort logic in the custom <namespace>.fileIngestion.custom::FileSort composite operator. You can provide the sort attribute in the ite.ingest.directoryScan.sort.attribute parameter or in the <namespace>.fileIngestion.custom::FileSort composite operator itself.

The input schema of the <namespace>.fileIngestion.custom::FileSort composite operator depends on the setting of the related ite.ingest.directoryScan.specialFileTime parameter.
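
For the ascending and descending modes, the behavior within one scan cycle can be sketched in Python as follows. The FileInfo record and the function name are hypothetical, and the custom mode is not covered because its logic is whatever you implement in the FileSort composite.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical file name tuple with the three sortable attributes."""
    name: str
    time: float  # file time; taken from the file name if specialFileTime is on
    size: int


def sort_scan_cycle(files: list[FileInfo], mode: str, attribute: str) -> list[FileInfo]:
    """Sort the file name tuples that one directory scan cycle produced."""
    if mode == "off":
        return list(files)  # no sorting: keep the detection order
    if mode not in ("asc", "ascending", "desc", "descending"):
        raise ValueError(f"mode not covered by this sketch: {mode}")
    if attribute not in ("name", "time", "size"):
        # ite.ingest.directoryScan.sort.attribute is mandatory for these modes.
        raise ValueError("a sort attribute is required for ascending or descending")
    return sorted(files, key=lambda f: getattr(f, attribute),
                  reverse=mode in ("desc", "descending"))
```

Because the sort window is a single scan cycle, files that arrive in a later cycle are never sorted ahead of files from an earlier one.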

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: asc, ascending, custom, desc, descending, off

Related parameters:

Children: ite.ingest.directoryScan.sort.attribute
Other: ite.ingest.directoryScan.specialFileTime

ite.ingest.directoryScan.sort.attribute

Specifies the file-sort attribute. The file-sort attribute is used by the downstream sort operator. This parameter is an enumeration parameter with the following values:

off: No file-sort attribute is selected.
time: The file time is used as the sort attribute and depends on the ite.ingest.directoryScan.specialFileTime parameter.
name: The file name is used as the sort attribute.
size: The file size is used as the sort attribute.

If the parent parameter is set to ascending or descending, this parameter is mandatory. If the parent parameter is set to custom, it is optional. If the parent parameter is set to off, this parameter is forbidden.

If this parameter is required for the application and the related ite.ingest.directoryScan.specialFileTime parameter is turned on, this parameter must be set to time.

Properties

Type: enum

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: name, off, size, time

Related parameters:

Parent: ite.ingest.directoryScan.sort
Other: ite.ingest.directoryScan.specialFileTime

ite.ingest.directoryScan.specialFileTime

Enables a user-selected source for file time data. File time data is used in the file name deduplication logic to implement the eviction policy and to sort file name tuples.

The parameter is closely related to the ite.ingest.directoryScan.sort.attribute parameter.

If this parameter is set to off, the file time attribute is determined from the modification time of the file. If this parameter is set to on, the file time is determined from the file name. The file time generation is controlled by the ite.ingest.directoryScan.specialFileTime.regexp and ite.ingest.directoryScan.specialFileTime.format parameters.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: ite.ingest.directoryScan.specialFileTime.format, ite.ingest.directoryScan.specialFileTime.regexp
Other: ite.ingest.directoryScan.sort.attribute

Details

If this parameter is set to on and file name sorting is used, the ite.ingest.directoryScan.sort.attribute parameter must be set to 'time'.

ite.ingest.directoryScan.specialFileTime.format

Provides a list of date and time formats for special file-time conversion.

Formats with a '_' separator accept any kind of separator.

Properties

Type: enum

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of the following values: DDMMYYYY, DDMMYYYYhhmmss, DD_MM_YYYY, DD_MM_YYYY_hh_mm_ss, DD_MM_YYYY_hh_mm_ss_mmm, MMDDYYYY, MMDDYYYYhhmmss, MM_DD_YYYY, MM_DD_YYYY_hh_mm_ss, MM_DD_YYYY_hh_mm_ss_mmm, YYYYMMDD, YYYYMMDDhhmmss, YYYY_MM_DD, YYYY_MM_DD_hh_mm_ss_mmm, YYY_MM_DD_hh_mm_ss

Related parameters:

Parent: ite.ingest.directoryScan.specialFileTime
Other: ite.ingest.directoryScan.specialFileTime.regexp

Details

If the ite.ingest.directoryScan.specialFileTime parent parameter is set to on, this parameter is mandatory. If not, this parameter is forbidden.

The cardinality of the parameter must match the cardinality of the ite.ingest.directoryScan.specialFileTime.regexp related parameter.

ite.ingest.directoryScan.specialFileTime.regexp

If the ite.ingest.directoryScan.specialFileTime parameter is set to on, this parameter is required. The values of this parameter are a list of regular expressions. The file name is tested against each regular expression in the list. The first match is converted into a time, which overrides the file time attribute. The date and time format is taken from the corresponding position in the format list that is defined in the ite.ingest.directoryScan.specialFileTime.format parameter.

Each regular expression must contain one group (pair of parentheses) that isolates the date and time from the rest of the file name. If no match is found with a particular file name, the file is considered invalid and moved to the invalid files directory.

Valid values are a comma-separated list of regular expressions that contain one pair of parentheses. A comma must not be part of a regular expression.

Example:

If a file name contains a date in the last 8 digits in front of the file name extension, for example cdr_cid1234_20120405.txt, the following regular expression can extract the date portion: .*_([0-9]{8})\.txt$

The appropriate format parameter is: YYYYMMDD
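
The following Python sketch shows how the paired regexp and format lists can be applied to a file name. Only three of the documented formats are mapped to strptime directives, and normalizing every non-digit separator to '_' is an assumption about how the '_' formats accept any separator. The names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical mapping of some documented format names to strptime directives.
STRPTIME_FORMATS = {
    "YYYYMMDD": "%Y%m%d",
    "YYYYMMDDhhmmss": "%Y%m%d%H%M%S",
    "DD_MM_YYYY": "%d_%m_%Y",
}


def special_file_time(file_name: str, regexps: list[str],
                      formats: list[str]) -> Optional[datetime]:
    """Return the time encoded in file_name, or None if no expression matches."""
    if len(regexps) != len(formats):
        raise ValueError("regexp and format lists must have the same cardinality")
    for regexp, fmt in zip(regexps, formats):
        match = re.match(regexp, file_name)
        if match:
            # The single group isolates the date and time; map any non-digit
            # separator to '_' so that the '_' formats accept it (assumption).
            text = re.sub(r"[^0-9]", "_", match.group(1))
            return datetime.strptime(text, STRPTIME_FORMATS[fmt])
    return None  # no match: the file is considered invalid
```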

Properties

Type: string

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of values that match the .+ regular expression

Related parameters:

Parent: ite.ingest.directoryScan.specialFileTime
Other: ite.ingest.directoryScan.specialFileTime.format

Details

If the ite.ingest.directoryScan.specialFileTime parent parameter is set to on, this parameter is mandatory. If not, this parameter is forbidden.

The cardinality of this parameter must match the cardinality of the ite.ingest.directoryScan.specialFileTime.format related parameter.

ite.ingest.fileGroupSplit

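Enables the file ingestion group split, which assigns each input file to a group. The group ID is extracted from the file name with the ite.ingest.fileGroupSplit.pattern parameter.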

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: ite.ingest.fileGroupSplit.pattern

ite.ingest.fileGroupSplit.pattern

Defines a regular expression that extracts the group ID from the file name. The expression must have exactly one group (a pair of parentheses), which isolates the group ID from the rest of the file name. If the file name does not match the pattern, it is assigned to the default group. The group configuration is defined in the group configuration file that is specified in the ite.ingest.loadDistribution.groupConfigFile parameter.

If the ite.ingest.fileGroupSplit parameter is set to on, this parameter is required.
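
The extraction can be sketched in Python as follows. The pattern value and the file names are hypothetical; the sketch only shows the documented rules: exactly one group, and the default group for non-matching names.

```python
import re

# Hypothetical value for ite.ingest.fileGroupSplit.pattern: the group ID is the
# number after "cdr_g" in names such as "cdr_g2_0001.DAT".
FILE_GROUP_SPLIT_PATTERN = re.compile(r"cdr_g([0-9]+)_.*")

# The documentation requires exactly one group (pair of parentheses).
if FILE_GROUP_SPLIT_PATTERN.groups != 1:
    raise ValueError("the pattern must contain exactly one group")


def group_for_file(file_name: str) -> str:
    """Return the group ID extracted from the file name, or "default"."""
    match = FILE_GROUP_SPLIT_PATTERN.match(file_name)
    return match.group(1) if match else "default"
```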

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: ite.ingest.fileGroupSplit
Other: ite.ingest.loadDistribution.groupConfigFile

ite.ingest.loadDistribution

Selects the distribution method for the input files to the parallel processing chains.

Properties

Type: enum

Default: equalLoad

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: equalLoad, roundRobin

Related parameters:

Children: ite.ingest.loadDistribution.groupConfigFile, ite.ingest.loadDistribution.numberOfParallelChains, ite.ingest.loadDistribution.udp

ite.ingest.loadDistribution.groupConfigFile

Changes the name of the group configuration file. This parameter is not used in variants that do not use file groups.

Relative paths are relative to the data directory.

Properties

Type: string

Default: "./config/groups.cfg"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: ite.ingest.loadDistribution
Other: ite.ingest.loadDistribution.numberOfParallelChains, ite.ingest.loadDistribution.udp

Details

The following example shows the expected file format (with added whitespace for readability):

#Group identifier, Chains per group, Maximum BloomFilter entries
"default"        , 2               , 10000000
"2"              , 1               , 10000000
"3"              , 1               , 10000000

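A Python sketch of a parser for this file format follows. It assumes that lines starting with '#' are comments and that group identifiers are enclosed in double quotes, as in the example; the function name is hypothetical.

```python
def parse_group_config(text: str) -> dict[str, tuple[int, int]]:
    """Map each group identifier to (chains per group, maximum BloomFilter entries)."""
    groups: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the comment header
        group, chains, max_entries = (field.strip() for field in stripped.split(","))
        groups[group.strip('"')] = (int(chains), int(max_entries))
    return groups
```

For the example file, the sketch yields three groups with a total of four chains.
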
ite.ingest.loadDistribution.numberOfParallelChains

Defines the number of parallel processing chains for application variants that do not build groups based on file names.

Properties

Type: integer

Default: 3

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: any integer value that is equal to or greater than 1

Related parameters:

Parent: ite.ingest.loadDistribution
Other: ite.ingest.loadDistribution.groupConfigFile, ite.ingest.loadDistribution.udp

ite.ingest.loadDistribution.udp

Enables the user-defined parallelism feature.

If this parameter is set to on, the number of parallel chains can be increased at job submission time with one or more submission-time parameters, depending on the application variant. Otherwise, the number of chains is set at compile time and cannot be changed at submission time.

If you are using variant A or B, use the ite.ingest.loadDistribution.groupConfigFile.chains parameter. If you are using variant C, use the ite.ingest.loadDistribution.groupConfigFile.chains.00 through ite.ingest.loadDistribution.groupConfigFile.chains.99 parameters.

If the user-defined parallelism feature is used in custom code, this parameter must be turned off because nested parallel regions are not supported.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Parent: ite.ingest.loadDistribution
Other: ite.ingest.loadDistribution.groupConfigFile, ite.ingest.loadDistribution.numberOfParallelChains

ite.ingest.reader.compression

Enables the compression parameter for the spl.adapter::FileSource operator in the specified composite operators. The default compression mode is gzip, but you can change it in the <namespace>.chainprocessor.reader.custom::FileReaderCustom composite operator by setting the compression parameter for the composite that is used.

Enable this parameter only if your input files are compressed.

Properties

Type: enum

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of the following values: FileReaderASN1, FileReaderCSV, FileReaderStructure

ite.ingest.reader.customFileStatistics

Enables custom file statistics. To add attributes to the statistics schema, use TypesCustom::CustomFileStatisticsStreamType. If the ite.storage.type parameter is not set to 'tableFile', this parameter should be used.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.ingest.reader.customParserStatistics

Enables custom parser statistics. Use TypesCustom::CustomParserStatisticsStreamType to define the output stream type for the parser statistics. Use this parameter when you integrate your own parser.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.ingest.reader.debug

Enables additional file outputs that are used to troubleshoot your ITE application. The files are located in the debug directory, which is a subdirectory of the configured data directory.

When you set this parameter to on, you receive information about the parsed files.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Details

When this parameter is enabled, the following files are created:

CHAIN_READER_FILES_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for files that must be processed.
CHAIN_READER_FILES_ACK_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for files that are processed and are committed to the file name deduplication logic. The chain can then receive and process a new file.
CHAIN_READER_FILES_APP_CTRL_IN_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each start or stop command that is received by the chain control logic.
CHAIN_READER_REC_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each valid data tuple that leaves the record validation, which is the <namespace>.chainprocessor.reader.custom::RecordValidator, or, if the ite.embeddedSampleCode parameter is turned on, the <namespace>.chainprocessor.reader::SampleRecordValidator composite operator.
CHAIN_READER_REJ_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each rejected data tuple that leaves the record validation, which is the <namespace>.chainprocessor.reader.custom::RecordValidator, or, if the ite.embeddedSampleCode parameter is turned on, the <namespace>.chainprocessor.reader::SampleRecordValidator composite operator.
CHAIN_READER_STAT_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives statistics log entries for each completed file.
CHAIN_READER_STATUS_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each status change of the chain that is initiated with a start or stop command.
CHAIN_READER_APP_CTRL_RESP_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each start or stop response that leaves the chain control logic.
FILE_READER_OUT_<GROUP_ID>_<CHAIN_ID>.txt: Receives log entries for each data tuple that is sent by the FileReader composites that are specified in the ite.ingest.reader.parserList parameter.
FILE_READER_STAT_<GROUP_ID>_<CHAIN_ID>.txt: Receives statistics log entries for each completed file. The statistics are a subset of the statistics that are stored in the CHAIN_READER_STAT_OUT_<GROUP_ID>_<CHAIN_ID>.txt file. The FileReader generates these statistics.
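When you troubleshoot a chain, it can help to know which debug files to look for. The following minimal sketch builds the expected file paths from the name patterns listed above. It assumes that the debug directory is named debug and sits directly below the data directory, and that group and chain IDs are inserted into the names unchanged. The helper names are hypothetical and not part of the toolkit. A file that is reported as missing might simply not have received any entries yet.

```python
from pathlib import Path

# Name prefixes of the debug files; each file is <PREFIX>_<GROUP_ID>_<CHAIN_ID>.txt
DEBUG_FILE_PREFIXES = [
    "CHAIN_READER_FILES_IN",
    "CHAIN_READER_FILES_ACK_IN",
    "CHAIN_READER_FILES_APP_CTRL_IN",
    "CHAIN_READER_REC_OUT",
    "CHAIN_READER_REJ_OUT",
    "CHAIN_READER_STAT_OUT",
    "CHAIN_READER_STATUS_OUT",
    "CHAIN_READER_APP_CTRL_RESP_OUT",
    "FILE_READER_OUT",
    "FILE_READER_STAT",
]


def expected_debug_files(data_dir, group_id, chain_id):
    """Return the debug file paths for one chain (hypothetical helper).

    Assumes the debug directory is <data_dir>/debug.
    """
    debug_dir = Path(data_dir) / "debug"
    return [debug_dir / f"{prefix}_{group_id}_{chain_id}.txt" for prefix in DEBUG_FILE_PREFIXES]


def missing_debug_files(data_dir, group_id, chain_id):
    """Return the expected debug files that do not exist on disk."""
    return [p for p in expected_debug_files(data_dir, group_id, chain_id) if not p.exists()]
```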

ite.ingest.reader.encoding

Enables the encoding parameter for the spl.adapter::FileSource operator in the specified composite operators.

Properties

Type: enum

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of the following values: FileReaderCSV

ite.ingest.reader.parserList

Enables one or more parsers and specifies the file type IDs for which the parsers are responsible.

If you turn off the ite.embeddedSampleCode parameter to start your customization work, you must assign your parsers to this parameter at the same time.

Properties

Type: string

Default: "*|FileReaderCustom"

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of values that match the [^|]+\|[A-Z][\w_]* regular expression
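Each parserList value pairs a file type ID with a FileReader composite name, separated by a vertical bar. The minimal sketch below checks a value against the documented regular expression and splits it into pairs. It assumes that entries are separated by commas, that surrounding blanks can be ignored, and that each entry must match the expression as a whole. The function name is hypothetical.

```python
import re

# Documented pattern: <file type ID>|<composite name starting with an uppercase letter>
PARSER_ENTRY = re.compile(r"[^|]+\|[A-Z][\w_]*")


def parse_parser_list(value):
    """Split an ite.ingest.reader.parserList value into (file type ID, composite) pairs.

    Raises ValueError for an entry that does not fully match the pattern.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not PARSER_ENTRY.fullmatch(entry):
            raise ValueError(f"invalid parserList entry: {entry!r}")
        file_type, composite = entry.split("|", 1)
        pairs.append((file_type, composite))
    return pairs
```

For example, the default value "*|FileReaderCustom" yields a single pair with the file type ID * and the composite FileReaderCustom.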

ite.ingest.reader.preprocessing

Enables file preprocessing that is used to determine attribute values once per file or to determine the file type if the file type cannot be derived from the file name.

Implement your code in the <namespace>.chainprocessor.reader.custom::PreFileReader composite operator. To activate your code, set this parameter to on. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.chainprocessor.reader::SamplePreFileReader composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.embeddedSampleCode

ite.ingest.reader.schemaExtensionForLookup

If this parameter is set to on, the stream schema, which is the output of the parsing and the input to the data enrichment, is extended with the attributes that are specified in the <namespace>.streams.custom::TypesCustom.LookupType type.

These additional attributes are commonly used during the enrichment. In other words, the custom lookup code assigns the enrichment data to these attributes.

If you require additional attributes to assign your enrichment data, set this parameter to on and adapt the <namespace>.streams.custom::TypesCustom.LookupType type. To activate the customized type, you must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses the customized type instead of the sample <namespace>.streams::TypesCustom.SampleLookupType type.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.embeddedSampleCode

ite.jobName

Changes the job name of the ITE application at submission time. By default, the job name is the namespace that you specified when you created the ITE project. Each ITE job needs a unique job name to communicate with the Lookup Manager. Use this parameter to launch multiple ITE applications and assign a unique name to each of them at submission time. Ensure that each job uses a different set of input and output directories, and specify these directories by using the relevant submission-time parameters.

You also need to tell the Lookup Manager application which ITE jobs it controls by setting the lm.controlledApplications submission-time parameter to the desired job names.

NOTE: If you use the multihost feature, the ITE jobs still share the same host tag definitions. These definitions are created at compile time and cannot be overwritten by the ite.jobName parameter at submission time. The names of the generated host tags are still derived from the original namespace of the application that you set up during project creation.

As a consequence, all jobs run on the same set of hosts with the same host placements. If this is not the desired behavior, create multiple Streams instances with different sets of hosts and submit the jobs to different instances. Alternatively, do not use the multihost feature (do not set global.multiHost=on) and let Streams decide the host placement on its own. In that case, ensure that all shared resources, such as file systems and lookup segments, are accessible from all hosts.

Properties

Type: string

Cardinality: 0..1

Application scope: ITE

Provisioning time: submission time

Valid values: any value that matches the (?:[a-z][a-z0-9_]*)(?:\.[a-z][a-z0-9_]*)* regular expression

Related parameters:

Other: global.multiHost, ite.checkpointing.directory, ite.ingest.directory.input, ite.storage.directory.outputs, ite.storage.directory.statistics, lm.controlledApplications
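Before you submit several ITE jobs, you can check the plan offline. The following minimal sketch validates each job name against the documented regular expression, verifies that no directory is used by more than one job, and builds the comma-separated value for lm.controlledApplications. The input structure (a dictionary that maps each job name to its directory parameters) and the function name are hypothetical. The sketch compares paths as plain strings, so it does not detect two different spellings of the same directory.

```python
import re

# Documented pattern for ite.jobName
JOB_NAME = re.compile(r"(?:[a-z][a-z0-9_]*)(?:\.[a-z][a-z0-9_]*)*")


def plan_jobs(jobs):
    """Check job names and directory separation, then return the lm.controlledApplications value.

    jobs maps a job name to a dict of directory parameters, for example
    {"ite.ingest.directory.input": "in/a", "ite.storage.directory.outputs": "out/a"}.
    """
    owners = {}  # directory path -> job that uses it first
    for name, dirs in jobs.items():
        if not JOB_NAME.fullmatch(name):
            raise ValueError(f"invalid job name: {name!r}")
        for param, path in dirs.items():
            owner = owners.setdefault(path, name)
            if owner != name:
                raise ValueError(f"{path!r} ({param}) is used by both {owner} and {name}")
    # Comma-separated job names for the Lookup Manager submission
    return ",".join(sorted(jobs))
```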

ite.resilienceOptimization

Enables resilience against unexpected errors.

An unexpected error is, for example, a file that is deleted while it is being processed, or custom business logic that accesses a data array out of bounds. For such problems, most SPL operators or functions raise exceptions and abort the processing element.

If resilience is enabled, the ITE application catches these unexpected errors and reports them in the rejected/<input-filename>.rej.csv rejection file. The rejection file is located in the output directory that is specified in the ite.storage.directory.outputs parameter. If resilience is disabled, errors lead to unhealthy processing elements (PEs) that stop tuple processing.
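To find the rejection file for a given input file, you can combine the output directory with the documented rejected/<input-filename>.rej.csv pattern. This minimal sketch assumes that <input-filename> is the base name of the input file, including its extension, used unchanged. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def rejection_file(outputs_dir, input_file):
    """Return <outputs_dir>/rejected/<input-filename>.rej.csv.

    Assumes <input-filename> is the unchanged base name of the input file.
    """
    name = PurePosixPath(input_file).name
    return PurePosixPath(outputs_dir) / "rejected" / f"{name}.rej.csv"
```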

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.storage.auditOutputs

Enables an additional processing step for file statistics that you can use to, for example, write the statistics to a database or export the statistics to another application.

Implement your code in the <namespace>.chainsink.custom::AuditTableWriter composite operator. To activate your code, set this parameter to on. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.chainsink::SampleAuditTableWriter composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

ite.storage.directory.outputs

Specifies the base directory for output files. This base directory may contain load, rejected, and statistics subdirectories.

A relative path is relative to the data directory.
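Several directory parameters in this reference share the same rule: an absolute path is used as it is, and a relative path is resolved against the data directory. The following minimal sketch shows that rule and derives the load, rejected, and statistics subdirectories of the output base directory. The function names are hypothetical, and the subdirectories are only candidates, because the base directory may or may not contain them.

```python
from pathlib import PurePosixPath


def resolve_against_data_dir(data_dir, configured):
    """Keep absolute paths; resolve relative paths against the data directory."""
    path = PurePosixPath(configured)
    if path.is_absolute():
        return path
    return PurePosixPath(data_dir) / path


def output_subdirectories(data_dir, outputs="./out"):
    """Return the possible load, rejected, and statistics subdirectories of the output base directory."""
    base = resolve_against_data_dir(data_dir, outputs)
    return {name: base / name for name in ("load", "rejected", "statistics")}
```

With a data directory of /opt/data and the default value ./out, the rejected subdirectory resolves to /opt/data/out/rejected.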

Properties

Type: string

Default: "./out"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Other: ite.storage.directory.statistics

ite.storage.directory.statistics

Specifies the base directory for the statistics log files. For each file that is processed by an ITE application, an entry is written to the statistics log file. Job statistics logs are written with the date as the first part of the file name.

The application creates an archive subdirectory. When the date changes, the log files are moved to this archive directory.

A relative path is relative to the data directory.

Properties

Type: string

Default: "./out/statistics"

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

ite.storage.outputDirectoryStructure

Specifies the structure of the output directories. Output files can reside in one directory (allInOne), in different subdirectories according to the input file that created them (perFile), or in subdirectories that each contain all the files of one day (perDay).

Properties

Type: enum

Default: allInOne

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: allInOne, perDay, perFile

ite.storage.rejectWriter.custom

If set to on, you can implement your own handling for rejected records, for example, to create alarms or to write different files.

Implement your code in the <namespace>.chainsink.custom::RejectWriterCustom composite operator. To activate your code, set this parameter to on. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.chainsink::SampleRejectWriterCustom composite operator.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: off, on

Related parameters:

Other: ite.embeddedSampleCode

ite.storage.tableNames

Configures the table names that are used in the TableFileWriter. For each table name, a dedicated spl.adapter::FileSink operator is used. If the ite.storage.type parameter is set to 'tableFile', this parameter is mandatory.

Properties

Type: string

Cardinality: 0..n

Application scope: ITE

Provisioning time: compile time

Valid values: a comma-separated list of values that match the (?:[\w$]+\.)?[\w$]+ regular expression
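The documented regular expression allows a table name with at most one dot-separated prefix. The minimal sketch below splits a comma-separated ite.storage.tableNames value and rejects entries that do not fully match the expression. Ignoring blanks around the commas is an assumption, and the function name is hypothetical.

```python
import re

# Documented pattern: an optional "<prefix>." followed by a name
TABLE_NAME = re.compile(r"(?:[\w$]+\.)?[\w$]+")


def parse_table_names(value):
    """Return the table names from an ite.storage.tableNames value, or raise ValueError."""
    names = [v.strip() for v in value.split(",")]
    bad = [n for n in names if not TABLE_NAME.fullmatch(n)]
    if bad:
        raise ValueError(f"invalid table names: {bad}")
    return names
```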

Related parameters:

Other: ite.storage.type

ite.storage.type

Selects the output type for your application.

You can specify tableFile to write CSV files that can be consumed by another application, for example, to load the content of these CSV files into a database. Choose this type if you want to create many output files.

You can specify recordFile to write an output file for each input file.

Alternatively, specify custom to implement your own file writer. Implement your code in the <namespace>.chainsink.custom::FileWriterCustom composite operator. To activate your code, set this parameter to custom. You must also set the ite.embeddedSampleCode parameter to off, so the ITE application uses your implementation instead of the sample logic that is provided with the <namespace>.chainsink::SampleFileWriterCustom composite operator.

If you specify the noFile option, the ITE application does not write output files for each input file. ITE applications that use, for example, variant B or C can select this option if only the <namespace>.context.custom::ContextDataProcessor composite operator creates output files. One use case for writing output files only in <namespace>.context.custom::ContextDataProcessor is that you need to aggregate data across files and <namespace>.context.custom::ContextDataProcessor triggers events.

Properties

Type: enum

Default: recordFile

Cardinality: 0..1

Application scope: ITE

Provisioning time: compile time

Valid values: custom, noFile, recordFile, tableFile

Related parameters:

Other: ite.storage.tableNames
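The dependency between ite.storage.type and ite.storage.tableNames can be checked before compilation. The minimal sketch below treats the configuration as a dictionary of parameter names and string values, which is a hypothetical in-memory view of the configuration file. It compares enumeration values case-insensitively, applies the recordFile default, and requires table names when tableFile is selected. The function name is hypothetical.

```python
# Valid ite.storage.type values, lowercased because enum values are case-insensitive
STORAGE_TYPES = {"custom", "nofile", "recordfile", "tablefile"}


def check_storage(config):
    """Return the normalized ite.storage.type, or raise ValueError for an inconsistent setup.

    config: dict of parameter name -> string value (hypothetical view of the configuration file).
    """
    storage_type = config.get("ite.storage.type", "recordFile").lower()
    if storage_type not in STORAGE_TYPES:
        raise ValueError(f"unsupported ite.storage.type: {config['ite.storage.type']!r}")
    tables = config.get("ite.storage.tableNames", "").strip()
    if storage_type == "tablefile" and not tables:
        raise ValueError("ite.storage.tableNames is mandatory when ite.storage.type is tableFile")
    return storage_type
```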

lm.commandsDirectory

Specifies the directory that is scanned for command input files. Successfully processed command input files are moved to the archive subdirectory. Input files that could not be processed are moved to the failed subdirectory. If these subdirectories do not exist, they are created during the startup phase.

A relative path is relative to the data directory.
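The following minimal sketch illustrates the described file handling for the commands directory. It creates the archive and failed subdirectories at startup and moves each command input file into one of them after processing. The process callable stands in for the actual command handling, which is hypothetical here. The sketch treats every regular file in the directory as a command input file and counts an exception as a failure. Both are assumptions, not the documented behavior of the Lookup Manager.

```python
import shutil
from pathlib import Path


def prepare_command_dirs(commands_dir):
    """Create the archive and failed subdirectories if they do not exist (startup phase)."""
    base = Path(commands_dir)
    for sub in ("archive", "failed"):
        (base / sub).mkdir(parents=True, exist_ok=True)


def scan_commands(commands_dir, process):
    """Process each command input file and move it to archive/ or failed/.

    process: hypothetical callable that takes a Path and returns True on success.
    Returns a dict of file name -> target subdirectory.
    """
    base = Path(commands_dir)
    results = {}
    for cmd_file in sorted(p for p in base.iterdir() if p.is_file()):
        try:
            ok = bool(process(cmd_file))
        except Exception:
            ok = False  # treat unexpected errors as a failed command
        target_dir = "archive" if ok else "failed"
        shutil.move(str(cmd_file), str(base / target_dir / cmd_file.name))
        results[cmd_file.name] = target_dir
    return results
```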

Properties

Type: string

Default: "./in/cmd"

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Details

The directory does not need to be in a shared file system because the Lookup Manager application always scans for new command input files on the host that has the <namespace>_lookup_writer host tag assigned.

lm.controlledApplications

Restricts the list of ITE applications that are controlled by the Lookup Manager application to a subset of the ITE applications that are defined in the LookupMgrCustomizing.xml file. The file is located in the Lookup Manager application directory.

Provide a comma-separated list of namespaces as defined in the LookupMgrCustomizing.xml file.

If the submission-time parameter is omitted, the Lookup Manager application controls all ITE applications that are defined in the LookupMgrCustomizing.xml file.
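The selection rule can be expressed as follows: without the parameter, all defined applications are controlled; with it, only the listed subset is controlled. In this minimal sketch, the defined namespaces are passed in as a list because reading LookupMgrCustomizing.xml is outside its scope. Rejecting a namespace that is not defined is a design choice of the sketch, not documented Lookup Manager behavior. The function name is hypothetical.

```python
import re

# Documented pattern for each namespace in lm.controlledApplications
NAMESPACE = re.compile(r"(?:[a-z][a-z0-9_]*)(?:\.[a-z][a-z0-9_]*)*")


def controlled_applications(defined, value=None):
    """Return the ITE applications that the Lookup Manager controls.

    defined: namespaces defined in LookupMgrCustomizing.xml (read elsewhere).
    value: the lm.controlledApplications value, or None if the parameter is omitted.
    """
    if value is None:
        return list(defined)
    requested = [v.strip() for v in value.split(",")]
    for ns in requested:
        if not NAMESPACE.fullmatch(ns):
            raise ValueError(f"invalid namespace: {ns!r}")
        if ns not in defined:
            raise ValueError(f"{ns!r} is not defined in LookupMgrCustomizing.xml")
    return requested
```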

Properties

Type: string

Cardinality: 0..n

Application scope: Lookup Manager

Provisioning time: submission time

Valid values: a comma-separated list of values that match the (?:[a-z][a-z0-9_]*)(?:\.[a-z][a-z0-9_]*)* regular expression

lm.db

Specifies whether the Lookup Manager application reads enrichment data from a database source. The enrichment data that is read is distributed to the lookup repositories on all configured hosts.

If the Lookup Manager application reads enrichment data from a database source, set this parameter to on. If not, set it to off.

If you set this parameter to on, configure the child parameters according to their descriptions. If the parameter is turned off, the child parameters are inactive, and the lm.file parameter must be turned on.

When you create a project, a connections.xml sample file is created in the application directory.

Properties

Type: enum

Default: off

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: lm.db.connectionName, lm.db.name, lm.db.password, lm.db.user, lm.db.vendor
Other: lm.file

Details

The Lookup Manager uses the com.ibm.streams.db::ODBCRun operator from the Database toolkit to read the enrichment data. All required Database toolkit settings must be provided.
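The lm.db and lm.file descriptions together require at least one enrichment source to be turned on. The Important notes on lm.db.name, lm.db.password, and lm.db.user also recommend that you do not provide database access information at compile time. The minimal sketch below checks both points for a dictionary of compile-time parameters, which is a hypothetical in-memory view of the configuration. It applies the documented defaults (lm.db=off, lm.file=on). The function name is hypothetical.

```python
# Database access parameters whose compile-time values become visible in the generated SPL files
DB_ACCESS_PARAMETERS = ("lm.db.name", "lm.db.user", "lm.db.password")


def check_lookup_sources(compile_time):
    """Return (sources, warnings) for a dict of compile-time parameters.

    Raises ValueError if both lm.db and lm.file are turned off.
    """
    db_on = compile_time.get("lm.db", "off").lower() == "on"
    file_on = compile_time.get("lm.file", "on").lower() == "on"
    if not (db_on or file_on):
        raise ValueError("turn on lm.db or lm.file")
    sources = [name for name, on in (("db", db_on), ("file", file_on)) if on]
    warnings = []
    if db_on:
        for param in DB_ACCESS_PARAMETERS:
            if compile_time.get(param):
                warnings.append(f"{param} is set at compile time; provide it at submission time instead")
    return sources, warnings
```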

lm.db.connectionName

Specifies the connection name that is used to access the database source.

Use one of the names that are specified in the connections.xml file of the Database toolkit. The XPath for these names is /connections/connection_specifications/connection_specification/@name.
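To list the connection names that are valid for this parameter, you can evaluate the XPath above against connections.xml. This minimal sketch uses the Python standard library ElementTree module. In case the file declares an XML namespace on its elements, the sketch compares local element names and ignores namespace prefixes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Drop a "{namespace}" prefix, if present
    return tag.rsplit("}", 1)[-1]


def connection_names(xml_text):
    """Return the values of /connections/connection_specifications/connection_specification/@name."""
    root = ET.fromstring(xml_text)
    if _local(root.tag) != "connections":
        return []
    names = []
    for specs in root:
        if _local(specs.tag) != "connection_specifications":
            continue
        for spec in specs:
            if _local(spec.tag) == "connection_specification" and spec.get("name"):
                names.append(spec.get("name"))
    return names
```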

The parameter is active only if the parent parameter is turned on.

Properties

Type: string

Default: "SAMPLE"

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: lm.db
Other: lm.db.name, lm.db.password, lm.db.user, lm.db.vendor

Details

The specified connection name is passed as the connection parameter to the com.ibm.streams.db::ODBCRun operator.

lm.db.name

Specifies the data source name (DSN) of the target database.

Important: If this parameter is provided as a compile-time parameter, its value is visible in the SPL files that are compiled from the mixed-mode SPLMM files. To avoid exposing this information, provide all database access information as submission-time parameters only.

This parameter is active only if the parent parameter is turned on.

Properties

Type: string

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: lm.db
Other: lm.db.connectionName, lm.db.password, lm.db.user, lm.db.vendor

Details

This parameter is passed as a database parameter to the com.ibm.streams.db::ODBCRun operator.

Any value that is specified on the <ODBC> element of the <connection_specification> element in the connections.xml document is ignored.

For additional information, see the Database toolkit description.

lm.db.password

Specifies the password that is used to connect to the target database.

Important: If this parameter is provided as a compile-time parameter, its value is visible in the SPL files that are compiled from the mixed-mode SPLMM files. To avoid exposing this information, provide all database access information as submission-time parameters only.

The parameter is active only if the parent parameter is turned on.

Properties

Type: string

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: lm.db
Other: lm.db.connectionName, lm.db.name, lm.db.user, lm.db.vendor

Details

This parameter is passed as a password parameter to the com.ibm.streams.db::ODBCRun operator.

Any value that is specified on the <ODBC> element of the <connection_specification> element in the connections.xml document is ignored.

For additional information, see the Database toolkit description.

lm.db.user

Specifies the user name that is used to connect to the target database.

Important: If this parameter is provided as a compile-time parameter, its value is visible in the SPL files that are compiled from the mixed-mode SPLMM files. To avoid exposing this information, provide all database access information as submission-time parameters only.

The parameter is active only if the parent parameter is turned on.

Properties

Type: string

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: lm.db
Other: lm.db.connectionName, lm.db.name, lm.db.password, lm.db.vendor

Details

This parameter is passed as a user parameter to the com.ibm.streams.db::ODBCRun operator.

Any value that is specified on the <ODBC> element of the <connection_specification> element in the connections.xml document is ignored.

For additional information, see the Database toolkit description.

lm.db.vendor

Specifies the database vendor (product). The Lookup Manager application supports only a subset of the database products (DB2® and Oracle) that are supported by the Database toolkit.

All required Database toolkit settings and drivers for the selected product must be provided. For example, set the STREAMS_ADAPTERS_ODBC_DB2 environment variable for DB2 or the STREAMS_ADAPTERS_ODBC_ORACLE environment variable for Oracle. All other environment variables that are required by the Database toolkit must also be set, for example, STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH.
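A quick environment check before compilation can catch a missing variable early. The minimal sketch below checks only the variables that are named above. The Database toolkit may require more, so an empty result does not guarantee a complete setup. The vendor lookup is case-insensitive, like all enumeration values. The function name is hypothetical.

```python
import os

# Vendor-specific variables named in the lm.db.vendor description
VENDOR_VARIABLES = {
    "DB2": "STREAMS_ADAPTERS_ODBC_DB2",
    "ORACLE": "STREAMS_ADAPTERS_ODBC_ORACLE",
}
# Further variables named as examples; the Database toolkit may require more
COMMON_VARIABLES = ("STREAMS_ADAPTERS_ODBC_INCPATH", "STREAMS_ADAPTERS_ODBC_LIBPATH")


def missing_odbc_variables(vendor="DB2", environ=None):
    """Return the named environment variables that are unset or empty for the given vendor."""
    env = os.environ if environ is None else environ
    key = vendor.upper()
    if key not in VENDOR_VARIABLES:
        raise ValueError(f"unsupported lm.db.vendor: {vendor!r}")
    required = (VENDOR_VARIABLES[key],) + COMMON_VARIABLES
    return [name for name in required if not env.get(name)]
```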

The parameter is active only if the parent parameter is turned on.

Properties

Type: enum

Default: DB2

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time

Valid values: DB2, ORACLE

Related parameters:

Parent: lm.db
Other: lm.db.connectionName, lm.db.name, lm.db.password, lm.db.user

Details

For additional information, see the Database toolkit description.

lm.file

Specifies whether the Lookup Manager application reads enrichment data from files. The enrichment data that is read is distributed to the lookup repositories on all configured hosts.

If the Lookup Manager application reads enrichment data from files, set this parameter to on. If not, set it to off.

If this parameter is turned on, the child parameters can be configured according to their descriptions. If the parameter is turned off, the child parameters are inactive, and the lm.db parameter must be turned on.

Properties

Type: enum

Default: on

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time

Valid values: off, on

Related parameters:

Children: lm.file.directory
Other: lm.db

lm.file.directory

Specifies the directory that holds enrichment data input files.

A relative path is relative to the data directory.

Enrichment data input files have either the .csv or .del.csv extension.

The parameter is active only if the parent parameter is turned on.

Properties

Type: string

Default: "."

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Related parameters:

Parent: lm.file

Details

The basename (the file name without the extension) is a segment name that is provided as part of an update or delete command. The provided segment name must match one of the segment names that are defined in the LookupMgrCustomizing.xml file. The file is located in the Lookup Manager application directory.
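The following minimal sketch derives the segment name from an enrichment data file name and checks it against the defined segments. The longer .del.csv suffix is tested first so that customers.del.csv yields customers rather than customers.del. The defined segment names are passed in because parsing LookupMgrCustomizing.xml is outside the sketch's scope. The function name is hypothetical.

```python
from pathlib import PurePath

# Longer suffix first so that "x.del.csv" yields "x", not "x.del"
SUFFIXES = (".del.csv", ".csv")


def segment_name(file_name, defined_segments):
    """Return the segment name for an enrichment data input file, or raise ValueError."""
    name = PurePath(file_name).name
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            segment = name[: -len(suffix)]
            break
    else:
        raise ValueError(f"not an enrichment data input file: {name!r}")
    if segment not in defined_segments:
        raise ValueError(f"segment {segment!r} is not defined in LookupMgrCustomizing.xml")
    return segment
```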

lm.statisticsDirectory

Specifies the directory that is used to store log and statistics files.

The Lookup Manager application collects statistics for the lookup repository, for example, the amount of available memory, and generates log information, for example, the start and end times of processed commands. The Lookup Manager application writes this information to a file named <date>_LookupManagerStatistics.txt, where <date> uses the YYYYMMDD format.

The specified directory holds only one statistics log file. When a new day begins and a new file is created, the previous file is moved to the archive directory. The archive directory is created during the startup phase.
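The file naming and the daily switch can be sketched as follows. The first function builds the documented file name for a date. The second function picks the statistics files from earlier days, which belong in the archive directory. The function names are hypothetical, and the sketch does not move any files.

```python
import datetime


def statistics_file_name(day: datetime.date) -> str:
    """Return <date>_LookupManagerStatistics.txt with the date in YYYYMMDD format."""
    return f"{day:%Y%m%d}_LookupManagerStatistics.txt"


def files_to_archive(existing_names, today: datetime.date):
    """Return the statistics files from earlier days, which belong in the archive directory."""
    current = statistics_file_name(today)
    return sorted(
        n for n in existing_names
        if n.endswith("_LookupManagerStatistics.txt") and n != current
    )
```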

A relative path is relative to the data directory.

Properties

Type: string

Default: "./out/statistics"

Cardinality: 0..1

Application scope: Lookup Manager

Provisioning time: compile time, submission time

Valid values: any value that matches the .+ regular expression

Details

The directory does not need to be in a shared file system because the Lookup Manager application always runs on the host that has the <namespace>_lookup_writer host tag assigned.