Specifying a collection method

A collector defines how a sequential operator combines the partitions of an input data set for processing by a single node.

A collection method is the algorithm used by a sequential operator to combine the partitions of an input data set into a single input stream. The method can be as simple as combining the partitions on a first-come, first-served basis, where the sequential operator processes records in the order in which they are received from the previous operator. Alternatively, information contained within a record can be used to determine the order of processing.

InfoSphere DataStage provides a number of different collection methods, including:
any
This method reads records on a first-come, first-served basis. An operator that uses the any collection method allows the operator user to override it with a different method. To set the collection method, the operator user assigns a collector to a data set that is used as an input to the operator. InfoSphere DataStage then collects the partitions of that data set accordingly.
round robin
This method reads a record from the first input partition, then a record from the second partition, and so on. After it reads from the last partition, it starts over with the first.
ordered
This method reads all records from the first partition, then all records from the second partition, and so on. This collection method preserves any sorted order in the input data set.
sorted merge
This method reads records in an order based on one or more fields of the record. The fields used to define record order are called collecting keys. You use the sortmerge collection operator to implement this method.
other
You can define a custom collection method by deriving a class from APT_Collector. Operators that use custom collectors have a collection method of other.
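The round robin, ordered, and sorted merge methods above can be sketched as plain C++ over in-memory partitions. This is a standalone illustration, not the DataStage collector API. `Record`, `Partitions`, and the `collect*` functions are hypothetical names. Records are plain `int` values that also serve as the collecting key. The round robin sketch assumes that an exhausted partition is simply skipped. The sorted merge sketch assumes that each partition is already sorted on the key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a record is an int (also its collecting key),
// and a partitioned data set is one vector of records per partition.
using Record = int;
using Partitions = std::vector<std::vector<Record>>;

// Round robin: one record from partition 0, then 1, ..., then wrap around.
// Assumption: a partition with no records left is skipped.
std::vector<Record> collectRoundRobin(const Partitions& parts) {
    std::vector<Record> out;
    std::vector<std::size_t> next(parts.size(), 0);  // read position per partition
    bool readAny = true;
    while (readAny) {
        readAny = false;
        for (std::size_t p = 0; p < parts.size(); ++p) {
            if (next[p] < parts[p].size()) {
                out.push_back(parts[p][next[p]++]);
                readAny = true;
            }
        }
    }
    return out;
}

// Ordered: all records of partition 0, then all of partition 1, and so on.
std::vector<Record> collectOrdered(const Partitions& parts) {
    std::vector<Record> out;
    for (const auto& part : parts) {
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}

// Sorted merge: repeatedly emit the smallest head record across partitions.
std::vector<Record> collectSortedMerge(const Partitions& parts) {
    using Head = std::pair<Record, std::size_t>;  // (key, partition index)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heads;
    std::vector<std::size_t> next(parts.size(), 0);
    for (std::size_t p = 0; p < parts.size(); ++p) {
        // Precondition (checked in debug builds): partition sorted on the key.
        assert(std::is_sorted(parts[p].begin(), parts[p].end()));
        if (!parts[p].empty()) {
            heads.push({parts[p][0], p});
            next[p] = 1;
        }
    }
    std::vector<Record> out;
    while (!heads.empty()) {
        Head h = heads.top();
        heads.pop();
        out.push_back(h.first);
        std::size_t p = h.second;
        if (next[p] < parts[p].size()) {
            heads.push({parts[p][next[p]++], p});  // refill from the same partition
        }
    }
    return out;
}
```

Consider the partitions {1, 5, 9}, {4, 8}, and {2, 7}. Round robin yields 1, 4, 2, 5, 8, 7, 9. Ordered yields 1, 5, 9, 4, 8, 2, 7. Sorted merge yields 1, 2, 4, 5, 7, 8, 9.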

By default, sequential operators use the collection method any. The any collection method allows operator users to prefix the operator with a collection operator to control the collection method. For example, a user could insert the ordered collection operator in a step before the derived operator.

To give the operator an explicit collection method that operator users cannot override, call APT_Operator::setCollectionMethod() from within your implementation of APT_Operator::describeOperator().
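The override rules described above can be summarized as a small decision function. This is a hypothetical model written for illustration only: `CollectionMethod` and `effectiveMethod` are not DataStage names. The sketch does not model how DataStage reacts when a user assigns a collector to an operator whose method is fixed. In that case, it simply keeps the operator's method.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enumeration of the collection methods listed above
// (illustrative only; not part of the DataStage API).
enum class CollectionMethod { Any, RoundRobin, Ordered, SortedMerge, Other };

// Returns the method that applies to one operator input.
// operatorMethod: the operator's declared method (Any by default, or a fixed
//                 method set in describeOperator()).
// userCollector:  the method of a collector the operator user assigned to the
//                 input data set, if any.
CollectionMethod effectiveMethod(CollectionMethod operatorMethod,
                                 std::optional<CollectionMethod> userCollector) {
    // In this model, an assigned collector always names a concrete method.
    assert(!userCollector || *userCollector != CollectionMethod::Any);
    // Only an operator that uses "any" lets the user's collector take effect.
    if (operatorMethod == CollectionMethod::Any && userCollector) {
        return *userCollector;
    }
    // Otherwise the operator's own method stands. With no user collector,
    // Any itself applies: records are read first-come, first-served.
    return operatorMethod;
}
```

For example, an operator left at Any with an ordered collector in front of it resolves to Ordered. An operator fixed to round robin stays RoundRobin regardless of what the user assigns.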

You can also define your own collection method for each operator input. To do so, you derive a collector class from APT_Collector.
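A custom collector is essentially a class that decides which partition supplies the next record. That idea can be sketched in standalone C++. The base class `CollectorSketch`, its `selectInput()` member, the `runCollector()` driver, and the example `LargestFirstCollector` are all hypothetical. They illustrate the concept only and do not reproduce the real virtual functions of APT_Collector; consult the APT_Collector class reference for its actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Record = int;
using Partitions = std::vector<std::vector<Record>>;

// Hypothetical base class illustrating the idea of deriving a collector.
// It is NOT the real APT_Collector interface.
class CollectorSketch {
public:
    virtual ~CollectorSketch() = default;
    // Choose the partition to read the next record from.
    // remaining[p] is the number of unread records in partition p; the driver
    // calls this only while at least one entry is nonzero.
    virtual std::size_t selectInput(const std::vector<std::size_t>& remaining) = 0;
};

// Driver: asks the collector which partition to read until all are drained.
std::vector<Record> runCollector(CollectorSketch& collector, const Partitions& parts) {
    std::vector<std::size_t> next(parts.size(), 0);
    std::vector<std::size_t> remaining;
    std::size_t total = 0;
    for (const auto& part : parts) {
        remaining.push_back(part.size());
        total += part.size();
    }
    std::vector<Record> out;
    while (out.size() < total) {
        std::size_t p = collector.selectInput(remaining);
        // The collector must pick a partition that still has records.
        assert(p < parts.size() && remaining[p] > 0);
        out.push_back(parts[p][next[p]++]);
        --remaining[p];
    }
    return out;
}

// Example custom method: always read from the partition with the most
// unread records (ties go to the lowest partition index).
class LargestFirstCollector : public CollectorSketch {
public:
    std::size_t selectInput(const std::vector<std::size_t>& remaining) override {
        std::size_t best = 0;
        for (std::size_t p = 1; p < remaining.size(); ++p) {
            if (remaining[p] > remaining[best]) best = p;
        }
        return best;
    }
};
```

With the partitions {1, 2, 3}, {10}, and {20, 30}, this collector emits 1, 2, 20, 3, 10, 30. The selection policy lives in one overridable function. As a result, each derived collector changes only the reading order, while the driver handles the bookkeeping.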