A collector defines how a sequential operator combines
the partitions of an input data set for processing by a single node.
A collection method is the algorithm used by a sequential operator
to combine the partitions of an input data set into a single input
stream. The method can be as simple as combining the partitions on
a first-come, first-served basis, where the sequential operator processes
records in the order in which they are received from the previous
operator. Alternatively, information contained within a record can
be used to determine the order of processing.
InfoSphere DataStage provides a number of different collection
methods, including:
- any: This method reads records on a first-come, first-served basis.
  Operators that use the any collection method let the operator user
  explicitly override it with a method of their own. To set the collection
  method, the operator user assigns a collector to a data set used as an
  input to the operator. InfoSphere DataStage then collects the partitions
  of the data set accordingly.
- round robin: This method reads a record from the first input partition,
  then from the second partition, and so on. When the last processing node
  in the system is reached, the collector starts over with the first
  partition.
- ordered: This method reads all records from the first partition, then all
  records from the second partition, and so on. This collection method
  preserves any sorted order in the input data set.
- sorted merge: This method reads records in an order based on one or more
  fields of the record. The fields used to define record order are called
  collecting keys. You use the sortmerge collection operator to implement
  this method.
- other: You can define a custom collection method by deriving a class
  from APT_Collector. Operators that use custom collectors have a
  collection method of other.
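The read order produced by the round robin, ordered, and sorted merge methods can be illustrated with a small stand-alone simulation. This is a minimal sketch and not InfoSphere DataStage code. It makes these assumptions: partitions are modeled as in-memory vectors of integers, the function names are hypothetical, round robin skips partitions that are already exhausted, and sorted merge treats the record value itself as the single collecting key and expects each partition to be sorted on it already.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Assumption: a partition is modeled as an in-memory list of integer records.
using Partition = std::vector<int>;

// Round robin: take one record from each partition in turn, then start
// over with the first partition. Exhausted partitions are skipped.
std::vector<int> collectRoundRobin(const std::vector<Partition>& parts) {
    std::size_t longest = 0;
    for (const Partition& p : parts) longest = std::max(longest, p.size());

    std::vector<int> out;
    for (std::size_t i = 0; i < longest; ++i) {
        for (const Partition& p : parts) {
            if (i < p.size()) out.push_back(p[i]);
        }
    }
    return out;
}

// Ordered: all records of the first partition, then all records of the
// second partition, and so on.
std::vector<int> collectOrdered(const std::vector<Partition>& parts) {
    std::vector<int> out;
    for (const Partition& p : parts) out.insert(out.end(), p.begin(), p.end());
    return out;
}

// Sorted merge: repeatedly emit the smallest key among the next unread
// records of all partitions. A min-heap holds one (key, partition) pair
// per non-empty partition. Assumes each partition is sorted ascending.
std::vector<int> collectSortedMerge(const std::vector<Partition>& parts) {
    using Head = std::pair<int, std::size_t>;  // (key, partition index)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
    std::vector<std::size_t> next(parts.size(), 0);  // read position per partition

    for (std::size_t p = 0; p < parts.size(); ++p) {
        if (!parts[p].empty()) heap.push({parts[p][0], p});
    }

    std::vector<int> out;
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        out.push_back(h.first);
        std::size_t p = h.second;
        if (++next[p] < parts[p].size()) heap.push({parts[p][next[p]], p});
    }
    return out;
}
```

With the partitions {1, 2, 9}, {3, 4}, and {5, 6, 7, 8}, the ordered method keeps each partition contiguous. Round robin interleaves the partitions. Sorted merge yields one globally sorted stream.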
By default, sequential operators use the collection method any.
The any collection method allows operator users to prefix the operator
with a collection operator to control the collection method. For example,
a user could insert the ordered collection operator in a step before
the derived operator.
To set an explicit collection method for the operator that cannot
be overridden, you must include a call to APT_Operator::setCollectionMethod()
within APT_Operator::describeOperator().
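The difference between the default any method and an explicitly set method can be modeled in a few lines. The following is a stand-alone sketch of the rule only and does not use the framework API. SketchOperator, CollectionMethod, fixCollectionMethod, and applyUserCollector are hypothetical names introduced for illustration. The signature of APT_Operator::setCollectionMethod() is not reproduced here.

```cpp
// Hypothetical stand-ins; none of these names come from the framework.
enum class CollectionMethod { Any, RoundRobin, Ordered, SortedMerge, Other };

class SketchOperator {
public:
    // Models the operator developer setting an explicit collection method
    // (the role played by setCollectionMethod() inside describeOperator()).
    void fixCollectionMethod(CollectionMethod m) { declared_ = m; }

    // Models an operator user inserting a collection operator ahead of
    // this operator. The request is honored only while the declared
    // method is still Any; otherwise it is rejected.
    bool applyUserCollector(CollectionMethod requested) {
        if (declared_ != CollectionMethod::Any) return false;
        userChoice_ = requested;
        return true;
    }

    // The method that would actually be used to collect the input.
    CollectionMethod effectiveMethod() const {
        return declared_ == CollectionMethod::Any ? userChoice_ : declared_;
    }

private:
    CollectionMethod declared_ = CollectionMethod::Any;    // default is any
    CollectionMethod userChoice_ = CollectionMethod::Any;  // no user override yet
};
```

In this model, an operator left at the default accepts a user request for ordered collection. An operator whose developer fixed round robin rejects the same request and keeps round robin.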
You can also define your own collection method for each operator
input. To do so, you derive a collector class from APT_Collector.
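Conceptually, a custom collector supplies its own rule for choosing which partition provides the next record. The sketch below illustrates that idea with a stand-alone class hierarchy. It does not reproduce the APT_Collector interface. SketchCollector, choosePartition, LargestBacklogCollector, and collect are hypothetical names, and the "largest backlog first" rule is an arbitrary example policy.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a partition is modeled as an in-memory list of integer records.
using Partition = std::vector<int>;

// Hypothetical base class: a collector decides which partition supplies
// the next record of the single output stream.
class SketchCollector {
public:
    virtual ~SketchCollector() = default;
    // remaining[i] is the number of unread records in partition i.
    // Called only while at least one partition has unread records;
    // must return the index of such a partition.
    virtual std::size_t choosePartition(
        const std::vector<std::size_t>& remaining) = 0;
};

// Example custom rule: always read from the partition with the most
// unread records. Ties go to the lowest partition index.
class LargestBacklogCollector : public SketchCollector {
public:
    std::size_t choosePartition(
        const std::vector<std::size_t>& remaining) override {
        std::size_t best = 0;
        for (std::size_t i = 1; i < remaining.size(); ++i) {
            if (remaining[i] > remaining[best]) best = i;
        }
        return best;
    }
};

// Hypothetical driver: asks the collector for a partition, reads one
// record from it, and repeats until every partition is drained.
std::vector<int> collect(const std::vector<Partition>& parts,
                         SketchCollector& collector) {
    std::vector<std::size_t> pos(parts.size(), 0);
    std::vector<std::size_t> remaining(parts.size(), 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        remaining[i] = parts[i].size();
        total += parts[i].size();
    }

    std::vector<int> out;
    while (total > 0) {
        std::size_t p = collector.choosePartition(remaining);
        out.push_back(parts[p][pos[p]++]);
        --remaining[p];
        --total;
    }
    return out;
}
```

Keeping the selection rule in a derived class leaves the driver loop unchanged when the rule changes. A derived collector class plays the same role for an operator input.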