IBM InfoSphere DataStage and InfoSphere QualityStage, Version 8.5

Combine Records stage

The Combine Records stage is a restructure stage. It can have a single input link and a single output link.

The Combine Records stage combines records (that is, rows) whose key-column values are identical into vectors of subrecords. As input, the stage takes a data set in which one or more columns are chosen as keys. All adjacent records whose key columns contain the same values are gathered into the same record as subrecords.

Shows columns being combined into a vector of subrecords
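
The grouping behavior described above can be sketched in a few lines of Python. This is an illustration only, not the stage's implementation: the function name `combine_records`, the column names, and the output layout (key columns at the top level, remaining columns inside a list named `subrec`) are assumptions made for the example. The layout that the stage actually produces depends on its properties.

```python
from itertools import groupby


def combine_records(rows, keys, subrec_name="subrec"):
    """Gather adjacent rows with equal key values into one output record.

    Hypothetical layout: key columns stay at the top level and the remaining
    columns of each input row become one subrecord in a list (the vector).
    """
    def key_of(row):
        return tuple(row[k] for k in keys)

    combined = []
    # groupby only merges rows that are adjacent, which mirrors the
    # "adjacent records" wording in the text above.
    for key_values, group in groupby(rows, key=key_of):
        record = dict(zip(keys, key_values))
        record[subrec_name] = [
            {col: val for col, val in row.items() if col not in keys}
            for row in group
        ]
        combined.append(record)
    return combined


rows = [
    {"keycol": "A", "col1": 1, "col2": "x"},
    {"keycol": "A", "col1": 2, "col2": "y"},
    {"keycol": "B", "col1": 3, "col2": "z"},
]
combined = combine_records(rows, keys=["keycol"])
```

Because only adjacent rows are merged, an unsorted input such as A, B, A would yield three output records rather than two, which is why the input needs to be sorted on the key columns.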

The data set input to the Combine Records stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node. Choosing the (auto) partitioning method ensures that partitioning and sorting are done. If sorting and partitioning are carried out in separate stages before the Combine Records stage, InfoSphere® DataStage® in auto mode detects this and does not repartition. Alternatively, you can explicitly specify the Same partitioning method.
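
To make the requirement concrete, the following Python sketch shows why key partitioning followed by sorting places all rows with the same key next to each other in a single partition. It does not reproduce how InfoSphere DataStage partitions data: the helper names `key_partition` and `partition_and_sort` are hypothetical, and `zlib.crc32` is used only as a stand-in hash function that gives the same result on every run.

```python
import zlib


def key_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Stand-in hash (an assumption for this sketch): equal key values
        # always map to the same partition index.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


def partition_and_sort(rows, key, num_partitions):
    """Key partition the rows, then sort each partition on the key."""
    return [
        sorted(partition, key=lambda row: row[key])
        for partition in key_partition(rows, key, num_partitions)
    ]


rows = [
    {"keycol": k, "value": v}
    for k, v in [("B", 1), ("A", 2), ("B", 3), ("C", 4), ("A", 5)]
]
partitions = partition_and_sort(rows, "keycol", num_partitions=2)
```

After this step, every key value occurs in exactly one partition and its rows are adjacent within that partition, so each node can combine its records without needing data from another node.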

Shows a Combine Records stage with a single input link and a single output link

The stage editor has three pages:

Stage Page. This is always present and is used to specify general information about the stage.
Input Page. This is where you specify the details about the single input set from which you are selecting records.
Output Page. This is where you specify details about the processed data being output from the stage.

This topic is also in the IBM InfoSphere DataStage and QualityStage Parallel Job Developer's Guide.

Last updated: 2012-10-8