Compare stage

The Compare stage is a processing stage. It can have two input links and a single output link.

The Compare stage performs a column-by-column comparison of records in two presorted input data sets. You can restrict the comparison to specified key columns.

The Compare stage does not change the table definition, partitioning, or content of the records in either input data set. It transfers both data sets intact to a single output data set generated by the stage. The comparison results are also recorded in the output data set.

You can use runtime column propagation in this stage and allow InfoSphere® DataStage® to define the output column schema for you at runtime. The stage outputs a data set with three columns:

  • result. Carries the code giving the result of the comparison.
  • first. A subrecord containing the columns of the first input link.
  • second. A subrecord containing the columns of the second input link.
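The shape of the output described above can be sketched in Python. This is only an illustrative model, not how the stage is implemented: the function names compare_records and compare_stage are hypothetical, records are modeled as dictionaries, and the result codes used here (0 for equal, 1 when the first record is greater, -1 when the second is greater) are assumptions made for the example rather than values taken from this section. The sketch pairs records by position and does not model inputs of unequal length.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Record = Dict[str, Any]


def compare_records(first: Record, second: Record,
                    keys: Optional[Sequence[str]] = None) -> int:
    """Compare two records column by column.

    If keys is given, only those columns are compared; otherwise every
    column of the first record is compared, in order.
    The codes returned (0, 1, -1) are assumed for illustration.
    """
    columns = list(keys) if keys is not None else list(first.keys())
    for col in columns:
        if first[col] > second[col]:
            return 1
        if first[col] < second[col]:
            return -1
    return 0


def compare_stage(first_input: Iterable[Record],
                  second_input: Iterable[Record],
                  keys: Optional[Sequence[str]] = None) -> List[Record]:
    """Pair records by position and emit result, first, and second.

    Both input records are passed through unchanged as nested
    "subrecords"; only the result column is new.
    """
    return [
        {"result": compare_records(f, s, keys), "first": f, "second": s}
        for f, s in zip(first_input, second_input)
    ]
```

Calling compare_stage(first_rows, second_rows, keys=["id"]) on two lists of dictionaries would restrict the comparison to the hypothetical id column while still carrying every column of both inputs through to the output.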
If you specify the output link metadata yourself, you must define the columns carrying the data as subrecords of a parent column that you also define. InfoSphere DataStage does not let you specify two groups of identical column names, so you make the columns subrecords to give them unique names such as first.col1 and second.col1. To specify the metadata, complete the following steps:
  1. Specify the parent column for the output data corresponding to the first input link, and set the SQL type to unknown.
  2. Specify the actual columns that carry your data and make these subrecords of the parent column. Name each column first.colname (for example, first.col1, first.col2, and so on). Make each column a subrecord by selecting the column, selecting edit row from the shortcut menu, and specifying a level number (for example, 03) for that column. (You can speed up this process by making the first column a subrecord and using the propagate values feature to make the remaining columns subrecords of the parent column.)
  3. Specify the parent column for output data corresponding to the second input link, and set the SQL type to unknown.
  4. Specify the actual columns that carry the data from the second input link, name them second.colname (for example, second.col1, second.col2), and make these subrecords of the parent column in the same way as for the first input link.
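The column layout that these steps produce can be sketched as a simple list of column definitions. This is a hypothetical illustration only: the function output_columns and the dictionary keys name, sql_type, and level are invented for the example and are not a DataStage API. The level number 03 is the example value from step 2, the parent level is left unset because the steps do not state one, and the Integer and VarChar types in the usage note are placeholders.

```python
from typing import Dict, List, Optional, Sequence, Tuple

ColumnDef = Dict[str, Optional[str]]


def output_columns(first_cols: Sequence[Tuple[str, str]],
                   second_cols: Sequence[Tuple[str, str]]) -> List[ColumnDef]:
    """List the output column definitions in the order the steps create them.

    Each input is a sequence of (column name, SQL type) pairs.
    """
    rows: List[ColumnDef] = []
    for parent, cols in (("first", first_cols), ("second", second_cols)):
        # Steps 1 and 3: the parent column, with SQL type unknown.
        rows.append({"name": parent, "sql_type": "Unknown", "level": None})
        for name, sql_type in cols:
            # Steps 2 and 4: data columns, prefixed with the parent name
            # and given a level number to make them subrecords.
            rows.append({"name": f"{parent}.{name}",
                         "sql_type": sql_type,
                         "level": "03"})
    return rows
```

For two inputs that both contain col1 and col2, output_columns([("col1", "Integer"), ("col2", "VarChar")], [("col1", "Integer"), ("col2", "VarChar")]) yields six definitions whose names are all distinct, which is the point of prefixing each column with its parent name.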
Figure: A Compare stage being used to compare two sequential files and output the results to a data set.

The stage editor has three pages:

  • Stage Page. This is always present and is used to specify general information about the stage.
  • Input Page. This is where you specify details about the two input data sets that are being compared.
  • Output Page. This is where you specify details about the processed data being output from the stage.