Information icon IBM InfoSphere DataStage and InfoSphere QualityStage, Version 8.5
space Feedback

Change Capture stage

The Change Capture Stage is a processing stage. The stage compares two data sets and makes a record of the differences.

The Change Capture stage takes two input data sets, denoted before and after, and outputs a single data set whose records represent the changes made to the before data set to obtain the after data set. The stage produces a change data set, whose table definition is transferred from the after data set's table definition with the addition of one column: a change code with values encoding the four actions: insert, delete, copy, and edit. The preserve-partitioning flag is set on the change data set.

The compare is based on a set a set of key columns, rows from the two data sets are assumed to be copies of one another if they have the same values in these key columns. You can also optionally specify change values. If two rows have identical key columns, you can compare the value columns in the rows to see if one is an edited copy of the other.

The stage assumes that the incoming data is key-partitioned and sorted in ascending order. The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built-in sorting and partitioning abilities of the Change Capture stage.

You can use the companion Change Apply stage to combine the changes from the Change Capture stage with the original before data set to reproduce the after data set (see Switch stage).

Shows a job where a before data set and an after data set are sorted, then passed to the Change Capture stage

The Change Capture stage is very similar to the Difference stage described in Difference stage.

The stage editor has three pages:


PDFThis topic is also in the IBM InfoSphere DataStage and QualityStage Parallel Job Developer's Guide.

Update timestamp Last updated: 2012-10-8