Sorting data and database targets

Jobs often process or create sorted data. Many of these semantics of these sort operations cannot be directly reflected in SQL pushed to a database target.

In an UPDATE statement, if two or more data rows can affect a single row in a database table in a non-commutative way (for example, setting a salary field to two different values), the order in which the data rows are applied to the target table is important. In an INSERT statement, if two or more data rows with the same primary key attempt to INSERT rows into the target table, only the first succeeds.

The InfoSphere® Balanced Optimization tool attempts to preserve correct semantics with respect to sorted data. In many cases InfoSphere DataStage® Balanced Optimization can determine with certainty from the root job design whether the order of application of data rows to database tables is important, and perform or avoid certain optimizations.

If you know that there is a sort order dependency in your processing, or you think that there might be, turn off the Push processing to database targets option. This guarantees correct target database writing semantics.