Sorting data

Use the Sort stage to sort columns on the input link to suit your requirements. Sort specifications are defined on the Sort By tab.

Parallel jobs often need to sort data. InfoSphere® DataStage® provides a Sort stage, which allows you to perform complex sorting operations. There are situations, however, where you need only a fairly simple sort as a precursor to a processing operation. For these cases, InfoSphere DataStage allows you to insert a sort operation on the incoming data of most stage types. To do this, select the Sorting option on the Partitioning tab of the Input page. You can then specify:
  • Sorting keys. The field(s) on which data is sorted. You must specify a primary key, but you can also specify any number of secondary keys. The first key you define is taken as the primary.
  • Stable sort (this is the default and specifies that previously sorted data sets are preserved).
  • Unique sort (discards records if multiple records have identical sorting key values).
  • Case sensitivity.
  • Sort direction.
  • Sorted as EBCDIC (ASCII is the default).

If you have NLS enabled, you can also specify the collate convention used.
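
The effect of the sorting options listed above can be illustrated outside the product. The following Python sketch is not DataStage code and does not use any DataStage API; the function `sort_records`, its parameters, and the sample records are hypothetical names chosen for illustration. It assumes that a unique sort keeps the first record of each group of identical keys, which the text above does not state, and it uses the `cp037` code page as one example of an EBCDIC collating sequence.

```python
# Hypothetical sample records; field names are illustrative only.
records = [
    {"region": "west", "name": "b", "seq": 1},
    {"region": "east", "name": "A", "seq": 2},
    {"region": "west", "name": "a", "seq": 3},
    {"region": "east", "name": "A", "seq": 4},
]


def sort_records(rows, keys, descending=False, unique=False,
                 case_sensitive=True, ebcdic=False):
    """Sort rows on a primary key followed by any secondary keys.

    keys[0] is the primary key; the remaining entries are secondary keys.
    """
    def normalise(value):
        # Strings are compared by their encoded bytes so that the
        # collating sequence (ASCII or EBCDIC) decides the order.
        if isinstance(value, str):
            if not case_sensitive:
                value = value.lower()
            return value.encode("cp037" if ebcdic else "ascii")
        return value

    def sort_key(row):
        return tuple(normalise(row[k]) for k in keys)

    # Python's sorted() is stable: rows with equal keys keep their
    # original relative order, also when reverse=True.
    ordered = sorted(rows, key=sort_key, reverse=descending)

    if not unique:
        return ordered

    # Assumption: a unique sort keeps the first record of each key group.
    result, previous = [], object()
    for row in ordered:
        current = sort_key(row)
        if current != previous:
            result.append(row)
            previous = current
    return result


by_region_name = sort_records(records, ["region", "name"])
unique_rows = sort_records(records, ["region", "name"], unique=True)
ascii_order = sort_records(records, ["name"])
ebcdic_order = sort_records(records, ["name"], ebcdic=True)
```

In this sketch, the two `east`/`A` records stay in their original relative order under the stable sort, and the unique sort keeps only one of them. Sorting on `name` alone places uppercase before lowercase in ASCII, whereas the EBCDIC code page places lowercase first.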

Some InfoSphere DataStage operations require that the data they process is sorted (for example, the Merge operation). If InfoSphere DataStage detects in such a case that the input data set is not sorted, it automatically inserts a sort operation so that processing can take place, unless you have explicitly specified otherwise.
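
The idea of detecting unsorted input and sorting it only when needed can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not a description of how InfoSphere DataStage implements the check: `is_sorted`, `ensure_sorted`, `merge_inputs`, and the `auto_sort` flag are hypothetical names, and the standard-library `heapq.merge` merely stands in for any operation that requires sorted input (it does not reproduce the semantics of the Merge operation).

```python
import heapq


def is_sorted(rows, key):
    """Return True if rows are already in ascending order of key."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def ensure_sorted(rows, key, auto_sort=True):
    """Sort rows only if they are not sorted and auto_sort is enabled."""
    if is_sorted(rows, key) or not auto_sort:
        # Either no sort is needed, or the caller has explicitly
        # opted out; in both cases the data passes through unchanged.
        return rows
    return sorted(rows, key=key)


def merge_inputs(left, right, key, auto_sort=True):
    """Combine two inputs with an operation that expects sorted data."""
    left = ensure_sorted(left, key, auto_sort)
    right = ensure_sorted(right, key, auto_sort)
    # heapq.merge yields correctly ordered output only for sorted inputs.
    return list(heapq.merge(left, right, key=key))


merged = merge_inputs([3, 1], [2], key=lambda value: value)
```

With `auto_sort` left at its default, the unsorted first input is sorted before the merge. With `auto_sort=False`, the sketch passes the data through unchanged, so the caller is responsible for supplying sorted input.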