Building indexes in parallel for LOAD

Parallel index build reduces the elapsed time for a LOAD job by sorting the index keys and rebuilding multiple indexes in parallel, rather than sequentially. Optimally, a pair of subtasks process each index; one subtask sorts extracted keys while the other subtask builds the index.

LOAD begins building each index as soon as the corresponding sort produces its first sorted record.

LOAD uses parallel index build if all of the following conditions are true:

  • More than one index needs to be built.
  • The LOAD utility statement specifies a non-zero estimate of the number of keys on the SORTKEYS option.

You can either allow the utility to dynamically allocate the data sets that the SORT phase needs, or provide the necessary data sets yourself. Select one of the following methods to allocate sort work and message data sets:

Method 1: LOAD determines the optimal number of sort work and message data sets.

  1. Specify the SORTDEVT keyword in the utility statement.
  2. Allow dynamic allocation of sort work data sets by not supplying SORTWKnn DD statements in the LOAD utility JCL.
  3. Allocate UTPRINT to SYSOUT.

Method 2: You control allocation of sort work data sets, while LOAD allocates message data sets.

  1. Provide DD statements with DD names in the form SWnnWKmm.
  2. Allocate UTPRINT to SYSOUT.
Method 3: You have the most control over rebuild processing; you must specify both sort work and message data sets.
  1. Provide DD statements with DD names in the form SWnnWKmm.
  2. Provide DD statements with DD names in the form UTPRINnn.

Using this method does not eliminate the requirement for a UTPRINT DD card.

Data sets used

If you select Method 2 or 3 in the preceding information, use the following information to define the necessary data sets.

Each sort subtask must have its own group of sort work data sets and its own print message data set. Possible reasons to allocate data sets in the utility job JCL rather than using dynamic allocation are:

  • To control the size and placement of the data sets
  • To minimize device contention
  • To optimally use free disk space
  • To limit the number of utility subtasks that are used to build indexes

The DD names SWnnWKmm define the sort work data sets that are used during utility processing. nn identifies the subtask pair, and mm identifies one or more data sets that are to be used by that subtask pair. For example:

SW01WK01
The first sort work data set that is used by the subtask as it builds the first index.
SW01WK02
The second sort work data set that is used by the subtask as it builds the first index.
SW02WK01
The first sort work data set that is used by the subtask as it builds the second index.
SW02WK02
The second sort work data set that is used by the subtask as it builds the second index.

The DD names UTPRINnn define the sort work message data sets that are used by the utility subtask pairs. nn identifies the subtask pair.

Start of changeEvery time you invoke LOAD, new UTPRINnn data sets are dynamically allocated. LOAD does not reuse UTPRINnn data sets from previous job steps. This behavior might cause the available JES2 job queue elements to be consumed more quickly than expected.End of change

Determining the number of sort subtasks

The maximum number of utility subtask pairs that are started for parallel index build is equal to the number of indexes that are to be built.

LOAD determines the number of subtask pairs according to the following guidelines:

  • The number of subtask pairs equals the number of sort work data set groups that are allocated.
  • The number of subtask pairs equals the number of message data sets that are allocated.
  • If you allocate both sort work and message data set groups, the number of subtask pairs equals the smallest number of data sets that are allocated.

Allocation of sort subtasks

The LOAD utility attempts to assign one sort subtask pair for each index that is to be built. If the LOAD utility cannot start enough subtasks to build one index per subtask pair, it allocates any excess indexes across the pairs (in the order that the indexes were created), so that one or more subtask pairs might build more than one index.

During parallel index build processing, LOAD assigns all foreign keys to the first utility subtask pair. Remaining indexes are then distributed among the remaining subtask pairs according to the creation date of the index. If a table space does not participate in any relationships, LOAD distributes all indexes among the subtask pairs according to the index creation date, assigning the first created index to the first subtask pair.

Refer to the following table for conceptual information about subtask pairing when the number of indexes (seven indexes) exceeds the available number of subtask pairs (five subtask pairs).
Table 1. LOAD subtask pairing for a relational table space
Subtask pair Assigned index
SW01WKmm Foreign keys, fifth created index
SW02WKmm First created index, sixth created index
SW03WKmm Second created index, seventh created index
SW04WKmm Third created index
SW05WKmm Fourth created index

Estimating the sort work file size

If you choose to provide the data sets, you need to know the size and number of keys in all of the indexes that are being processed by the subtask in order to calculate each sort work file size. After you determine which indexes are assigned to which subtask pairs, use one of the following formulas to calculate the required space:

  • If the indexes being processed include a mixture of data-partitioned secondary indexes and nonpartitioned indexes, use the following formula: 2 * (longest index key + 15) * (number of extracted keys)
  • Otherwise, if only one type of index is being built, use the following formula: 2 * (longest index key + 13) * (number of extracted keys)
longest index key
The length of the longest key that is to be processed by the subtask. For the first subtask pair for LOAD, compare the length of the longest key and the length of the longest foreign key, and use the larger value. For nonpadded indexes, longest index key means the maximum possible length of a key with all varying-length columns, padded to their maximum lengths, plus 2 bytes for each varying-length column.
number of extracted keys
The number of keys from all indexes that are to be sorted and that the subtask is to process.