Data sets that REORG TABLESPACE uses

The REORG TABLESPACE utility uses a number of data sets during its operation.

The following table describes the data sets that REORG TABLESPACE uses. The table lists the DD name that is used to identify the data set, a description of the data set, and an indication of whether it is required. Include statements in your JCL for each required data set, and any optional data sets that you want to use.

Table 1. Data sets that REORG TABLESPACE uses
Data set	Description	Required?
SYSIN	Input data set that contains the utility control statement.	Yes
SYSUT1	A temporary data set for sort input.	No
SYSPRINT	Output data set for messages.	Yes
STPRIN01	A data set that contains messages from the sort program (usually, SYSOUT or DUMMY). This data set is used when statistics are collected on at least one data-partitioned secondary index.	No¹
SYSDISC	Data set that contains discarded records from REORG DISCARD. The default DD name is SYSDISC.	No³
SYSPUNCH	Data set that contains a LOAD statement that is generated by REORG, which loads records that REORG DISCARD or REORG UNLOAD EXTERNAL wrote to the DISCARD or UNLOAD data sets. The default DD name is SYSPUNCH.	No⁴
UTPRINT	A data set that contains messages from the sort program (usually, SYSOUT or DUMMY).	Yes
Unload data set	Data set that contains the unloaded data that is to be reloaded during the RELOAD phase. Specify its DD or template name with the UNLDDN option or with the RECDSN field on the DB2I Utilities panel. The data set must be a sequential data set that is readable by BSAM. The default DD name is SYSREC. The unload data set must be large enough to contain all the unloaded records from all the tables in the target table space. If at least one table in the table space does not have an index, REORG cannot use the SORTDATA method with SHRLEVEL CHANGE. As a result, you must unload the data in the SYSREC data set.	Yes⁵
Sequential copies	From one to four output data sets that are to contain the image copies. Specify their DD or template names with the COPYDDN and RECOVERYDDN options of the utility control statement.	No⁶
FlashCopy® image copies	For table space or index space level copies, a VSAM data set for the output FlashCopy image copy of each partition or piece. For a partition level or piece level copy, a VSAM data set for the output FlashCopy image copy of the partition or piece.	No¹²
Work data sets	Temporary data sets for sort input and output. The DD names have the form DATAWKnn.	No⁷
Work data sets	Temporary data sets for sort input and output when sorting keys, or for sorting data when SORTDATA is specified but NOSYSREC is not. If index build parallelism is used, the DD names have the form SWnnWKmm. If index build parallelism is not used, the DD names have the form SORTWKnn	Yes⁸
Sort work data sets	Temporary data sets for sort input and output when collecting inline statistics on at least one data-partitioned secondary index. The DD names have the form ST01WKnn.	No^2,9
Sort work data sets	Temporary data sets for unload parallelism. The DD names have the form DAnnWKmm.	Yes¹⁰
Print data sets	Data sets for unload parallelism. The DD names have the form DTPRINnn. Every time you invoke REORG TABLESPACE, new DTPRIN`nn` data sets are dynamically allocated. REORG TABLESPACE does not reuse DTPRIN`nn` data sets from previous job steps. This behavior might cause the available JES2 job queue elements to be consumed more quickly than expected.	Yes^10,11
Note: STPRIN01 is required if statistics are being collected on at least one data-partitioned secondary index, but REORG TABLESPACE dynamically allocates the STPRIN01 data set if UTPRINT is allocated to SYSOUT. Required when collecting inline statistics on at least one data-partitioned secondary index. Required if you specify DISCARDDN Required you specify PUNCHDDN Required unless NOSYSREC or SHRLEVEL CHANGE is specified. Required if a partition is in REORG-pending status or REBALANCE, COPYDDN, RECOVERYDDN, SHRLEVEL REFERENCE, or SHRLEVEL CHANGE is specified. Required if NOSYSREC or SHRLEVEL CHANGE is specified, but SORTDEVT is not specified. Required if any indexes exist and SORTDEVT is not specified. If the DYNALLOC parm of the SORT program is not turned on, you need to allocate the data set. Otherwise, the sort program dynamically allocates the temporary data set. If you specify the SORTDEVT keyword, the data sets are dynamically allocated. It is recommended that you use dynamic allocation by specifying SORTDEVT in the utility statement because dynamic allocation reduces the maintenance required of the utility job JCL. If UTPRINT is allocated to SYSOUT, the data sets are dynamically allocated. Required if you specify either FLASHCOPY YES or FLASHCOPY CONSISTENT.

The following objects are named in the utility control statement and do not require DD statements in the JCL:

Table space: Object that is to be reorganized.

Calculating the size of the unload data set

The required size for the unload data set varies depending on the options that you use for REORG.

If you use REORG with UNLOAD PAUSE or CONTINUE and you specify KEEPDICTIONARY (assuming that a compression dictionary already exists), the size of the unload data set, in bytes, is the VSAM high-allocated RBA for the table space. You can obtain the high-allocated RBA from the associated VSAM catalog.
For SHRLEVEL CHANGE, also add the result of the following calculation (in bytes) to the VSAM high-used RBA:
```
number of records * 11
```
If you use REORG with UNLOAD ONLY, UNLOAD PAUSE, or CONTINUE and you do not specify KEEPDICTIONARY, you can calculate the size of the unload data set, in bytes, by using the following formula:
```
maximum row length * number of rows
```
The maximum row length is the row length, including the 6-byte record prefix, plus the length of the longest clustering key. If multiple tables exist in the table space, use the following formula to determine the maximum row length:
```
Sum over all tables ((row length + (2 * number of VARBIN 
    columns)) * number of rows)
```
For SHRLEVEL CHANGE, also add the result of the following formula to the preceding result:
```
(21 * ((NEARINDREF + FARINDREF) * 1.1))
```
In the preceding formula:
NEARINDREF

Is the value that is obtained from the NEARINDREF column of the SYSIBM.SYSTABLEPART catalog table. The accuracy of the data set size calculation depends on recent information in the SYSTABLEPART catalog table.

FARINDREF

Is the value that is obtained from the FARINDREF column of the SYSIBM.SYSTABLEPART catalog table.
If you have variable-length fields, the calculation in step 2 might result in excessive space. Use the average uncompressed row length, multiplied by the number of rows.
If you use REORG with UNLOAD PAUSE or CONTINUE with the DISCARD option, and the table has variable length fields, use the maximum row length in the calculation. The DISCARD option without the NOPAD option pads the variable length fields.

For certain table spaces in the catalog and directory, the unload data set for the table spaces have a different format. The calculation for the size of this data set is as follows:

data set size in bytes = (28 + longrow) * numrows

In the preceding formula:

longrow: Is the length of the longest row in the table space.
numrows: Is the number of rows in the data set.

The length of the row is calculated as follows:

Sum of column lengths + 4 bytes for each link

The length of the column is calculated as follows:

Maximum length of the column + 1 (if nullable) + 2 (if varying length)

Calculating the size of the sort work data sets

Start of change Using two or three large SORTWKnn data sets is preferable to using several small ones. If adequate space is not available, you cannot run REORG. End of change

Sort work data sets cannot span volumes. Smaller volumes require more sort work data sets to sort the same amount of data; therefore, large volume sizes can reduce the number of sort work data sets that are needed.

Start of change When you allocate sort work data sets on disk, allocate at least 1.2 times the amount of space that is used by the data that is to be sorted. End of change

Start of change Allocating twice the space that is used by the unload data sets is usually adequate for the sort work data sets. For compressed data, if the data needs to be uncompressed for processing, allocate additional sort space. Use the compression ratio of the existing data to calculate the additional sort space that is required. Some examples of when the compressed data needs to be uncompressed include, but are not limited to: End of change

UNLOAD PAUSE without KEEPDICTIONARY
UNLOAD CONTINUE without KEEPDICTIONARY
Discard processing
After schema changes, such as ALTER ADD COLUMN
When AUX YES processing is in effect

Start of change Tape devices are not supported for sort work data sets. End of change

Specifying a destination for sort program messages

The REORG utility job step must contain a UTPRINT DD statement that defines a destination for messages that are issued by the sort program during the SORT phase of REORG. DB2I, the %DSNU CLIST command, and the DSNUPROC procedure use the following default DD statement:

//UTPRINT DD  SYSOUT=A

Calculating the size of the statistics sort work data sets:

To calculate the approximate size (in bytes) of the ST01WKnn data set, use the following formula:

2 ×(maximum record length × numcols × (count + 2) × number of indexes)

The variables in the preceding formula have the following values:

maximum record length: Maximum record length of the SYSCOLDISTSTATS record that is processed when collecting frequency statistics (You can obtain this value from the RECLENGTH column in SYSTABLES.)
numcols: Number of key columns to concatenate when you collect frequent values from the specified index.
count: Number of frequent values that DB2® is to collect.