Improving LOAD performance

Depending on the data, target object, and available resources, you can take certain actions that might improve the performance of the LOAD utility. For example, you can preprocess the input data or specify a particular LOAD option.

About this task

Start of changeBe aware that the performance of LOAD on a table that is organized by hash is likely to be slower. The reason is that the rows are loaded according to the hash key rather than sequentially on the pages.End of change

Procedure

To improve LOAD performance, complete one or more of the following recommended actions as appropriate:

  • Load numeric data in internal format. DB2 internal format is the format that is produced by running UNLOAD with the FORMATINTERNAL option.
  • If you specify LOAD REPLACE, specify LOG NO with COPYDDN or RECOVERYDDN to create an inline copy.
  • If you are using 3990 caching and have the secondary indexes on RAMAC, consider specifying YES for the SEQPRES subsystem parameter. This option allows DB2® to use sequential prestaging when it reads data from RAMAC for the following utilities:
    • LOAD PART integer RESUME
    • REORG TABLESPACE PART
    For these utilities, prefetch reads remain in the cache longer, which can possibly improve the performance of subsequent writes.
  • Preprocess input data: Present data to LOAD in the optimal order, as follows:
    • Sort the data in cluster order to avoid needing to reorganize it after loading.
    • If you are loading a single table that has, at most, one foreign key or one index key, sort the data in key sequence. (An index over a foreign key is allowed.) If the key is an index key, sort the data in either ascending or descending order, depending on how the index was defined. If the key is a foreign key, sort the data in ascending order. Null key values are treated as "high" values.
    • If you are loading more than one table, choose one of the following methods:
      • Load each table separately. If you use this method, you can follow the rules that are listed in the preceding bullet for loading single tables.
      • Use the WHEN clause under each INTO TABLE option on your LOAD statement to group your input data by table.
      Within each table, sort the data in key sequence.

    If you do presort the data, specify the PRESORTED YES option in the LOAD statement.

    Do any other preprocessing of the input data, as described in Before running LOAD.

  • Avoid conversions:
    • If possible, avoid data conversion, such as from integer to decimal or from decimal to floating-point.
    • Avoid CCSID and encoding scheme conversions, if possible, by loading data that has the same CCSID as the target table. If you specify a CCSID or encoding scheme option that does not match that of the table that is being loaded, CCSID conversions can occur.
  • Avoid LOB and XML materialization: To eliminate the need to load large LOBs or XML documents into virtual storage while LOAD is running, use file reference variables for the following data:
    • For XML data that is greater than 32 KB in size.
    • For large LOB data in a row with only 1 LOB. (Large LOBs are usually considered to be 2 MB or greater in size.)

    In these cases, LOAD does not materialize LOB or XML data into memory.

  • Use parallelism: If the table space is partitioned and one or more nonpartitioned secondary indexes exist, use partition parallelism to load all partitions in a single job. Partition parallelism can reduce the elapsed time for loading large amounts of data. To invoke partition parallelism, specify the INTO TABLE PART clause with an INDDN specification for each partition. This specification requires that you have a separate input data set for each partition.

    If the table space is partitioned and the only indexes are the partitioned indexes, you cannot enable partition parallelism. In this case, if possible, use multiple jobs to run LOAD concurrently against separate partitions. This method also requires that you have a separate input data set for each partition.

    If you enable parallelism, allocate extra resources as needed and tune your system to avoid significant performance degradation. Specifically, to benefit from parallel operations when you use LOAD SHRLEVEL CHANGE or parallel inserts, especially when secondary indexes are used, take the following actions:

    • Use a larger buffer pool to improve the buffer pool hit ratio.
    • Define a higher deferred-write threshold to reduce the number of pages that are written to disk, which reduces the I/O time and contention.
    • Define a larger checkpoint interval to reduce the number of pages that are written to disk, which reduces the I/O time and contention.
    • Use ESS Parallel Access Volume (PAV) to support multiple concurrent I/Os to the same volume that contains secondary index data sets.
    • Use secondary index pieces to support multiple concurrent secondary index I/Os.
  • Use SORTKEYS: The SORTKEYS option improves performance of the index key sort. With SORTKEYS, index keys are passed in memory rather than written to work files. Avoiding this I/O to the work files improves LOAD performance. LOAD with SORTKEYS also reduces disk space requirements for the SYSUT1 and SORTOUT data sets, especially if you provide an estimate of the number of keys to sort. The SORTKEYS option reduces the elapsed time from the start of the RELOAD phase to the end of the BUILD phase.

    The SORTKEYS option is the default if one of the following conditions is true: SHRLEVEL is not NONE or SHRLEVEL is NONE, and the target table has one or more indexes.

    With SORTKEYS, you can specify an estimate of the number of keys for the job to sort. If the estimate is omitted or specified as 0, LOAD writes the extracted keys to the work data set, which reduces the performance improvement of using SORTKEYS.

    To estimate the number of keys to sort:

    1. Count 1 for each index.
    2. Count 1 for each foreign key where the foreign key and index definitions are not identical.
    3. For each foreign key where the foreign key and index definitions are identical:
      1. Count 0 for the first relationship in which the foreign key participates.
      2. Count 1 for subsequent relationships in which the foreign key participates (if any).
    4. Multiply the count by the number of rows to be loaded.

    If more than one table is being loaded, repeat the preceding steps for each table, and sum the results.

  • Build the indexes in parallel: You can reduce the elapsed time of a LOAD job for a table space or partition with more than one defined index by having LOAD build the indexes in parallel. See Building indexes in parallel for LOAD.
  • Specify PREFORMAT: PREFORMAT eliminates the need for DB2 to preformat new pages in a table space during execution time.

    DB2 preformatting sometimes causes a delay, which can affect the performance or execution-time consistency of applications that do many inserts or LOAD jobs with SHRLEVEL CHANGE. When these delays occur and you can predict the table size for a business processing cycle, consider using LOAD PREFORMAT or REORG PREFORMAT. This technique is valuable only when DB2 preformatting causes a measurable delay with processing or causes inconsistent application elapsed times for INSERT operations or LOAD RESUME YES SHRLEVEL CHANGE jobs.

    Recommendation: Assess performance before and after you use LOAD PREFORMAT or REORG PREFORMAT to quantify its value in your environment.

    Using PREFORMAT might eliminate execution-time delays but adds setup time before the application's execution. The cost of this improvement is an increase in the LOAD or REORG time, because the utility must preformat all pages between the data that is loaded or reorganized and the high-allocated RBA. The additional LOAD or REORG time that is required depends on the amount of disk space that is being preformatted. When this preformatted space is used and DB2 needs to extend the table space, normal data set extending and preformatting occurs.

    Consider using preformatting for LOAD SHRLEVEL CHANGE or INSERT processing in the following situations:

    • For tables on which many inserts are done and that receive a predictable amount of data. In this case, all of the required space can be preallocated before the application's execution.
    • For a table that acts as a repository for work items that come into a system and that are later used for a backend task that processes the work items.
    • For table spaces that start out empty and are populated by many inserts before any query access is run against the table space.

    LOAD PREFORMAT or REORG PREFORMAT is not recommended for tables that have a high ratio of reads to inserts if the reads result in table space scans. In this case, preformatting of a table space that contains a table that is used for query processing can cause table space scans to read extra empty pages. This extra reading can extend the elapsed time for these queries.

    Mixing inserts and nonindexed queries against a preformatted table space might have a negative impact on the query performance without providing a compensating improvement in the insert performance. Typically, PREFORMAT yields the best results where a high ratio of inserts to read operations exists.

    Also, consider the following implications of PREFORMAT on your data sets:

    • For user-managed data sets, DB2 does not delete and reallocate them during utility processing. The size of the data set does not shrink back to the original data set allocation size but either remains the same or increases in size if more space or data is added. This characteristic has implications when LOAD or REORG PREFORMAT is used because of the preformatting that is done for all free pages between the high-used RBA (or page) to the high-allocated RBA. This preformatting includes secondary extents that have been allocated.
    • For DB2-managed data sets, DB2 deletes and reallocates them if you specify REPLACE on the LOAD or REORG job. This behavior results in the data sets being resized to their original allocation size. They remain that size if the data that is being reloaded does not fill the primary allocation and force a secondary allocation. Therefore, LOAD PREFORMAT or REORG PREFORMAT with DB2-managed data causes at least the full primary allocation amount of a data set to be preformatted after the data is reloaded into the table space.
    • For both user-managed and DB2-managed data sets, if the data set goes into secondary extents during utility processing, the high-allocated RBA becomes the end of the secondary extent. That value becomes the high value for preformatting.