DB2 Version 9.7 for Linux, UNIX, and Windows

Options for improving load performance

There are various command parameters that you can use to optimize load performance. There are also a number of file type modifiers unique to load which can, in some cases, significantly improve that utility's performance.

Command parameters

The load utility attempts to deliver the best performance possible by determining optimal values for DISK_PARALLELISM, CPU_PARALLELISM, and DATA BUFFER, if these parameters have not be specified by the user. Optimization is done based on the size and the free space available in the utility heap. Consider using the autonomic DISK_PARALLELISM and CPU_PARALLELISM settings before attempting to tune these parameters for your particular needs.

Following is information about the performance implications of various options available through the load utility:

ALLOW READ ACCESS
This option allows you to query a table while a load operation is in progress. You can only view data that existed in the table prior to the load operation. If the INDEXING MODE INCREMENTAL option is also specified, and the load operation fails, the subsequent load terminate operation might have to correct inconsistencies in the index. This requires an index scan which involves considerable I/O. If the ALLOW READ ACCESS option is also specified for the load terminate operation, the buffer pool is used for I/O.
COPY YES or NO
Use this parameter to specify whether a copy of the input data is to be made during a load operation. COPY YES, which is only applicable when forward recovery is enabled, reduces load performance because all of the loading data is copied during the load operation. The increased I/O activity might increase the load time on an I/O-bound system. Specifying multiple devices or directories (on different disks) can offset some of the performance penalty resulting from this operation. COPY NO, which is only applicable when forward recovery is enabled, does not affect load performance. However, all table spaces related to the loaded table will be placed in a Backup Pending state, and those table spaces must be backed up before the table can be accessed.
CPU_PARALLELISM
Use this parameter to exploit the number of processes running per database partition (if this is part of your machine's capability), and significantly improve load performance. The parameter specifies the number of processes or threads used by the load utility to parse, convert, and format data records. The maximum number allowed is 30. If there is insufficient memory to support the specified value, the utility adjusts the value. If this parameter is not specified, the load utility selects a default value that is based on the number of CPUs on the system.
Record order in the source data is preserved (see Figure 1) regardless of the value of this parameter, provided that:
  • the anyorder file type modifier is not specified
  • the PARTITIONING_DBPARTNUMS option (and more than one partition is to be used for partitioning) is not specified

If tables include either LOB or LONG VARCHAR data, CPU_PARALLELISM is set to 1. Parallelism is not supported in this case.

Although use of this parameter is not restricted to symmetric multiprocessor (SMP) hardware, you might not obtain any discernible performance benefit from using it in non-SMP environments.

Figure 1. Record Order in the Source Data is Preserved When the Number of Processes Running Per Database Partition is Exploited During a Load Operation
Record order in source data is preserved when the number of processes running per database partition is exploited.
DATA BUFFER
The DATA BUFFER parameter specifies the total amount of memory, in 4 KB units, allocated to the load utility as a buffer. It is recommended that this buffer be several extents in size. The data buffer is allocated from the utility heap; however, the data buffer can exceed the setting for the util_heap_sz database configuration parameter as long as there is available memory in the system.
DISK_PARALLELISM
The DISK_PARALLELISM parameter specifies the number of processes or threads used by the load utility to write data records to disk. Use this parameter to exploit available containers when loading data, and significantly improve load performance. The maximum number allowed is the greater of four times the CPU_PARALLELISM value (actually used by the load utility), or 50. By default, DISK_PARALLELISM is equal to the sum of the table space containers on all table spaces containing objects for the table being loaded, except where this value exceeds the maximum number allowed.
NONRECOVERABLE
If forward recovery is enabled, use this parameter if you do not need to be able to recover load transactions against a table upon rollforward. A NONRECOVERABLE load and a COPY NO load have identical performance. However, there is a significant difference in terms of potential data loss. A NONRECOVERABLE load marks a table as not rollforward recoverable while leaving the table fully accessible. This can create a problematic situation in which if you need to rollforward through the load operation, then the loaded data as well as all subsequent updates to the table will be lost. A COPY NO load places all dependent table spaces in the Backup Pending state which renders the table inaccessible until a backup is performed. Because you are forced to take a backup after that type of load, you will not risk losing the loaded data or subsequent updates to the table. That is to say, a COPY NO load is totally recoverable.
Note: When these load transactions are encountered during subsequent restore and rollforward recovery operations, the table is not updated, and is marked invalid. Further actions against this table are ignored. After the rollforward operation is complete, the table can only be dropped.
SAVECOUNT
Use this parameter to set an interval for the establishment of consistency points during the load phase of a load operation. The synchronization of activities performed to establish a consistency point takes time. If done too frequently, there is a noticeable reduction in load performance. If a very large number of rows is to be loaded, it is recommended that a large SAVECOUNT value be specified (for example, a value of 10 million in the case of a load operation involving 100 million records).

A load restart operation automatically continues from the last consistency point, provided that the load restart operation resumes from the load phase.

STATISTICS USE PROFILE
Collect statistics specified in table statistics profile. Use this parameter to collect data distribution and index statistics more efficiently than through invocation of the RUNSTATS utility following completion of the load operation, even though performance of the load operation itself decreases (particularly when DETAILED INDEXES ALL is specified).

For optimal performance, applications require the best data distribution and index statistics possible. Once the statistics are updated, applications can use new access paths to the table data based on the latest statistics. New access paths to a table can be created by rebinding the application packages using the BIND command. The table statistics profile is created by running the RUNSTATS command with the SET PROFILE options.

When loading data into large tables, it is recommended that a larger value for the stat_heap_sz (statistics heap size) database configuration parameter be specified.

USE <tablespace-name>
When an ALLOW READ ACCESS load is taking place and the indexing mode is REBUILD, this parameter allows an index to be rebuilt in a system temporary table space and copied back to the index table space during the index copy phase of a load operation.

By default, the fully rebuilt index (also known as the shadow index) is built in the same table space as the original index. This might cause resource problems as both the original and the shadow index reside in the same table space simultaneously. If the shadow index is built in the same table space as the original index, the original index is instantaneously replaced by the shadow. However, if the shadow index is built in a system temporary table space, the load operation requires an index copy phase which copies the index from a system temporary table space to the index table space. There is considerable I/O involved in the copy. If either of the table spaces is a DMS table space, the I/O on the system temporary table space might not be sequential. The values specified by the DISK_PARALLELISM option are respected during the index copy phase.

WARNINGCOUNT
Use this parameter to specify the number of warnings that can be returned by the utility before a load operation is forced to terminate. Set the WARNINGCOUNT parameter to a relatively low number if you are expecting only a few or no warnings. The load operation stops after the WARNINGCOUNT number is reached. This gives you the opportunity to correct problems before attempting to complete the load operation.

File type modifiers

ANYORDER
By default, the load utility preserves record order of source data. When load is operating under an SMP environment, synchronization between parallel processing is required to ensure that order is preserved.

In an SMP environment, specifying the anyorder file type modifier instructs the load utility to not preserve the order, which improves efficiency by avoiding the synchronization necessary to preserve that order. However, if the data to be loaded is presorted, anyorder might corrupt the presorted order, and the benefits of presorting are lost for subsequent queries.
Note: The anyorder file type modifier has no effect if CPU_PARALLELISM is 1, and it is not compatible with the SAVECOUNT option.

BINARYNUMERICS, ZONEDDECIMAL and PACKEDDECIMAL
For fixed length non-delimited ASCII (ASC) source data, representing numeric data in binary can result in improved performance when loading. If the packeddecimal file type modifier is specified, decimal data is interpreted by the load utility to be in packed decimal format (two digits per byte). If the zoneddecimal file type modifier is specified, decimal data is interpreted by the load utility to be in zoned decimal format (one digit per byte). For all other numeric types, if the binarynumerics file type modifier is specified, data is interpreted by the load utility to be in binary format.

Note:
  • When the binarynumerics, packeddecimal, or zoneddecimal file type modifiers are specified, numeric data is interpreted in big-endian (high byte first) format, regardless of platform.
  • The packeddecimal and zoneddecimal file type modifiers are mutually exclusive.
  • The packeddecimal and zoneddecimal file type modifiers only apply to the decimal target columns, and the binary data must match the target column definitions.
  • The reclen file type modifier must be specified when the binarynumerics, packeddecimal, or zoneddecimal file type modifiers are specified.

FASTPARSE
Use with caution. In situations where the data being loaded is known to be valid, it can be unnecessary to have load perform the same amount of syntax checking as with more suspect data. In fact, decreasing the scope of this step can improve load's performance by about 10 or 20 percent. This can be done by using the fastparse file type modifier, which reduces the data checking that is performed on user-supplied column values from ASC and DEL files.

NOROWWARNINGS
During a load operation, warning messages about rejected rows are written to a specified file. However, if the load utility has to process a large volume of rejected, invalid or truncated records, it can adversely affect load's performance. In cases where many warnings are anticipated, it is useful to use the norowwarnings file type modifier to suppress the recording of these warnings.

PAGEFREESPACE, INDEXFREESPACE, and TOTALFREESPACE
As data is inserted and updated in tables over time, the need for table and index reorganization grows. One solution is to increase the amount of free space for tables and indexes using pagefreespace, indexfreespace, and totalfreespace. The first two modifiers, which take precedence over the PCTFREE value, specify the percentage of data and index pages that is to be left as free space, while totalfreespace specifies the percentage of the total number of pages that is to be appended to the table as free space.