The load utility is capable of efficiently moving large quantities
of data into newly created tables, or into tables that already contain
data. The utility can handle most data types, including XML, large
objects (LOBs), and user-defined types (UDTs). The load utility is
faster than the import utility, because it writes formatted pages
directly into the database, while the import utility performs SQL
INSERTs. The load utility does not fire triggers, and does not perform
referential or table constraints checking (other than validating the
uniqueness of the indexes).
The load process consists of four distinct phases (see
Figure 1):
- Load
During the load phase, data is loaded
into the table, and index keys and table statistics are collected,
if necessary. Save points, or points of consistency,
are established at intervals specified through the SAVECOUNT parameter
in the LOAD command. Messages are generated, indicating
how many input rows were successfully loaded at the time of the save
point.
- Build
During the build phase, indexes are
produced based on the index keys collected during the load phase.
The index keys are sorted during the load phase, and index statistics
are collected (if the STATISTICS USE PROFILE option
was specified, and profile indicates collecting index statistics).
The statistics are similar to those collected through the RUNSTATS command.
- Delete
During the delete phase, the rows that
caused a unique or primary key violation are removed from the table.
These deleted rows are stored in the load exception table, if one
was specified.
- Index copy
During the index copy phase, the
index data is copied from a system temporary table space to the original
table space. This will only occur if a system temporary table space
was specified for index creation during a load operation with the READ
ACCESS option specified.
Figure 1. The Four Phases of the Load Process: Load,
Build, Delete, and Index Copy
Note: After you invoke the load utility, you can use the LIST
UTILITIES command to monitor the progress of the load operation.
The following information is required when loading data:
- The path and the name of the input file, named pipe, or device.
- The name or alias of the target table.
- The format of the input source. This format can be DEL, ASC, PC/IXF,
or CURSOR.
- Whether the input data is to be appended to the table, or is to
replace the existing data in the table.
- A message file name, if the utility is invoked through the application
programming interface (API), db2Load.
Load modes
- INSERT
In this mode, load appends input data
to the table without making any changes to the existing data.
- REPLACE
In this mode, load deletes existing
data from the table and populates it with the input data.
- RESTART
In this mode, an interrupted load is
resumed. In most cases, the load is resumed from the phase it failed
in. If that phase was the load phase, the load is resumed from the
last successful consistency point.
- TERMINATE
In this mode, a failed load operation
is rolled back.
The options you can specify include:
- That the data to be loaded resides on the
client, if the load utility is invoked from a remotely connected client.
Note that XML and LOB data are always read from the server, even you
specify the CLIENT option.
- The method to use for loading the data: column location, column
name, or relative column position.
- How often the utility is to establish consistency points.
- The names of the table columns into which the data is to be inserted.
- Whether or not preexisting data in the table can be queried while
the load operation is in progress.
- Whether the load operation should wait for other utilities or
applications to finish using the table or force the other applications
off before proceeding.
- An alternate system temporary table space in which to build the
index.
- The paths and the names of the input files in which LOBs are stored.
Note: The
load utility does not honor the COMPACT lob option.
- A message file name. During load operations, you can specify that
message files be created to contain the error, warning, and informational
messages associated with those operations. Specify the name of these
files with the MESSAGES parameter.
Note: - You can only view the contents of a message file after the operation
is finished. If you wish to view load messages while a load operation
is running, you can use the LOAD QUERY command.
- Each message in a message file begins on a new line and contains
information provided by the DB2® message
retrieval facility.
- Whether column values being loaded have implied decimal points.
- Whether the utility should modify the amount of free space available
after a table is loaded.
- Whether statistics are to be gathered during the load process.
This option is only supported if the load operation is running in REPLACE mode.
Statistics are collected according to the profile defined for the
table. The profile must be created by the RUNSTATS command
before the LOAD command is executed. If the profile
does not exist and the load operation is instructed to collect statistics
according to the profile, the load will fail, and an error message
will be returned.
If data is appended to a table, statistics are
not collected. To collect current statistics on an appended table,
invoke the RUNSTATS utility following completion
of the load process. If gathering statistics on a table with a unique
index, and duplicate keys are deleted during the delete phase, statistics
are not updated to account for the deleted records. If you expect
to have a significant number of duplicate records, do not collect
statistics during the load operation. Instead, invoke the RUNSTATS utility
following completion of the load process.
- Whether
to keep a copy of the changes made. This is done to enable rollforward
recovery of the database. This option is not supported if rollforward
recovery is disabled for the database; that is, if the database configuration
parameters logarchmeth1 and logarchmeth2 are
set to OFF. If no copy is made, and rollforward recovery is enabled,
the table space is left in Backup Pending state at the completion
of the load operation.
Logging is required for fully recoverable
databases. The load utility almost completely eliminates the logging
associated with the loading of data. In place of logging, you have
the option of making a copy of the loaded portion of the table. If
you have a database environment that allows for database recovery
following a failure, you can do one of the following:
- Explicitly request that a copy of the loaded portion of the table
be made.
- Take a backup of the table spaces in which the table resides immediately
after the completion of the load operation.
If the database configuration parameter logindexbuild is
set, and if the load operation is invoked with the COPY YES recoverability
option and the INCREMENTAL indexing option, the load
logs all index modifications. The benefit of using these options is
that when you roll forward through the log records for this load,
you also recover the indexes (whereas normally the indexes are not
recovered unless the load uses the REBUILD indexing
mode).
If you are loading a table that already contains data,
and the database is non-recoverable, ensure that you have a backed-up
copy of the database, or the table spaces for the table being loaded,
before invoking the load utility, so that you can recover from errors.
If
you want to perform a sequence of multiple load operations on a recoverable
database, the sequence of operations will be faster if you specify
that each load operation is non-recoverable, and take a backup at
the end of the load sequence, than if you invoke each of the load
operations with the COPY YES option. You can use
the NONRECOVERABLE option to specify that a load
transaction is to be marked as non-recoverable, and that it will not
be possible to recover it by a subsequent rollforward operation. The
rollforward utility will skip the transaction, and will mark the table
into which data was being loaded as "invalid". The utility will also
ignore any subsequent transactions against that table. After the rollforward
operation is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are
not put in backup pending state following the load operation, and
a copy of the loaded data does not have to be made during the load
operation.
Figure 2. Non-recoverable Processing
During a Roll Forward Operation
- The fully qualified path to be used when creating temporary files
during a load operation. The name is specified by the TEMPFILES
PATH parameter of the LOAD command. The
default value is the database path. The path resides on the server
machine, and is accessed by the DB2 instance
exclusively. Therefore, any path name qualification given to this
parameter must reflect the directory structure of the server, not
the client, and the DB2 instance
owner must have read and write permission on the path.