You can use the LOAD utility to load one or more partitions
of a partitioned table space. To improve performance when loading
more than one partition, consider enabling partition parallelism.
About this task
Partition parallelism can reduce the elapsed time that
is required for loading large amounts of data into partitioned table
spaces.
If you are loading a partitioned table space that is
created with DEFINE NO, the load operation might take longer. If a
partitioned table space is created with DEFINE NO, all partitions
are also implicitly defined with DEFINE NO. The first data row that
is inserted by the LOAD utility defines all data sets in the partitioned
table space. If this process takes a long time, expect timeouts on
the database descriptor (DBD).
Restriction: You
cannot load data at the partition level of a partition-by-growth table
space.
Procedure
To load partitions:
- If you want to load only certain partitions of a partitioned
table, use the PART clause of the INTO TABLE option. If you omit the
PART clause, the entire table is loaded.
Restriction: The following restrictions exist for identity columns:
- When index-based partitioning is used, LOAD INTO TABLE PART
integer is not allowed if an identity column is part
of the partitioning index.
- When table-based partitioning is used, LOAD INTO TABLE PART
integer is not allowed if an identity column is used
in a partitioning clause of the CREATE TABLE or ALTER TABLE statement.
To override these restrictions, specify the
IDENTITYOVERRIDE option in the LOAD statement.
- If you want partitions to be processed in parallel, take
one of the following actions:
- If one or more
nonpartitioned secondary indexes exists on the partitioned table space, and you have a separate
input data set for each partition, use load partition parallelism. Partition parallelism loads all
partitions in a single job. To invoke partition parallelism, for each partition that you want to
load, specify the INTO TABLE PART clause with one of the following keywords:
- INDDN
- INCURSOR
- DISCARDDN if DISCARDS n is specified
If the table space is created with DEFINE NO, specifying SHRLEVEL CHANGE on your LOAD
statement and enabling partition parallelism is equivalent to concurrent, independent insert jobs.
For example, in a large partitioned table space that is created with DEFINE NO, the LOAD utility
starts three tasks. The first task tries to insert the first row, which causes an update to the DBD.
The other two tasks time out while they wait to access the DBD. The first task holds the lock on the
DBD while the data sets are defined for the table space.
- If the only indexes are the partitioned indexes, use multiple
jobs to run LOAD concurrently against separate partitions. This method
also requires that you have a separate input data set for each partition.
- If you use the INTO TABLE PART clause, take the following
actions as appropriate:
- If you specify the REPLACE or RESUME options, specify them separately
by partition. If you specify these options before the INTO TABLE PART
clause, LOAD serializes the load operation for the entire table space
and does not process the partitions concurrently.
- To load columns in an order that is different than the order of
the columns in the CREATE TABLE statement, code field specifications
for each INTO TABLE PART clause.
- Make sure that
you specify the LOAD options in the appropriate place in the utility statement. Some
LOAD options, such as INDDN, DISCARDDN, RESUME, and REPLACE, can have a different
scope depending on their placement in the LOAD statement. For example, if you specify
INDDN before INTO TABLE, the specified input data set is used to load the entire
table. However, if you specify INDDN after INTO TABLE, in a PART clause, the specified
input data set is used to load only the specified partition.
- If you want DB2® to save copies of records that are not loaded, use the DISCARDDN option to specify a discard data set as follows:
- If you want a single discard data set for all partitions, specify DISCARDDN before
INTO TABLE. If you use a template for this data set, and the TEMPLATE utility
statement contains the variable $PA. or $PART., that variable is substituted with 0 or
the partition number in the first INTO TABLE PART clause.
- If you want one discard data set for a partition, specify DISCARDDN in an INTO TABLE
PART clause. If you use a template for this data set, and the TEMPLATE utility
statement contains the variable $PA. or $PART., that variable is substituted with the
partition number. If you specify DISCARDDN in more than one INTO TABLE PART clause,
you must ensure that the data set names are unique. One way to ensure unique names is
to use templates and the variable $PA. or $PART. in the TEMPLATE statement.
Examples
- Example of loading certain records into certain partitions
- The control statement in the following example specifies that DB2 is to load data into the first
and second partitions of the employee table. Records with '0' in column
1 replace the contents of partition 1; records with '1' in column
1 are added to partition 2; all other records are ignored. This example
control statement, which is simplified to illustrate the point, does
not list field specifications for all columns of the table.
LOAD DATA CONTINUEIF(72:72)='X'
INTO TABLE DSN8A10.EMP PART 1 REPLACE WHEN (1) = '0'
( EMPNO POSITION (1:6) CHAR(6),
FIRSTNME POSITION (7:18) CHAR(12),
⋮
)
INTO TABLE DSN8A10.EMP PART 2 RESUME YES WHEN (1) = '1'
( EMPNO POSITION (1:6) CHAR(6),
FIRSTNME POSITION (7:18) CHAR(12),
⋮
)
- Example of loading partitions from separate input data sets
- The
following example LOAD statements specify that partitions 1 and 2
of the EMP table are to be loaded from the EMPLDS1 and EMPLDS2 data
sets. This example assumes that your data is in separate input data
sets and already sorted by partition. Therefore, you do not need to
use the WHEN clause of INTO TABLE. Placing the RESUME YES option before
the PART option inhibits concurrent partition processing. If you
want LOAD to process other partitions concurrently, specify the RESUME
option after the PART option.
LOAD DATA INDDN EMPLDS1 CONTINUEIF(72:72)='X'
RESUME YES
INTO TABLE DSN8A10.EMP REPLACE PART 1
LOAD DATA INDDN EMPLDS2 CONTINUEIF(72:72)='X'
RESUME YES
INTO TABLE DSN8A10.EMP REPLACE PART 2
- Example of loading partitions independently
- In
the following example, partition 1 and partition 2 are loaded concurrently.
LOAD DATA INDDN SYSREC LOG NO
INTO TABLE DSN8A10.EMP PART 1 REPLACE
LOAD DATA INDDN SYSREC2 LOG NO
INTO TABLE DSN8A10.EMP PART 2 REPLACE
- Example of specifying separate discard data sets for each partition
- If you specify multiple discard data sets, you must ensure that the data set names are
unique. In the following example, the LOAD statement specifies that two discard data
sets are to be allocated, one for each partition. Those discard data sets both use the
DT template. The TEMPLATE statement for DT includes the &PA. variable (partition
number) to ensure that the data set name is different for each partition.
TEMPLATE DT UNIT(SYSDA)
DSN(JUOSU339.&TS..P&PA..DISCARD)
SPACE(50,10) TRK
LOAD DATA
INTO TABLE DSN8B10.EMP PART 1 INDDN D1 DISCARDDN(DT) REPLACE NO RESUME
INTO TABLE DSN8B10.EMP PART 2 INDDN D2 DISCARDDN(DT) REPLACE NO RESUME