DB2 Version 10.1 for Linux, UNIX, and Windows

Index creation during load operations

Indexes are built during the build phase of a load operation. There are four indexing modes that can be specified in the LOAD command.

  1. REBUILD. All indexes are rebuilt.
  2. INCREMENTAL. Indexes are extended with new data.
  3. AUTOSELECT. The load utility automatically decides between REBUILD or INCREMENTAL mode. AUTOSELECT is the default. If a LOAD REPLACE operation is taking place, the REBUILD indexing mode is used. Otherwise, the indexing mode chosen is based on the ratio of the amount of existing data in the table to the amount of newly loaded data. If the ratio is sufficiently large, the INCREMENTAL indexing mode is chosen. Otherwise, the REBUILD indexing mode is chosen.
  4. DEFERRED. The load utility does not attempt index creation if this mode is specified. Indexes are marked as needing a refresh, and a rebuild might be forced the first time they are accessed. The DEFERRED option is not allowed in any of the following situations:
    • If the ALLOW READ ACCESS option is specified (it does not maintain the indexes and index scanners require a valid index)
    • If any unique indexes are defined against the table
    • If XML data is being loaded (the XML Paths index is unique and is created by default whenever an XML column is added to a table)

Load operations that specify the ALLOW READ ACCESS option require special consideration in terms of space usage and logging depending on the type of indexing mode chosen. When the ALLOW READ ACCESS option is specified, the load utility keeps indexes available for queries even while they are being rebuilt.

When a load operation in ALLOW READ ACCESS mode specifies the INDEXING MODE INCREMENTAL option, the load utility writes some log records that protect the integrity of the index tree. The number of log records written is a fraction of the number of inserted keys and is a number considerably less than would be needed by a similar SQL insert operation. A load operation in ALLOW NO ACCESS mode with the INDEXING MODE INCREMENTAL option specified writes only a small log record beyond the normal space allocation logs.
Note: This is only true if you did not specify COPY YES and have the logindexbuild configuration parameter set to ON.

When a load operation in ALLOW READ ACCESS mode specifies the INDEXING MODE REBUILD option, new indexes are built as a shadow either in the same table space as the original index or in a system temporary table space. The original indexes remain intact and are available during the load operation and are only replaced by the new indexes at the end of the load operation while the table is exclusively locked. If the load operation fails and the transaction is rolled back, the original indexes remain intact.

By default, the shadow index is built in the same table space as the original index. Since both the original index and the new index are maintained simultaneously, there must be sufficient table space to hold both indexes at the same time. If the load operation is aborted, the extra space used to build the new index is released. If the load operation commits, the space used for the original index is released and the new index becomes the current index. When the new indexes are built in the same table space as the original indexes, replacing the original indexes takes place almost instantaneously.

If the indexes are built within an SMS table space, you can see index files in the table space directory with the .IN1 suffix and the .INX suffix. These suffixes do not indicate which is the original index and which is the shadow index. However, if the indexes are built in a DMS table space, you cannot see the new shadow index.

Improving index creation performance

Building new indexes in a system temporary table space

The new index can be built in a system temporary table space to avoid running out of space in the original table space. The USE tablespace-name option allows the indexes to be rebuilt in a system temporary table space when using INDEXING MODE REBUILD and ALLOW READ ACCESS options. The system temporary table can be an SMS or a DMS table space, but the page size of the system temporary table space must match the page size of the original index table space.

The USE tablespace-name option is ignored if the load operation is not in ALLOW READ ACCESS mode, or if the indexing mode is incompatible. The USE tablespace-name option is only supported for the INDEXING MODE REBUILD or INDEXING MODE AUTOSELECT options. If the INDEXING MODE AUTOSELECT option is specified and the load utility selects incremental maintenance of the indexes, the USE tablespace-name is ignored.

A load restart operation can use an alternate table space for building an index, even if the original load operation did not use an alternate table space. A load restart operation cannot be issued in ALLOW READ ACCESS mode if the original load operation was not issued in ALLOW READ ACCESS mode. Load terminate operations do not rebuild indexes, so the USE tablespace-name is ignored.

During the build phase of the load operation, the indexes are built in the system temporary table space. Then, during the index copy phase, the index is copied from the system temporary table space to the original index table space. To make sure that there is sufficient space in the original index table space for the new index, space is allocated in the original table space during the build phase. So, if the load operation runs out of index space, it will do so during the build phase. If this happens, the original index is not lost.

The index copy phase occurs after the build and delete phases. Before the index copy phase begins, the table is locked exclusively. That is, it is unavailable for read access throughout the index copy phase. Since the index copy phase is a physical copy, the table might be unavailable for a significant amount of time.
Note: If either the system temporary table space or the index table space are DMS table spaces, the read from the system temporary table space can cause random I/O on the system temporary table space and can cause a delay. The write to the index table space is still optimized and the DISK_PARALLELISM values are used.
Considerations for large indexes

In order to improve performance when building large indexes during a load, it can be useful to tune the sortheap database configuration parameter. sortheap allocates the amount of memory dedicated to the sorting of index keys during a load operation. For example, to direct the load utility to use 4000 pages of main memory per index for key sorting, set sortheap to 4000 pages, disconnect all applications from the database, and then issue the LOAD command.

If an index is so large that it cannot be sorted in memory, a sort spill, or an overflow, occurs. That is, the data is divided among several "sort runs" and stored in a temporary table space that is merged later. Use the sort_overflows monitor element to determine whether a sort spill occurred. If there is no way to avoid a sort spill by increasing the size of the sortheap parameter, ensure that the buffer pool for temporary table spaces be large enough to minimize the amount of disk I/O that spilling causes. Furthermore, to achieve I/O parallelism during the merging of sort runs, it is recommended that temporary table spaces be declared with multiple containers, each residing on a different disk device. If there is more than one index defined on a table, memory consumption increases proportionally because the load operation keeps all keys in memory.

Deferring index creation
Generally speaking, it is more efficient to allow indexes to be created during the load operation by specifying either REBUILD or INCREMENTAL mode than it is to have index creation deferred. As Figure 1 indicates, tables are normally built in three steps: data loading, index building, and statistics collection. This causes multiple data I/O during the load operation, index creation (there can be several indexes for each table), and statistics collection (which causes I/O on the table data and on all of the indexes). A much faster alternative is to let the load utility complete all of these tasks in one pass through the data. It should be noted, however, that unique indexes reduce load performance if duplicates are encountered.
Figure 1. Increasing load performance through concurrent indexing and statistics collection. Tables are normally built in three steps: data loading, index building, and statistics collection. This causes multiple data I/O during the load operation, during index creation (there can be several indexes for each table), and during statistics collection (which causes I/O on the table data and on all of the indexes). A much faster alternative is to let the load utility complete all of these tasks in one pass through the data.
It is faster to let the load utility load the data, build the indexes and collect statistics in one pass of the data.

At certain times, deferring index creation and invoking the CREATE INDEX statement can improve performance. Sorting during index rebuild uses up to sortheap pages. If more space is required, TEMP buffer pool is used and (eventually) spilled to disk. If load spills, and thus decreases performance, it might be advisable to run LOAD with INDEXING MODE DEFERRED and re-create the index later. CREATE INDEX creates one index at a time, reducing memory usage while scanning the table multiple times to collect keys.

Another advantage of building indexes with a CREATE INDEX statement instead of concurrently with the load operation is that the CREATE INDEX statement can use multiple processes, or threads, to sort keys. The actual building of the index is not executed in parallel.