DB2 10.5 for Linux, UNIX, and Windows

Determining when to reorganize tables and indexes

After many changes to table data, logically sequential data might be on nonsequential physical data pages, especially if many update operations created overflow records. When the data is organized in this way, the database manager must perform additional read operations to access required data. Additional read operations are also required if many rows are deleted.

About this task

Table reorganization defragments data, eliminating wasted space. It also reorders the rows to incorporate overflow records, improving data access and, ultimately, query performance. You can specify that the data can be reordered according to a particular index, so that queries can access the data with a minimal number of read operations.

Many changes to table data can cause index performance to degrade. Index leaf pages can become fragmented or badly clustered, and the index could develop more levels than necessary for optimal performance. All of these issues cause more I/Os and can degrade performance.

Any one of the following factors indicate that you might reorganize a table or index:

A high volume of insert, update, and delete activity against a table since the table was last reorganized
Significant changes in the performance of queries that use an index with a high cluster ratio
Executing the RUNSTATS command to refresh statistical information does not improve performance
Output from the REORGCHK command suggests that performance can be improved by reorganizing a table or its indexes
In some cases, the reorgchk utility might recommend table reorganization, even after a table reorg operation is performed. You should analyze reorgchk utility recommendations and assess the potential benefits against the costs of performing a reorganization.
If reclaiming space is your primary concern, the REORG command with the CLEANUP and RECLAIM EXTENTS options can be used.
The RECLAIMABLE_SPACE output of the ADMIN_GET_INDEX_INFO and ADMIN_GET_TAB_INFO functions show how much space, in kilobytes, is available for reclaim. If you issue the REORG command with the CLEANUP option before running the ADMIN_GET_INDEX_INFO and ADMIN_GET_TAB_INFO functions, the output of the functions shows the maximum space available for reclamation. Use this information to determine when a REORG with RECLAIM EXTENTS would help reduce the size of your tables and indexes.

The REORGCHK command returns statistical information about data organization and can advise you about whether particular tables or indexes need to be reorganized. When space reclaim is your only concern, the RECLAIMABLE_SPACE output of the ADMIN_GET_INDEX_INFO and ADMIN_GET_TAB_INFO functions outline how much space is available for reclaim. However, running specific queries against the SYSSTAT views at regular intervals or at specific times can build a history that helps you identify trends that have potentially significant performance implications.

To determine whether there is a need to reorganize your tables or indexes, query the SYSSTAT views and monitor the following statistics:

Overflow of rows

The overflow value represents the number of rows that do not fit on their original pages. To monitor the overflow value, query the OVERFLOW column in the SYSSTAT.TABLES view. Row data can overflow when variable length columns cause the record length to expand to the point that a row no longer fits into its assigned location on the data page. Length changes can also occur if you add a column to the table. In this case, a pointer is kept in the original location in the row, and the value is stored in another location that is indicated by the pointer. This can impact performance because the database manager must follow the pointer to find the contents of the column. This two-step process increases the processing time and might also increase the number of I/Os that are required. Reorganizing the table data eliminates any row overflows.

An overflow can also occur if a row is compressed, then updated. This overflow occurs when the updated row can no longer be compressed or if the updated compressed row is longer.

Fetch statistics

To determine the effectiveness of the prefetchers when the table is accessed in index order, query the following columns in the SYSSTAT.INDEXES catalog view. The statistics in these columns characterize the average performance of the prefetchers against the underlying table.

The AVERAGE_SEQUENCE_FETCH_PAGES column stores the average number of pages that can be accessed in sequence. Pages that can be accessed in sequence are eligible for prefetching. A small number indicates that the prefetchers are not as effective as they could be, because they cannot read in the full number of pages that is specified by the PREFETCHSIZE value for the table space. A large number indicates that the prefetchers are performing effectively. For a clustered index and table, the number should approach the value of NPAGES, the number of pages that contain rows.
The AVERAGE_RANDOM_FETCH_PAGES column stores the average number of random table pages that are fetched between sequential page accesses when table rows are fetched by using an index. The prefetchers ignore small numbers of random pages when most pages are in sequence and continue to prefetch to the configured prefetch size. As the table becomes more disorganized, the number of random fetch pages increases. Disorganization is usually caused by insertions that occur out of sequence, either at the end of the table or in overflow pages, and query performance is impacted when an index is used to access a range of values.
The AVERAGE_SEQUENCE_FETCH_GAP column stores the average gap between table page sequences when table rows are fetched by using an index. Detected through a scan of index leaf pages, each gap represents the average number of table pages that must be randomly fetched between sequences of table pages. This occurs when many pages are accessed randomly, which interrupts the prefetchers. A large number indicates that the table is disorganized or poorly clustered to the index.

Number of index leaf pages that contain record identifiers (RIDs) that are marked deleted but not yet removed

RIDs are not usually physically deleted when they are marked deleted. This means that useful space might be occupied by these logically deleted RIDs. To retrieve the number of leaf pages on which every RID is marked deleted, query the NUM_EMPTY_LEAFS column of the SYSSTAT.INDEXES view. For leaf pages on which not all RIDs are marked deleted, the total number of logically deleted RIDs is stored in the NUMRIDS_DELETED column.

Use this information to estimate how much space you might reclaim by issuing the REORG INDEXES command with the CLEANUP ALL option. To reclaim only the space in pages on which all RIDs are marked deleted, issue the REORG INDEXES command with the CLEANUP PAGES option.

Cluster-ratio and cluster-factor statistics for indexes

In general, only one of the indexes for a table can have a high degree of clustering. A cluster-ratio statistic is stored in the CLUSTERRATIO column of the SYSCAT.INDEXES catalog view. This value, which is 0 - 100, represents the degree of data clustering in the index. If you collect detailed index statistics, a finer cluster-factor statistic of 0 - 1 is stored in the CLUSTERFACTOR column instead, and the value of the CLUSTERRATIO column is -1. Only one of these two clustering statistics can be recorded in the SYSCAT.INDEXES catalog view. To compare CLUSTERFACTOR values with CLUSTERRATIO values, multiply the CLUSTERFACTOR value by 100 to obtain a percentage value.

Index scans that are not index-only access might perform better with a higher density of indexes. A low density leads to more I/O for this type of scan because a data page is less likely to remain in the buffer pool until it is accessed again. Increasing the buffer size might improve the performance of low density indexes. If smart index prefetching is enabled, it can also improve the performance of low density indexes, which reduces the need to issue the REORG command on indexes. Smart index prefetching achieves this by switching from sequential detection prefetching to read-ahead prefetching whenever low density indexes exist.

If table data was initially clustered on a certain index and the clustering statistics indicate that the data is now poorly clustered on that index, you might want to reorganize the table to recluster the data. Also, if smart data prefetching is enabled, it can improve the performance of poorly clustered data, which reduces the need to issue the REORG command on tables. Smart data prefetching achieves this by switching from sequential detection prefetching to read-ahead prefetching whenever badly clustered data pages exist.

Number of leaf pages

To determine the number of leaf pages that are occupied by an index, query the NLEAF column in the SYSCAT.INDEXES view. This number tells you how many index page I/Os are needed for a complete scan of the index.

Ideally, an index should occupy as little space as possible to reduce the number of I/Os that are required for an index scan. Random update activity can cause page splits that increase the size of an index. During a table REORG operation, each index can be rebuilt with the least amount of space.

By default, 10% of free space is left on each index page when an index is built. To increase the free space amount, specify the PCTFREE option when you create the index. The specified PCTFREE value is used whenever you reorganize the index. A free space value that is greater than 10% might reduce the frequency of index reorganization, because the extra space can accommodate additional index insertions.

Number of empty data pages

To calculate the number of empty pages in a table, query the FPAGES and NPAGES columns in the SYSCAT.TABLES view and then subtract the NPAGES value (the number of pages that contain rows) from the FPAGES value (the total number of pages that are in use). Empty pages can occur when you delete entire ranges of rows.

As the number of empty pages increases, so does the need for table reorganization. Reorganizing a table reclaims empty pages and reduces the amount of space that a table uses. In addition, because empty pages are read into the buffer pool during a table scan, reclaiming unused pages can improve scan performance.

Table reorganization is not recommendedif the following equation evaluates to true: the total number of in-use pages (specified in the FPAGES column) in a table <= (NPARTITIONS x 1 extent size). NPARTITIONS represents the number of data partitions if the table is a partitioned table; otherwise, its value is 1.

In a partitioned database environment, table reorganization is not recommended if the following equation evaluates to true: value of the FPAGES column <= (the number of database partitions in a database partition group of the table) x (NPARTITIONS x 1 extent size).

Before reorganizing tables or indexes, consider the trade-off between the cost of increasingly degraded query performance and the cost of table or index reorganization, which includes processing time, elapsed time, and reduced concurrency.