Reducing the cost of collecting statistics
You can reduce the cost of maintaining statistics for your database objects by taking certain actions.
Procedure
To reduce the cost of collecting statistics for data objects, use any of the following approaches:
- For
tables that contain many rows of data, use the TABLESAMPLE AUTO option, in most cases.
The TABLESAMPLE option reduces the number of rows and pages scanned. The TABLESAMPLE option can be hundreds of times faster because the number of pages and rows scanned can be 10,000 times fewer, and the number of rows sampled can be 100 times fewer. Under the default of TABLESAMPLE AUTO, DB2® determines the sampling rate based on the number of rows in a table. If real-time statistics indicate that a table contains fewer than 500,000 rows, sampling is not used to ensure the accuracy of the result.
The TABLESAMPLE option is preferred over the SAMPLE option because the SAMPLE option does not reduce the number of rows scanned, and because the sampling rate can cause inaccurate results for tables with fewer rows.
- Consider running several RUNSTATS jobs concurrently against different partitions of a partitioned table space or index rather than running a single RUNSTATS job on the entire table space or index. The sum of the processor time for the concurrent jobs is roughly equivalent to the processor time for running the single RUNSTATS job. However, the total elapsed time for the concurrent jobs can be significantly less than when you run RUNSTATS on an entire table space or index.
- When you collect statistics at the table level, collect statistics only for columns or column
groups that might be used as search conditions in WHERE clauses.
- Specify the COLUMN option to collect statistics for specified columns only.
- Specify the COLGROUP option to collect statistics for specified groups of columns only. You can also specify the FREQVAL option with the COLGROUP option to collect distribution statistics for the column group. You cannot specify the COLGROUP option when you collect inline statistics by using utilities other than RUNSTATS.
- Collect frequency and cardinality statistics only when the values change. For example, a column on GENDER is likely to have a COLCARDF of 2, with M and F as the possible values. It is unlikely that the cardinality for this column ever changes. The distribution of the values in the column might not change often, depending on the volatility of the data.
- Avoid running the RUNSTATS utility by specifying the STATISTICS keyword to collect inline
statistics when you run any of the following utilities:
- LOAD
- REBUILD INDEX
- REORG INDEX
- REORG TABLESPACE