Start of change

Reducing the cost of collecting statistics

You can reduce the cost of maintaining statistics for your database objects by taking certain actions.

Procedure

To reduce the cost of collecting statistics for data objects, use any of the following approaches:

  • Start of changeFor tables that contain many rows of data, use the TABLESAMPLE AUTO option, in most cases. The TABLESAMPLE option reduces the number of rows and pages scanned. The TABLESAMPLE option can be hundreds of times faster because the number of pages and rows scanned can be 10,000 times fewer, and the number of rows sampled can be 100 times fewer. Under the default of TABLESAMPLE AUTO, DB2® determines the sampling rate based on the number of rows in a table. If real-time statistics indicate that a table contains fewer than 500,000 rows, sampling is not used to ensure the accuracy of the result.

    The TABLESAMPLE option is preferred over the SAMPLE option because the SAMPLE option does not reduce the number of rows scanned, and because the sampling rate can cause inaccurate results for tables with fewer rows.

    End of change
  • Consider running several RUNSTATS jobs concurrently against different partitions of a partitioned table space or index rather than running a single RUNSTATS job on the entire table space or index. The sum of the processor time for the concurrent jobs is roughly equivalent to the processor time for running the single RUNSTATS job. However, the total elapsed time for the concurrent jobs can be significantly less than when you run RUNSTATS on an entire table space or index.
  • When you collect statistics at the table level, collect statistics only for columns or column groups that might be used as search conditions in WHERE clauses.
    • Specify the COLUMN option to collect statistics for specified columns only.
    • Specify the COLGROUP option to collect statistics for specified groups of columns only. You can also specify the FREQVAL option with the COLGROUP option to collect distribution statistics for the column group. You cannot specify the COLGROUP option when you collect inline statistics by using utilities other than RUNSTATS.
    When you collect statistics on groups of columns that are used in predicates, the improved accuracy of the filter factor estimate can lead to improved query performance. However, collecting statistics on all columns of a table is costly and might be unnecessary.
  • Collect frequency and cardinality statistics only when the values change. For example, a column on GENDER is likely to have a COLCARDF of 2, with M and F as the possible values. It is unlikely that the cardinality for this column ever changes. The distribution of the values in the column might not change often, depending on the volatility of the data.
  • Avoid running the RUNSTATS utility by specifying the STATISTICS keyword to collect inline statistics when you run any of the following utilities:
    • LOAD
    • REBUILD INDEX
    • REORG INDEX
    • REORG TABLESPACE
    When you specify STATISTICS in one of these utility statements, DB2 updates the catalog with table space or index space statistics for the objects on which the utility is run. However, you cannot collect column group statistics with the STATISTICS keyword. You can collect column group statistics only by running the RUNSTATS utility. If you restart a LOAD or REBUILD INDEX job that uses the STATISTICS keyword, DB2 does not collect inline statistics. For these cases, you must run the RUNSTATS utility after the restarted utility job completes.
End of change