There are several approaches available to improve RUNSTATS performance.
To minimize the performance impact of this utility:
- Limit the columns for which statistics are collected by using
the COLUMNS clause. Many columns are never referenced by predicates
in the query workload, so they do not require statistics.
- Limit the columns for which distribution statistics are collected
if the data tends to be uniformly distributed. Collecting distribution
statistics requires more CPU and memory than collecting basic column
statistics. However, determining whether the values for a column are
uniformly distributed requires either having existing statistics or
querying the data. This approach also assumes that the data remains
uniformly distributed as the table is modified.
- Limit
the number of pages and rows processed by using page- or row-level
table sampling (by specifying the TABLESAMPLE SYSTEM or TABLESAMPLE
BERNOULLI clause) and by using page- or row-level index
sampling (by specifying INDEXSAMPLE SYSTEM or INDEXSAMPLE
BERNOULLI clause). Start with a 10% page-level sample,
by specifying TABLESAMPLE SYSTEM(10) and INDEXSAMPLE
SYSTEM(10). Check the accuracy of the statistics and whether
system performance has degraded due to changes in access plan. If
it has degraded, try a 10% row-level sample instead, by specifying TABLESAMPLE
BERNOULLI(10). Likewise, experiment with the INDEXSAMPLE parameter
to get the right rate for index sampling. If the accuracy of the statistics
is insufficient, increase the sampling amount. When using RUNSTATS page-
or row-level sampling, use the same sampling rate for tables that
are joined. This is important to ensure that the join column statistics
have the same level of accuracy.
- Collect index statistics during index creation by specifying the
COLLECT STATISTICS option on the CREATE INDEX statement. This approach
is faster than performing a separate RUNSTATS operation
after the index is created. It also ensures that the new index has
statistics generated immediately after creation, to allow the optimizer
to accurately estimate the cost of using the index.
- Collect statistics when executing the LOAD command
with the REPLACE option. This approach is faster
than performing a separate RUNSTATS operation after
the load operation completes. It also ensures that the table has the
most current statistics immediately after the data is loaded, to allow
the optimizer to accurately estimate the cost of using the table.
In a partitioned database environment, the
RUNSTATS utility
collects statistics from a single database partition. If the
RUNSTATS command
is issued on a database partition on which the table resides, statistics
are collected there. If not, statistics are collected on the first
database partition in the database partition group for the table.
For consistent statistics, ensure that statistics for joined tables
are collected from the same database partition.