The RUNSTATS utility collects statistics
on tables, indexes, and statistical views to provide the optimizer
with accurate information for access plan selection.
Use the
RUNSTATS utility to collect statistics
in the following situations:
- After data is loaded into a table and appropriate indexes are
created
- After creating an index on a table
- After a table is reorganized with the REORG utility
- After a table and its indexes are significantly modified through
update, insert, or delete operations
- Before binding application programs whose performance is critical
- When you want to compare current and previous statistics
- When the prefetch value has changed
- After executing the REDISTRIBUTE DATABASE PARTITION GROUP command
- When you have XML columns. When RUNSTATS is
used to collect statistics for XML columns only, existing statistics
for non-XML columns that were collected during a load operation or
a previous RUNSTATS operation are retained. If
statistics on some XML columns were collected previously, those statistics
are either replaced or dropped if the current RUNSTATS operation
does not include those columns.
To improve RUNSTATS performance and save disk
space used to store statistics, consider specifying only those columns
for which data distribution statistics should be collected.
You should rebind application programs after executing RUNSTATS.
The query optimizer might choose different access plans if new statistics
are available.
If a full set of statistics cannot be collected at one time, use
the RUNSTATS utility on subsets of the objects.
If inconsistencies occur as a result of ongoing activity against those
objects, a warning message (SQL0437W, reason code 6) is returned during
query optimization. If this occurs, use RUNSTATS again
to update the distribution statistics.
To ensure that index statistics are synchronized with the corresponding
table, collect both table and index statistics at the same time. If
a table was modified extensively since the last time that statistics
were gathered, updating only the index statistics for that table leaves
the two sets of statistics out of synchronization with each other.
Using the RUNSTATS utility on a production system
might negatively affect workload performance. The utility now supports
a throttling option that can be used to limit the performance impact
of RUNSTATS execution during high levels of database
activity.
When you collect statistics for a table in a partitioned database
environment, RUNSTATS operates only on the database
partition from which the utility is executed. The results from this
database partition are extrapolated to the other database partitions.
If this database partition does not contain a required portion of
the table, the request is sent to the first database partition in
the database partition group that contains the required data.
Statistics for a statistical view are collected on all database
partitions containing base tables that are referenced by the view.
Consider the following tips to improve the efficiency of
RUNSTATS and
the usefulness of the statistics:
- Collect statistics only for columns that are used to join tables
or for columns that are referenced in the WHERE, GROUP BY, or similar
clauses of queries. If the columns are indexed, you can specify these
columns with the ONLY ON KEY COLUMNS clause on
the RUNSTATS command.
- Customize the values of the num_freqvalues and num_quantiles database
configuration parameters for specific tables and columns.
- When you create an index for a populated table, use the COLLECT
STATISTICS clause to create statistics as the index is created.
- When significant numbers of table rows are added or removed, or
if data in columns for which you collect statistics is updated, use RUNSTATS again
to update the statistics.
- Because RUNSTATS collects statistics on only
a single database partition, the statistics are less accurate if the
data is not distributed consistently across all database partitions.
If you suspect that there is skewed data distribution, consider redistributing
the data across database partitions by using the REDISTRIBUTE
DATABASE PARTITION GROUP command before using the RUNSTATS utility.
- For DB2® V9.7
Fix Pack 1 and later releases, distribution statistics can be collected
on an XML column. Distribution statistics are collected for each index
over XML data specified on the XML column. By default, a maximum of
250 quantiles are used for distribution statistics for each index
over XML data.
When collecting distribution statistics on an XML
column, you can change maximum number of quantiles. You can lower
the maximum number of quantiles to reduce the space requirements for
XML distribution statistics based on your particular data size, or
you can increase the maximum number of quantiles if 250 quantiles
are not sufficient to capture the distribution statistics of the data
set for an index over XML data.