Updates
statistics in the system catalog about the characteristics of a table,
associated indexes, or statistical views. These characteristics include
number of records, number of pages, and average record length. The
optimizer uses these statistics when determining access paths to the
data.
For
a table, call the RUNSTATS command when the table
has had many updates, or after reorganizing the table. For a statistical
view, call the RUNSTATS command when changes to
underlying tables have substantially affected the rows returned by
the view. The view must have been previously enabled for use in query
optimization by using the ALTER VIEW statement.
Scope
You
can issue the RUNSTATS command from any database
partition in the db2nodes.cfg file. You can use
the command to update the catalogs on the catalog database partition.
For
tables, this command collects statistics for a table on the database
partition from which it is invoked. If the table does not exist on
that database partition, the first database partition in the database
partition group is selected.
For views, this command collects
statistics using data from tables on all participating database partitions.
Authorization
For
a table, you require one of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the table
- LOAD authority
You do not need any explicit privilege to use this command
on any declared temporary table that exists within its connection.
For
statistical views, one of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the statistical view
- LOAD
authority
Required connection
Database
Command syntax

>>-RUNSTATS--ON TABLE--object-name------------------------------>
>--+-+------------------------+--+------------------------------------+-+-><
| '-| Statistics Options |-' '-UTIL_IMPACT_PRIORITY--+----------+-' |
| '-priority-' |
+-USE PROFILE--------------------------------------------------------+
'-UNSET PROFILE------------------------------------------------------'
Statistics Options
.-ALLOW WRITE ACCESS-.
|--+--------------------------+--+--------------------+--------->
'-| Table Object Options |-' '-ALLOW READ ACCESS--'
>--+----------------------------+------------------------------->
'-| Table Sampling Options |-'
>--+----------------------------+--+---------------------+------|
'-| Index Sampling Options |-' '-| Profile Options |-'
Table Object Options
|--+-FOR--| Index Clause |--+-----------------------+----------------------------------+--|
| '-EXCLUDING XML COLUMNS-' |
'-+-------------------------+--+-----------------------+--+-----------------------+-'
'-| Column Stats Clause |-' '-EXCLUDING XML COLUMNS-' '-AND--| Index Clause |-'
Table Sampling Options
|--TABLESAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------>
'-SYSTEM----'
>--+-----------------------------------+------------------------|
'-REPEATABLE--(--integer-literal--)-'
Index Sampling Options
|--INDEXSAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------|
'-SYSTEM----'
Profile Options
.-SET PROFILE NONE--------------.
|--+-------------------------------+----------------------------|
'-+-SET----+--PROFILE--+------+-'
'-UPDATE-' '-ONLY-'
Index Clause
|--+-------------------------+--+-INDEXES-+--------------------->
| .-SAMPLED---. | '-INDEX---'
'-+-----------+--DETAILED-'
'-UNSAMPLED-'
.-,----------.
V |
>--+---index-name-+-+-------------------------------------------|
'-ALL------------'
Column Stats Clause
|--+-| On Cols Clause |------------------------------+----------|
'-+--------------------+--| Distribution Clause |-'
'-| On Cols Clause |-'
On Cols Clause
.-ON ALL COLUMNS-----------------------------------------------------.
| .-,-----------------. |
| V | |
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |-+--)-+--|
| '-+-ALL-+--COLUMNS AND-' |
| '-KEY-' |
'-ON KEY COLUMNS-----------------------------------------------------'
Distribution Clause
|--WITH DISTRIBUTION--+-------------------------+--------------->
'-| On Dist Cols Clause |-'
>--+--------------------------+---------------------------------|
'-| Default Dist Options |-'
On Dist Cols Clause
.-ON ALL COLUMNS-----------------------------------------------------------------------------------.
| .-,-----------------------------------------------. |
| | .--------------------------. | |
| V V | | |
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |----+----------------------+-+-+--)-+--|
| '-+-ALL-+--COLUMNS AND-' +-| Frequency Option |-+ |
| '-KEY-' '-| Quantile Option |--' |
'-ON KEY COLUMNS-----------------------------------------------------------------------------------'
Default Dist Option
.--------------------------.
V |
|--DEFAULT----+-| Frequency Option |-+-+------------------------|
'-| Quantile Option |--'
Frequency Option
|--NUM_FREQVALUES--integer--------------------------------------|
Quantile Option
|--NUM_QUANTILES--integer---------------------------------------|
Column Option
|--+-column-name--+-----------------+-+-------------------------|
| '-LIKE STATISTICS-' |
| .-,-----------. |
| V | |
'-(----column-name-+--)------------'
Command parameters
- object-name
- Identifies
the table or statistical view on which statistics are to be collected.
This parameter must not be a table hierarchy. For typed tables, the
value of the object-name parameter must be the
name of the root table of the table hierarchy. The fully qualified
name or alias in the form: schema.object-name must
be used. The schema is the user name under which the table was created.
- USE PROFILE
- This option allows RUNSTATS to employ a previously
stored statistics profile to gather statistics for a table or statistical
view. The statistics profile is created using the SET PROFILE options
and is updated using the UPDATE PROFILE options.
- UNSET PROFILE
- Removes
a statistics profile. For example, the following command removes the
profile for the tablemyschema.mytable table:
RUNSTATS ON tablemyschema.mytable UNSET PROFILE
- FOR INDEXES
- Collects and
updates statistics for the indexes only. If no table statistics had been previously collected on the
table, basic table statistics are also collected. Updates statistics for table cardinality (CARD),
FPAGES, NPAGES, even when table statistics exists. These basic statistics do not include any
distribution statistics. This option cannot be used for views. COLCARD of the leading column of the
index might also be updated.
- SAMPLED
- Used
together only with the DETAILED parameter. Specifying
this option does not change the default functionality from DETAILED.
The SAMPLED parameter is left in for compatibility
with previous versions of DB2®.
This parameter cannot be used for views.
- UNSAMPLED
- This
option, when used with the DETAILED option, forces RUNSTATS to
examine every entry in the index to compute the extended index statistics.
You cannot use the UNSAMPLED parameter for views
and it cannot be used together with scan index sampling (INDEXSAMPLE keyword).
This option significantly increases RUNSTATS resource
consumption, while rarely providing significant improvement over the DETAILED or SAMPLED
DETAILED options, which are equivalent.
- DETAILED
- Calculates extended index statistics. The extended index statistics
are the CLUSTERFACTOR and PAGE_FETCH_PAIRS statistics
that are gathered for relatively large indexes. Not all index entries
are examined, a CPU sampling technique is employed instead to improve
performance. You cannot use this parameter for views.
- index-name
- Identifies
an existing index defined on the table. If you do not specify the
fully qualified name in the form: schema.index-name,
the default schema is assumed.
- EXCLUDING XML COLUMNS
- Omits
all XML-type columns from statistics collection. Using this clause
facilitates the collection of statistics on non-XML columns because
the inclusion of XML data can require greater system resources. The EXCLUDING
XML COLUMNS clause takes precedence over other clauses
that specify XML columns for statistics collection. For example, if
you use the EXCLUDING XML COLUMNS clause, and
you also specify XML type columns with the ON COLUMNS clause
or you use the ON ALL COLUMNS clause, all XML-type
columns are ignored during statistics collection. For DB2 V9.7 Fix Pack 1 and later releases, distribution
statistics over XML-type columns are not collected when you specify
this parameter.
- AND INDEXES
- Collects
and updates statistics for both the table and the indexes.
- ON ALL COLUMNS
- Collects
statistics on all eligible columns. You can use this parameter with
the On Cols clause or use this parameter with the On
Dist Cols clause after the WITH DISTRIBUTION parameter.
The ON ALL COLUMNS parameter is the default if
you do not specify either of the column-specific clauses.
If it
is specified in the On Cols clause, all columns will
have only basic column statistics collected unless specific columns
are chosen as part of the WITH DISTRIBUTION clause.
Those columns specified as part of the WITH DISTRIBUTION clause
will also have basic and distribution statistics collected.
If
the WITH DISTRIBUTION ON ALL COLUMNS is specified
both basic statistics and distribution statistics are collected for
all eligible columns. Anything specified in the On Cols clause
is redundant and therefore not necessary.
- ON COLUMNS
- To
collect statistics on specific columns, column groups, or both, use
the ON COLUMNS. A column group is a parenthesized
comma-separated list of columns for which you want to collect combined
statistics.
The column and column groups are specified as a parenthesized
comma-separated list.
Running
the
RUNSTATS command on a table without gathering
index statistics but specifying a subset of columns for which statistics
are to be gathered has the following effects:
- Statistics are not reset for columns that you do not specify for
the RUNSTATS command but are the first column in
an index.
- Statistics are reset for all other columns that you do not specify
for the RUNSTATS command.
You can use the ON COLUMNS parameter
in the On Cols clause and the On Dist Cols clause.
Collecting distribution statistics for a group of columns is not currently
supported.
If you specify XML-type columns in
a column group, the XML-type columns are ignored for collecting distinct
values for the group. However, basic XML column statistics are collected
for the XML-type columns in the column group.
- ON KEY COLUMNS
- Collects
statistics on columns that make up all the indexes that are defined
on the table. It is assumed that critical columns in queries are also
those that are used to create indexes on the table. If there are no
indexes on the table, no column statistics are collected.
You can
use the ON KEY COLUMNS parameter in the On
Cols clause or the On Dist Cols clause.
Specifying the parameter in both clauses is redundant because if you
specify the On Dist Cols clause (after the WITH
DISTRIBUTION parameter), both basic and distribution statistics
are collected.
XML-type columns are by definition not key columns
and are not included for statistics collection by the ON
KEY COLUMNS parameter. You cannot use this parameter for
views.
- column-name
- Name of a column in the table or statistical view. If you specify
the name of an ineligible column for statistics collection, such as
a nonexistent column or a mistyped column name, error (-205) is returned.
Two lists of columns can be specified, one without distribution and
one with distribution. If the column is specified in the list that
is not associated with the WITH DISTRIBUTION clause
only basic column statistics will be collected. If the column appears
in both lists, distribution statistics will be collected (unless NUM_FREQVALUES and NUM_QUANTILES are
set to zero).
- LIKE STATISTICS
- Collects
additional column statistics for columns of type CHAR or VARCHAR with
a code page attribute of single-byte character set (SBCS), FOR BIT
DATA, or UTF-8. The statistics are collected if the runstats utility
determines that such statistics are appropriate after analyzing column
values. These statistics are shown in the SUB_COUNT and the SUB_DELIM_LENGTH
columns in the SYSSTAT.COLUMNS views. The query optimizer uses these
statistics to improve the selectivity estimates for predicates of
the type "column LIKE '%xyz'" and "column
LIKE '%xyz%'".
- WITH DISTRIBUTION
- Collects
both basic statistics and distribution statistics on columns. If you
do not specify the ON COLUMNS parameter, distribution
statistics are collected on all the columns of the table or statistical
view, excluding columns that are ineligible such as those of type
CLOB and LONG VARCHAR. If you specify the ON COLUMNS parameter,
distribution statistics are collected only on the column list that
you provide, excluding those columns that are ineligible for statistics
collection.
If you specify the WITH DISTRIBUTION parameter
followed by the ON COLUMNS parameter with column
groups, distribution statistics are not collected for the column groups.
- DEFAULT
- If
you specify the NUM_FREQVALUES and NUM_QUANTILES parameters,
the values of the parameters are used to determine the maximum number
of frequency and quantile statistics to be collected for the columns.
The NUM_FREQVALUES and NUM_QUANTILES parameters
are used if you do not specify values for individual columns in the ON
COLUMNS clause. If you do not specify the DEFAULT parameter,
the values used will be those in the corresponding database configuration
parameters.
- NUM_FREQVALUES
- Defines
the DEFAULT parameter referred to frequency statistics.
You can specify this parameter for an individual column after the ON
COLUMNS parameter. If you instead specify the NUM_FREQVALUES parameter
after the DEFAULT parameter, the value for the NUM_FREQVALUES parameter
after the DEFAULT parameter is used. If the specified
value is '-1' or you do not specify the NUM_FREQVALUES parameter
after either the ON COLUMNS or DEFAULT parameter,
the maximum number of frequency values is the value of the num_freqvalues database
configuration parameter.
- NUM_QUANTILES
- Defines
the maximum number of distribution quantile values to collect. It
can be specified for an individual column in the ON COLUMNS clause.
If the value is either not specified or is specified as '-1' for
an individual column, the quantile limit value is picked up from that
specified in the DEFAULT clause. If it is not
specified there either, the maximum number of quantile values to be
collected will be what is set in the num_quantiles database
configuration parameter.
For DB2 V9.7
Fix Pack 1 and later releases, distribution statistics for each index
over XML data uses a maximum of 250 quantiles as the default. The
default can be changed by specifying the NUM_QUANTILES parameter
in the ON COLUMNS or the DEFAULT clause.
The num_quantiles database configuration parameter
is ignored while collecting XML distribution statistics.
- ALLOW WRITE ACCESS
- Specifies that other users can read from and write to the tables
while statistics are calculated. For statistical views, these are
the base tables referenced in the view definition.
The ALLOW
WRITE ACCESS option is not recommended for tables that
will have a lot of inserts, updates or deletes occurring concurrently.
The RUNSTATS command first collects table statistics
and then performs index statistics. Changes in the table's state between
the time that the table and index statistics are collected might result
in inconsistencies. Although having up-to-date statistics is important
for the optimization of queries, it is also important to have consistent
statistics. Therefore, statistics should be collected at a time when
inserts, updates or deletes are at a minimum.
- ALLOW READ ACCESS
- Specifies that other users can have read-only access to the tables
while statistics are calculated. For statistical views, these are
the base tables referenced in the view definition.
- TABLESAMPLE BERNOULLI
- Collects
statistics on a sample of the rows from the table or statistical view. Bernoulli
sampling considers each row individually, including the row
with probability P/100 (where P is
the value of the numeric-literal parameter) and
excluding it with probability 1-P/100. Thus, if
the value of the numeric-literal parameter is evaluated
to be the value 10, representing a 10% sample, each
row would be included with probability 0.1 and be excluded with probability
0.9. Unless you specify the optional REPEATABLE parameter,
each execution of the RUNSTATS command usually
yields a different sample of the table. All data pages are retrieved
through a table scan, but only the percentage of rows that you specify
by using the numeric-literal parameter is used
for statistics collection.
- TABLESAMPLE SYSTEM
- Collects
statistics on a sample of the data pages from the tables. System
sampling considers each page individually, including the page
with probability P/100 (where P is
the value of the numeric-literal parameter) and
excluding it with probability 1-P/100. Unless you
specify the optional REPEATABLE parameter, each
execution of the RUNSTATS command usually yields
a different sample of the table. You control the size of the sample
by specifying the numeric-literal parameter in
parentheses, which represents an approximate percentage P of
the table to return. Only a percentage of the data pages, as specified
by the numeric-literal parameter, is retrieved
and used for statistics collection.
For a statistical view, you can apply
system sampling to only a single base table that is referenced in
the view definition. If the view contains multiple tables, system
sampling is possible if a single table in the statistical view can
be identified as being joined with all primary keys or unique index
columns of the other tables that are used in the view. If the statistical
view does not meet those conditions, Bernoulli sampling is be used
instead, and a warning is returned.
- numeric-literal
- Specifies
the size of the sample to be obtained, as a percentage P.
This value must be a positive number that is less than or equal to 100 and
can be from 0 - 1. For example,
a value of 0.01 represents one one-hundredth of a
percent, such that one row in 10,000 is sampled, on average. A value
of 0 or 100 is treated as if you
did not specify sampling, regardless of whether you specified the TABLESAMPLE
BERNOULLI or TABLESAMPLE SYSTEM parameter.
A value greater than 100 or less than 0 is
treated as an error (SQL1197N).
- REPEATABLE (integer-literal)
- When
specified after the TABLESAMPLE parameter, ensures
that repeated executions of the RUNSTATS command
return the same sample. The integer-literal parameter
specifies a non-negative integer representing the seed to be used
in sampling. Passing a negative seed results in an error (SQL1197N).
The sample set might vary between repeatable RUNSTATS command
invocations if activity against the table or statistical view resulted
in changes to the table or statistical view data since the last time
that you ran the command with the TABLESAMPLE REPEATABLE parameter.
Also, to ensure consistent results, the method by which you obtained
the sample (using the BERNOULLI or SYSTEM parameter)
must be the same.
- INDEXSAMPLE BERNOULLI
- Collects
index statistics on a sample of the rows in the index. Bernoulli
sampling considers each row individually, including the row
with probability P/100 (where P is
the value of the numeric-literal parameter) and
excluding it with probability 1-P/100. Thus, if
the numeric-literal parameter has the value 10,
representing 10%, each row would be included with probability 0.1
and be excluded with probability 0.9. Each execution of the RUNSTATS command
is likely to yield a different sample of the index. All index pages
are retrieved through an index scan, but only the percentage of rows
as specified through the numeric-literal parameter
is used for statistics collection. The INDEXSAMPLE BERNOULLI parameter
is not supported for statistical views.
- INDEXSAMPLE SYSTEM
- Collects
statistics on a sample of the index pages. System sampling considers
each page individually, including the page with probability P/100
(where P is the value of the numeric-literal parameter)
and excluding it with probability 1-P/100. Each
execution of the RUNSTATS command usually yields
a different sample of the index. You control the size of the sample
by specifying the numeric-literal parameter in
parentheses, which represents an approximate percentage P of
the index to return. Only a percentage of the index pages, as specified
by the numeric-literal parameter, is retrieved
and used for statistics collection. The INDEXSAMPLE SYSTEM parameter
is not supported for statistical views.
- SET PROFILE NONE
- Specifies that no statistics profile will be set for this RUNSTATS invocation.
- SET PROFILE
- Generates
and stores a specific statistics profile in the system catalog tables
and executes the RUNSTATS command options to gather
statistics.
- SET PROFILE ONLY
- Generates
and stores a specific statistics profile in the system catalog tables
without running the RUNSTATS command options.
- UPDATE PROFILE
- Modifies
a statistics profile in the system catalog tables and runs the RUNSTATS command
options of the updated statistics profile to gather statistics. You
cannot use the UPDATE PROFILE parameter to remove
clauses that are in a statistics profile.
- UPDATE PROFILE ONLY
- Modifies
a statistics profile in the system catalog tables without running
the RUNSTATS command options of the updated statistics
profile. You cannot use the UPDATE PROFILE ONLY parameter
to remove clauses that are in a statistics profile.
- UTIL_IMPACT_PRIORITY priority
- Specifies that RUNSTATS will be throttled at
the level specified by priority. priority is
a number in the range of 1 to 100,
with 100 representing the highest priority and 1 representing
the lowest. The priority specifies the amount of throttling to which
the utility is subjected. All utilities at the same priority undergo
the same amount of throttling, and utilities at lower priorities are
throttled more than those at higher priorities. If priority is
not specified, the RUNSTATS will have the default
priority of 50. Omitting the UTIL_IMPACT_PRIORITY keyword
will invoke the RUNSTATS utility without throttling
support. If the UTIL_IMPACT_PRIORITY keyword
is specified, but the util_impact_lim configuration
parameter is set to 100, then the utility will run
unthrottled.
When
you use the RUNSTATS command on tables in a partitioned
database, statistics are collected on only a single database partition.
If the database partition from which you ran the RUNSTATS command
has a partition of the table, the command runs on that database partition.
Otherwise, the command runs on the first database partition in the
database partition group across which the table is partitioned..
Examples
- The following command collects basic statistics (no distribution
statistics) on the table only, on all columns:
RUNSTATS ON TABLE employee
- The
following command collects statistics on the table only, with distribution
statistics on columns empid and empname:
RUNSTATS ON TABLE employee
WITH DISTRIBUTION ON COLUMNS (empid, empname)
- The following command collects basic and distribution statistics
on the table only, on all columns. The command specifies a maximum
number of frequency values and uses the maximum number of distribution
quantile values from the configuration setting.
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT
NUM_FREQVALUES 50
- The following
command collects basic and distribution statistics on the table only,
on all columns and on column group JOB, WORKDEPT, and SEX:
RUNSTATS ON TABLE employee ON ALL COLUMNS AND COLUMNS ((JOB, WORKDEPT, SEX))
WITH DISTRIBUTION
- The following command collects statistics on a set of indexes:
RUNSTATS ON TABLE employee for indexes empl1, empl2
- The following command collects statistics on all indexes only:
RUNSTATS ON TABLE employee FOR INDEXES ALL
- The following command collects basic statistics on the table and
all indexes, using sampling for the detailed index statistics collection:
RUNSTATS ON TABLE employee AND SAMPLED DETAILED INDEXES ALL
This
command is equivalent to the following one:RUNSTATS ON TABLE employee AND DETAILED INDEXES ALL
- The following command collects basic statistics on the table,
with distribution statistics on columns empid, empname, and empdept
and the two indexes Xempid and Xempname. Distribution statistics limits
are set individually for empdept, and the other two columns use a
common default.
RUNSTATS ON TABLE employee
WITH DISTRIBUTION ON COLUMNS (empid, empname, empdept NUM_FREQVALUES
50 NUM_QUANTILES 100)
DEFAULT NUM_FREQVALUES 5 NUM_QUANTILES 10
AND INDEXES Xempid, Xempname
- The following command collects statistics on all columns that
are used in indexes and on all indexes:
RUNSTATS ON TABLE employee ON KEY COLUMNS AND INDEXES ALL
- The
following command collects statistics on all indexes and all columns
with distribution and distribution statistics set individually for
one of the columns. In this example, T1 contains columns c1, c2, ....,
c8
RUNSTATS ON TABLE T1
WITH DISTRIBUTION ON COLUMNS (c1, c2, c3 NUM_FREQVALUES 20
NUM_QUANTILES 40, c4, c5, c6, c7, c8)
DEFAULT NUM_FREQVALUES 0, NUM_QUANTILES 0 AND INDEXES ALL
RUNSTATS ON TABLE T1
WITH DISTRIBUTION ON COLUMNS (c3 NUM_FREQVALUES 20 NUM_QUANTILES 40)
AND INDEXES ALL
- The
following command collects statistics on table T1 for the individual
columns c1 and c5 and for the column groups (c2, c3) and (c2, c4).
Multicolumn cardinality is very useful to the query optimizer when
it estimates filter factors for predicates on columns in which the
data is correlated.
RUNSTATS ON TABLE T1 ON COLUMNS (c1, (c2, c3),
(c2, c4), c5)
- The following command collects statistics on table T1 for the
individual columns c1 and c2. For column c1, the command also collects
LIKE predicate statistics.
RUNSTATS ON TABLE T1 ON COLUMNS (c1 LIKE STATISTICS, c2)
- The following command registers a statistics profile to collect
basic and distribution statistics on the table only, on all columns.
The command specifies a maximum number of frequency values and uses
the maximum number of distribution quantile values from the configuration
setting.
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT
NUM_FREQVALUES 50 SET PROFILE
- The following command registers a statistics profile to collect
basic and distribution statistics on the table only, on all columns.
The command specifies a maximum number of frequency values and uses
the maximum number of distribution quantile values from the configuration
setting. Statistics are not collected.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 50 SET PROFILE ONLY
- The following command modifies the previously registered statistics
profile by changing the maximum number of frequency values from 50
to 30. The command also updates the statistics as specified.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE
- The following command modifies the previously registered statistics
profile by changing the maximum number of frequency values from 50
to 30. Statistics are not collected.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE ONLY
- The following command modifies the previously registered statistics
profile by adding column empl_address and column group (empl_title,
empl_salary) options. The command also updates the statistics as specified.
RUNSTATS ON TABLE employee
ON COLUMNS (empl_address, (empl_title, empl_salary))
UPDATE PROFILE
- The following command modifies the previously registered statistics
profile by adding column empl_address and column group (empl_title,
empl_salary) options. Statistics are not collected.
RUNSTATS ON TABLE employee
ON COLUMNS (empl_address, (empl_title, empl_salary))
UPDATE PROFILE ONLY
- The following command collects statistics on a table by using
the options that are recorded in the statistics profile for that table:
RUNSTATS ON TABLE employee USE PROFILE
- The following command queries the RUNSTATS command
options corresponding to the previously registered statistics profile
in the catalogs of the table:
SELECT STATISTICS_PROFILE FROM SYSCAT.TABLES WHERE TABNAME =
'EMPLOYEE'
- The following command collects statistics, including distribution
statistics, on 30% of the rows:
RUNSTATS ON TABLE employee WITH DISTRIBUTION
TABLESAMPLE BERNOULLI(30)
- To control the sample set on which statistics will be collected
and to be able to repeatedly use the same sample set, you can do so
as follows:
RUNSTATS ON TABLE employee WITH DISTRIBUTION
TABLESAMPLE BERNOULLI(30) REPEATABLE(4196)
The command
results in the same set of statistics if the data has not changed
in the interim.
- The following command collects index statistics and table statistics
on 1.5% of the data pages. Only table data pages, not index pages,
are sampled. In this example, 1.5% of table data pages are used for
the collection of table statistics, and all index pages are used for
index statistics.
RUNSTATS ON TABLE employee AND INDEXES ALL TABLESAMPLE SYSTEM(1.5)
- The following command collects table statistics on 1.5% of the
data pages and index statistics on 2.5% of the index pages. Both table
data pages and index pages are sampled.
RUNSTATS ON TABLE employee AND INDEXES ALL TAMBLESAMPLE SYSTEM(1.5)
INDEXSAMPLE SYSTEM(2.5)
- The following command collects statistics for a statistical view,
on all columns, without distribution statistics:
RUNSTATS ON VIEW product_sales_view
- The following command collects basic statistics for a statistical
view, with distribution statistics on the category, type and product_key
columns. Distribution statistics limits are set for the category column,
and the other columns use a common default.
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION ON COLUMNS (category NUM_FREQVALUES 100 NUM_QUANTILES 100,
type, product_key) DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
- The following command collects basic and distribution statistics
for a statistical view on 10% of the rows, using row-level sampling:
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)
- The following command collects basic and distribution statistics
for a statistical view on 2.5% of the rows, using data-page-level
sampling. Additionally, this command specifies the repeated use of
the same sample set. For this command to succeed, the query must be
such that the DB2 database system
can successfully push data-page sampling down to one or more tables.
Otherwise, an error (SQL 20288N) is issued.
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION TABLESAMPLE SYSTEM (2.5) REPEATABLE(4196)
- The following command registers a statistics profile to collect
statistics on the view and on all columns with distribution statistics
as specified:
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
SET PROFILE
- The following command modifies the previously registered statistics
profile. This command also updates the statistics as specified:
RUNSTATS ON VIEW salesdb.product_sales_view
WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 25 NUM_QUANTILES 25
UPDATE PROFILE
Usage notes
- You should run the RUNSTATS command in the
following cases:
- On tables that were modified considerably: for example, if a large
number of updates were made, if a significant amount of data was inserted
or deleted, or if you ran the LOAD command without
the statistics option during LOAD.
- On tables that were reorganized by using the REORG or REDISTRIBUTE
DATABASE PARTITION GROUP command.
- On tables that were row compressed.
- After creating a new index.
- Before binding applications whose performance is critical.
- When the prefetch quantity is changed.
- On statistical views whose underlying tables that were modified
substantially so as to change the rows that are returned by the view.
- After running the LOAD command with the STATISTICS option,
if there are XML columns. Use the RUNSTATS utility
to collect statistics on XML columns, because the LOAD command
does not collect statistics on these columns, even if you use the STATISTICS option.
When you use the RUNSTATS command to collect statistics
for XML columns only, statistics for non-XML columns that were collected
by the LOAD command or a previous execution of
the RUNSTATS utility are retained. If you previously
collected statistics on an XML column, those statistics are either
dropped if the current command does not collect statistics on that
column or are replaced if the current command does collect statistics
on that column.
- The options that you choose depend on the specific table and the
application. In general, the following guidelines apply:
- If the table is a very critical table in critical queries, is
relatively small, or does not change too much and there is not too
much activity on the system itself, it might be worth spending the
effort on collecting statistics in as much detail as possible.
- If the time to collect statistics is limited, if the table is
relatively large, or if the table is updated frequently, it might
be beneficial to run the RUNSTATS command on just
the set of columns that are used in predicates. This way, you can
run the RUNSTATS command more often.
- If time to collect statistics is very limited and the effort to
tailor the RUNSTATS command on a table-by-table
basis is a major issue, consider collecting statistics for the KEY
columns only. It is assumed that the index contains the set of columns
that are critical to the table and are most likely to appear in predicates.
- If time to collect statistics is very limited and table statistics
are to be gathered, consider using the TABLESAMPLE option
to collect statistics on a subset of the table data.
- If
time to collect statistics is very limited and index statistics are
to be gathered, consider using the INDEXSAMPLE option
to collect statistics on a subset of the index data.
- If there is skew in certain columns and predicates of the type "column
= constant", it might be beneficial to specify a larger NUM_FREQVALUES value
for the columns.
- Collect distribution statistics for all columns that you use in
equality predicates and for which the distribution of values might
be skewed.
- For columns that have range predicates (for example "column
>= constant" or "column BETWEEN constant1 AND constant2")
or are of the type "column LIKE '%xyz'", it might
be beneficial to specify a larger NUM_QUANTILES value.
- If storage space is a concern and you do not have much time to
collect statistics, do not specify high NUM_FREQVALUES or NUM_QUANTILES values
for columns that you do not use in predicates.
- If you need index statistics and statistics have never been collected
on the table containing the index, statistics on both the table and
indexes are calculated.
- If you do not require statistics for XML columns in the table,
you can use the EXCLUDING XML COLUMNS parameter
to exclude all XML columns. This parameter takes precedence over all
other parameters that specify XML columns for statistics collection.
- After the command is run, note the following:
- To release the locks, you should issue a COMMIT.
- To allow new access plans to be generated, you must rebind the
packages that reference the target table.
- Executing the command on portions of the table can result in inconsistencies
as a result of activity on the table since you last issued the command.
In this case, a warning message is returned.
Issuing the RUNSTATS command
on the table only might make table-level and index-level statistics
inconsistent. For example, you might collect index-level statistics
on a table and later delete a significant number of rows from the
table. If you then issue the RUNSTATS command on
the table only, the table cardinality might be less than the value
of FIRSTKEYCARD, which is an inconsistency. Similarly, if you collect
statistics on a new index when you create it, the table-level statistics
might be inconsistent.
- The RUNSTATS command drops previously collected
distribution statistics if you request table statistics. For example,
the RUNSTATS ON TABLE and RUNSTATS ON
TABLE … AND INDEXES ALL commands cause previously collected
distribution statistics to be dropped. If you run the command on indexes
only, previously collected distribution statistics are retained. For
example, the RUNSTATS ON TABLE … FOR INDEXES ALL command
causes the previously collected distribution statistics to be retained.
If you run the RUNSTATS command on XML columns
only, previously collected basic column statistics and distribution
statistics are retained. If you previously collected statistics on
an XML column, those statistics are either dropped if the current
command does not collect statistics on that column or are replaced
if the current command does collects statistics on that column.
- When there are detached partitions on a partitioned table, index
keys that still belong to detached data partitions which require cleanup
will not be counted as part of the keys in the statistics. These keys
are not counted because they are invisible and no longer part of the
table. They will eventually get removed from the index by asynchronous
index cleanup. As a result, statistics collected before asynchronous
index cleanup is run will be misleading. If the RUNSTATS command
is issued before asynchronous index cleanup completes, it will likely
generate a false alarm for index reorganization or index cleanup based
on the inaccurate statistics. Once asynchronous index cleanup is run,
all the index keys that still belong to detached data partitions which
require cleanup will be removed and this may eliminate the need for
index reorganization.
For partitioned tables, you are encouraged to issue
the RUNSTATS command after an asynchronous index
cleanup has completed in order to generate accurate index statistics
in the presence of detached data partitions. To determine whether
or not there are detached data partitions in the table, you can check
the status field in the SYSCAT.DATAPARTITIONS catalog view and look
for the value L (logically detached), I (index
cleanup), or D (detached with dependent MQT).
The RUNSTATS command collects
statistics for all index partitions of a partitioned index. Statistics
in the SYSSTAT.INDEXES view for the partitioned index represent an
index partition, except for FIRSTKEYCARD, FIRST2KEYCARD, FIRST3KEYCARD,
FIRST4KEYCARD, and FULLKEYCARD statistics. Because these statistics
are used in cardinality estimates, they are for the entire index and
not for an index partition. Distribution statistics (frequent values
and quantiles) are not collected for partitioned indexes, but are
gathered if RUNSTATS is run on the table. Statistics
on the leading columns of a partitioned index might not be as accurate
as statistics on the leading columns of a nonpartitioned index.
- When
the RUNSTATS command is run on a table with an
expression-based index, and the RUNSTATS command
includes that index in the AND INDEXES or FOR
INDEXES clause, statistics for expression-based key columns
within that index will also be collected and will be associated with
the system-generated statistical view associated with the index. Expressions
cannot be specified as columns in the RUNSTATS command
(SQL0205N). To collect customized statistics on expression-based index
key columns, you can define a statistics profile on the statistical
view that is associated with the index using the column names as they
appear in the statistics view. You also need to define a statistics
profile on the base table including the index in its INDEXES clause.
Note: When
you define a statistics profile on the expression based index's statistical
view, an automatically generated statistics profile is also associated
with the base table, if one does not exist already. Once the statistics
profiles have been defined, a RUNSTATS command
on the base table with the USE PROFILE clause
will result in the customized statistics being gathered on the expression-based
key columns in the index.
- Distribution statistics are collected on indexes
over XML data that are defined on an XML column. If you run the RUNSTATS command
on a table with the WITH DISTRIBUTION parameter,
collection of distribution statistics on a column of type XML occurs
as follows:
- The RUNSTATS command must collect both distribution
statistics and table statistics to collect distribution statistics
for indexes over XML data defined on an XML column. Table statistics
must be gathered in order for distribution statistics to be collected
since XML distribution statistics are stored with table statistics.
An
index clause is not required to collect XML distribution statistics.
Specifying only an index clause does not collect XML distribution
statistics.
By default, XML distribution statistics use a maximum
of 250 quantiles for each index over XML data. When collecting distribution
statistics on an XML column, you can change the maximum number of
quantiles by specifying a value for the NUM_QUANTILES parameter
in the ON COLUMNS or the DEFAULT clause.
- Distribution statistics are collected for indexes over XML data
of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics
are not collected over indexes of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes
over XML data defined on a partitioned table.
- A special system-generated index in the catalog tables represents
the range-ordering property of range-clustered tables. When you collect
statistics on this type of table, statistics are also collected for
the system-generated index. The statistics reflect the fast access
of the range lookups by representing the index as a two-level index
with as many pages as the base data table and by having the base data
clustered perfectly along the index order.
- In the On Dist Cols clause of the command, the
Frequency Option and Quantile Option parameters are not supported
for column groups. These parameters are supported for single columns.
- Three types of prefetch statistics cannot be computed in DMS mode.
In the index statistics in the index catalogs, you will see a -1 value
for the following statistics:
- AVERAGE_SEQUENCE_FETCH_PAGES
- AVERAGE_SEQUENCE_FETCH_GAP
- AVERAGE_RANDOM_FETCH_PAGES
- The statistics profile is stored in a visible string format, which
represents the RUNSTATS command, in the STATISTICS_PROFILE
column of the SYSCAT.TABLES system catalog table.
- Statistics collection on XML-type columns is governed by two DB2 registry variables: DB2_XML_RUNSTATS_PATHID_K and DB2_XML_RUNSTATS_PATHVALUE_K.
These two registry variables are similar to the NUM_FREQVALUES parameter
in that they specify the number of frequency values to collect. If
you do not set the parameters, a default of 200 is
used.
- When you start running the RUNSTATS command,
it acquires an IX table lock on the SYSTABLES table and a U lock on
the row of the table on which you are gathering statistics. Operations
can still read from the SYSTABLES table, including the row in the
table with the U lock. Write operations are also possible if they
do not occur against the row with the U lock. However, another reader
or writer cannot acquire an S lock on the SYSTABLES table because
of the IX lock that the RUNSTATS command acquired.
- Statistics are not collected for columns with
structured types. If they are specified, columns with these data types
are ignored.
- Only AVGCOLLEN and NUMNULLS are collected for
columns with LOB or LONG data types. AVGCOLLEN represents the average
space in bytes when the column is stored in database memory or a temporary
table. This value represents the length of the data descriptor for
LOB or LONG data types, except when LOB data is inlined on the data
page. The average space required to store the column on disk may be
different than the value represented by this statistic.
- The UNSAMPLED
DETAILED option is available to change the way index statistics
are collected, but it should be used only in cases where its clear
that the default or DETAILED doesn't work.
- When
you use the INDEXSAMPLE parameter, you cannot
specify different index sampling rates for different indexes within
a single command. For example, the following command is invalid:
runstats on table orders and index o_ck indexsample system(5),
index o_ok indexsample system(10)
You can use the following
two RUNSTATS commands to achieve the required result:runstats on table orders and index o_ck indexsample system(5)
runstats on table orders for index o_ok indexsample system(10)
- If
you have modified the table since statistics were last collected on
the table or its indexes, you should run RUNSTATS ON TABLE
… AND INDEXES ALL. If you use RUNSTATS ON
TABLE … FOR INDEXES ALL, the resulting statistics might
be inconsistent.