DB2 Version 9.7 for Linux, UNIX, and Windows

Collecting distribution statistics for specific columns

For efficient runstats operations and subsequent query-plan analysis, collect distribution statistics on only those columns that queries reference in WHERE, GROUP BY, and similar clauses. You can also collect cardinality statistics on combined groups of columns. The optimizer uses such information to detect column correlation when it estimates selectivity for queries that reference the columns in a group.

About this task

The following example is based on a database named SALES that contains a CUSTOMERS table with indexes CUSTIDX1 and CUSTIDX2.

For privileges and authorities that are required to use the runstats utility, see the description of the RUNSTATS command.

When you collect statistics for a table in a partitioned database environment, runstats only operates on the database partition from which the utility is executed. The results from this database partition are extrapolated to the other database partitions. If this database partition does not contain a required portion of the table, the request is sent to the first database partition in the database partition group that contains the required data.

To collect statistics on specific columns:

Procedure

  1. Connect to the SALES database.
  2. Execute one of the following commands from the DB2® command line, depending on your requirements:
    • To collect distribution statistics on columns ZIP and YTDTOTAL:
         runstats on table sales.customers
           with distribution on columns (zip, ytdtotal)
    • To collect distribution statistics on the same columns, but with different distribution options:
         runstats on table sales.customers
           with distribution on columns (
             zip, ytdtotal num_freqvalues 50 num_quantiles 75)
    • To collect distribution statistics on the columns that are indexed in CUSTIDX1 and CUSTIDX2:
         runstats on table sales.customer
           on key columns
    • To collect statistics for columns ZIP and YTDTOTAL and a column group that includes REGION and TERRITORY:
         runstats on table sales.customers
           on columns (zip, (region, territory), ytdtotal)
    • Suppose that statistics for non-XML columns have been collected previously using the LOAD command with the STATISTICS option. To collect statistics for the XML column MISCINFO:
         runstats on table sales.customers
           on columns (miscinfo)
    • To collect statistics for the non-XML columns only:
         runstats on table sales.customers
           excluding xml columns
      The EXCLUDING XML COLUMNS clause takes precedence over all other clauses that specify XML columns.
    • For DB2 V9.7 Fix Pack 1 and later releases, the following command collects distribution statistics using a maximum of 50 quantiles for the XML column MISCINFO. A default of 20 quantiles is used for all other columns in the table:
       runstats on table sales.customers
          with distribution on columns ( miscinfo num_quantiles 50 )
          default num_quantiles 20
      Note: The following are required for distribution statistics to be collected on the XML column MISCINFO:
      • Both table and distribution statistics must be collected.
      • An index over XML data must be defined on the column, and the data type specified for the index must be VARCHAR, DOUBLE, TIMESTAMP, or DATE.