Modifying catalog statistics to influence access path selection

If you have the proper authority, you can influence access path selection by using an SQL statements to change statistics values in the catalog. However, doing so is not generally recommended, except as a last resort.

Before you begin

Begin program-specific programming interface information.
Important: Start of changeUse care when issuing SQL statements or using tools to update statistics values in catalog tables. If such updates introduce invalid data, unpredictable results can occur, including abends for RUNSTATS and other utilities. If such problems occur, you can run the RUNSTATS utility and collect statistics at the table space level to resolve the problems, in most cases.End of change
Important: The access path selection techniques that are described here might cause significant performance degradation if they are not carefully implemented and monitored. Also, access path selection methods might change in a later release of DB2®, causing your changes to degrade performance.

Consequently, the following recommendations apply if you make any such changes:

  • Save the original catalog statistics or SQL statements before you consider making any changes to control the choice of access path.
  • Before and after you make any changes, take performance measurements.
  • Be prepared to back out any changes that have degraded performance.
  • When you migrate to a new release, evaluate the performance again.
  • Plan to keep track of the changes you make and of the plans or packages that have access path changes because of the changed statistics.
  • Record when the statistics were modified by setting the value of the STATIME column to the current TIMESTAMP value for every statistics record that you modify in the catalog tables.
  • Consider correlations among catalog tables before updating statistics values.

About this task

You might modify catalog statistics to influence how DB2 selects access paths. However, the access path selection "tricks" that are described here cause significant performance degradation if they are not carefully implemented and monitored. Although modifying catalog statistics might improve the access path for one or a few SQL statements, other statements in the workload might by affected by the same changes. Also, the updates to the catalog must be repeated whenever the RUNSTATS utility resets the catalog values.

Procedure

To modify statistics values, use any of the following approaches:

  • Issue UPDATE statements to change the values in the catalog. Numeric, date time, values in the catalog tables use internal hexadecimal formats. You must also allow for null indicators in keys that allow null values.

    Start of changeSimilarly, the padding characteristics of values in columns such as HIGH2KEY and LOW2KEY, that contain data values must match the padding characteristics of the corresponding data columns. You can query the STATS_FORMAT column of SYSIBM.SYSCOLUMNS catalog table to determine whether variable-length columns contain padded values.End of change

  • Update small COLCARDF values for partitioned data. On partitioned indexes, the RUNSTATS INDEX utility calculates the number of distinct column values and saves it in the SYSCOLSTATS.COLCARD, by partition. When statistics by partition are used to form an aggregate COLCARDF value, the aggregate value might not be exact because some column values might occur in more than one partition. DB2 cannot detect that overlap without scanning all parts of the index. The overlap never skews COLCARD by more than the number of partitions, which is usually not a problem for large values.

    The same problem and solution also applies to the FIRSTKEYCARDF columns of the SYSIBM.SYSINDEXES and SYSIBM.SYSINDEXSTATS catalog tables.

  • If you update the COLCARDF value for a column, also update HIGH2KEY and LOW2KEY for the values.
  • You can insert, update, or delete distribution information for any column in the following catalog tables that contain distributions statistics:
    • SYSIBM.SYSCOLDIST
    • SYSIBM.SYSCOLDISTSTATS
    • SYSIBM.SYSKEYTGTDISTSTATS
    • SYSIBM.SYSKEYTGTDIST
  • If you use dynamic statement caching, invalidate statements in the cache that involve the table spaces or indexes whose statistics you have modified. To invalidate statements in the dynamic statement cache without updating catalog statistics or generating reports, you can run the RUNSTATS utility with the REPORT NO and UPDATE NONE options on the table space or the index that the query is dependent on.

Example

For example, assume that the following SQL statement has a problem with data correlation:

SELECT * FROM PART_HISTORY   --  SELECT ALL PARTS
WHERE PART_TYPE = 'BB'    P1 --  THAT ARE 'BB' TYPES
  AND W_FROM = 3          P2 --  THAT WERE MADE IN CENTER 3
  AND W_NOW = 3           P3 --  AND ARE STILL IN CENTER 3

DB2 does not know that 50% of the parts that were made in Center 3 are still in Center 3. The problem can be circumvented by making a predicate non-indexable. However, suppose that hundreds of users are writing queries similar to that query. Having all users change their queries would be impossible. In this type of situation, the best solution might be to change the catalog statistics.

One catalog table that you can update is the SYSIBM.SYSCOLDIST catalog table, which gives information about a column or set of columns in a table. Assume that because columns W_NOW and W_FROM are correlated, and that only 100 distinct values exist for the combination of the two columns, rather than 2500 (50 for W_FROM * 50 for W_NOW). Insert a row like this to indicate the new cardinality:

INSERT INTO SYSIBM.SYSCOLDIST
       (FREQUENCY, FREQUENCYF, IBMREQD,
        TBOWNER, TBNAME, NAME, COLVALUE,
        TYPE, CARDF, COLGROUPCOLNO, NUMCOLUMNS)
 VALUES(0, -1, 'N',
        'USRT001','PART_HISTORY','W_FROM',' ',
        'C',100,X'00040003',2);

You can also use the RUNSTATS utility to put this information in SYSCOLDIST.

You tell DB2 about the frequency of a certain combination of column values by updating SYSIBM.SYSCOLDIST. For example, you can indicate that 1% of the rows in PART_HISTORY contain the values 3 for W_FROM and 3 for W_NOW by inserting this row into SYSCOLDIST:

INSERT INTO SYSIBM.SYSCOLDIST
       (FREQUENCY, FREQUENCYF, STATSTIME, IBMREQD,
        TBOWNER, TBNAME, NAME, COLVALUE,
        TYPE, CARDF, COLGROUPCOLNO, NUMCOLUMNS)
 VALUES(0, .0100, '2006-12-01-12.00.00.000000','N',
        'USRT001','PART_HISTORY','W_FROM',X'00800000030080000003',
        'F',-1,X'00040003',2);

End program-specific programming interface information.