DB2 10.5 for Linux, UNIX, and Windows

Statistical views

The DB2® cost-based optimizer uses an estimate of the number of rows processed by an access plan operator to accurately cost that operator. This cardinality estimate is the single most important input to the optimizer cost model, and its accuracy largely depends upon the statistics that the RUNSTATS command collects from the database.

More sophisticated statistics are required to represent more complex relationships, such as the following relationships: Statistical views are able to represent these types of complex relationships, because statistics are collected on the result set returned by the view, rather than the base tables referenced by the view.

When a query is compiled, the optimizer matches the query to the available statistical views. When the optimizer computes cardinality estimates for intermediate result sets, it uses the statistics from the view to compute a better estimate.

Queries do not need to reference the statistical view directly in order for the optimizer to use the statistical view. The optimizer uses the same matching mechanism that is used for materialized query tables (MQTs) to match queries to statistical views. In this respect, statistical views are similar to MQTs, except that they are not stored permanently, do not consume disk space, and do not have to be maintained.

A statistical view is created by first creating a view and then enabling it for optimization by using the ALTER VIEW statement. The RUNSTATS command is then run against the statistical view, populating the system catalog tables with statistics for the view. For example, to create a statistical view that represents the join between the TIME dimension table and the fact table in a star schema, run the following statements and commands:
CREATE VIEW SV_TIME_FACT AS (
  SELECT T.* FROM TIME T, SALES S
    WHERE T.TIME_KEY = S.TIME_KEY)

ALTER VIEW SV_TIME_FACT ENABLE QUERY OPTIMIZATION

RUNSTATS ON TABLE DB2DBA.SV_TIME_FACT WITH DISTRIBUTION
Once the view is enabled for optimization, it is identified as a statistical view in the SYSCAT.TABLES catalog view with a 'Y' in position 13 of the PROPERTY column.
This statistical view can be used to improve the cardinality estimate and, consequently, the access plan and query performance for queries such as the following query:
SELECT SUM(S.PRICE)
  FROM SALES S, TIME T, PRODUCT P
  WHERE
    T.TIME_KEY = S.TIME_KEY AND
    T.YEAR_MON = 200712 AND
    P.PROD_KEY = S.PROD_KEY AND
    P.PROD_DESC = ‘Power drill'
Without a statistical view, the optimizer assumes that all fact table TIME_KEY values corresponding to a particular TIME dimension YEAR_MON value occur uniformly within the fact table. However, sales might have been strong in December, resulting in many more sales transactions than during other months.

Statistics that are gathered on queries that have complex expressions in the predicate can be used by the optimizer to calculate accurate cardinality estimates which results in better access plans.

For many star join queries many statistical views might need to be created, however, if you have referential integrity constraints you can narrow down these many statistical views. The statistics for the other views can be inferred from the reduced number of statistical views by using referential integrity constraints.

Another way to obtain better access plans is to apply column group statistics on statistical views. These group statistics help to adjust filter factors which help to gather more accurate statistics which the optimizer can use to obtain accurate cardinality estimates.

Statistics can also be gathered automatically from statistical views through the automatic statistics collection feature in DB2 for Linux, UNIX, and Windows. This new feature can be enabled or disabled by using the auto_stats_views database configuration parameter. This database configuration parameter is off by default and can be enabled by using the UPDATE DB CFG command. The statistics collected by the automatic statistics collection is equivalent to issuing the command runstats on view view_name with distribution.

Utility throttling can be used for statistical views to restrict the performance impact on the workload. For example, the command runstats on view view_name util_impact_priority 10 contains the impact on the workload within a limit specified by the util_impact_lim database manager configuration parameter while statistics are collected on statistical views.

A statistical view cannot directly or indirectly reference a catalog table.