DB2 Version 9.7 for Linux, UNIX, and Windows

Improving query optimization with materialized query tables

Materialized query tables (MQTs) are a powerful way to improve response time for complex queries.

This is especially true for queries that might require one or more of the following operations:

Aggregate data over one or more dimensions
Joins and aggregate data over a group of tables
Data from a commonly accessed subset of data; that is, from a "hot" horizontal or vertical database partition
Re-partitioned data from a table, or part of a table, in a partitioned database environment

Knowledge of MQTs is integrated into the SQL and XQuery compiler. In the compiler, the query rewrite phase and the optimizer match queries with MQTs and determine whether to substitute an MQT in a query that accesses the base tables. If an MQT is used, the explain facility can provide information about which MQT was selected. In this case, users need access privileges against the base tables, not rerouted MQTs.

Because MQTs behave like regular tables in many ways, the same guidelines for optimizing data access using table space definitions and indexes, and by invoking the runstats utility, apply to MQTs.

To help you to understand the power of MQTs, the following example shows how a multidimensional analysis query can take advantage of MQTs. Consider a database warehouse that contains a set of customers and a set of credit card accounts. The warehouse records the set of transactions that are made with the credit cards. Each transaction contains a set of items that are purchased together. This schema is classified as a multi-star schema, because it has two large tables, one containing transaction items, and the other identifying the purchase transactions.

Three hierarchical dimensions describe a transaction: product, location, and time. The product hierarchy is stored in two normalized tables representing the product group and the product line. The location hierarchy contains city, state, and country, region, or territory information, and is stored in a single denormalized table. The time hierarchy contains day, month, and year information, and is encoded in a single date field. The date dimensions are extracted from the date field of the transaction using built-in functions. Other tables in this schema represent account information for customers, and customer information.

An MQT is created for sales at each level of the following hierarchies:

Product
Location
Time, composed of year, month, and day

Many queries can be satisfied by this stored aggregate data. The following example shows how to create an MQT that computes the sum and count of sales data along the product group and line dimensions; along the city, state, and country, region, or territory dimensions; and along the time dimension. It also includes several other columns in its GROUP BY clause.

   create table dba.pg_salessum
     as (
       select l.id as prodline, pg.id as pgroup,
         loc.country, loc.state, loc.city,
         l.name as linename, pg.name as pgname,
         year(pdate) as year, month(pdate) as month,
         t.status,
         sum(ti.amount) as amount,
         count(*) as count
       from cube.transitem as ti, cube.trans as t,
         cube.loc as loc, cube.pgroup as pg, cube.prodline as l
       where
         ti.transid = t.id and
         ti.pgid = pg.id and
         pg.lineid = l.id and
         t.locid = loc.id and
         year(pdate) > 1990
       group by l.id, pg.id, loc.country, loc.state, loc.city,
         year(pdate), month(pdate), t.status, l.name, pg.name
     )
   data initially deferred refresh deferred;

   refresh table dba.pg_salessum;

Queries that can take advantage of such precomputed sums include the following:

Sales by month and product group
Total sales for the years after 1990
Sales for 1995 or 1996
The sum of sales for a specific product group or product line
The sum of sales for a specific product group or product line in 1995 and 1996
The sum of sales for a specific country, region, or territory

Although the precise answer for any of these queries is not included in the MQT, the cost of computing the answer by MQT could be significantly less than the cost of using a large base table, because a portion of the answer is already computed. MQTs can reduce the need for expensive joins, sorts, and aggregation of base data.

The following sample queries obtain significant performance improvements because they can use the already computed results in the example MQT.

The first query returns the total sales for 1995 and 1996:

   set current refresh age=any

   select year(pdate) as year, sum(ti.amount) as amount
     from cube.transitem as ti, cube.trans as t,
       cube.loc as loc, cube.pgroup as pg, cube.prodline as l
     where
       ti.transid = t.id and
       ti.pgid = pg.id and
       pg.lineid = l.id and
       t.locid = loc.id and
       year(pdate) in (1995, 1996)
     group by year(pdate);

The second query returns the total sales by product group for 1995 and 1996:

   set current refresh age=any

   select pg.id as "PRODUCT GROUP", sum(ti.amount) as amount
     from cube.transitem as ti, cube.trans as t,
       cube.loc as loc, cube.pgroup as pg, cube.prodline as l
     where
       ti.transid = t.id and
       ti.pgid = pg.id and
       pg.lineid = l.id and
       t.locid = loc.id and
       year(pdate) in (1995, 1996)
     group by pg.id;

The larger the base tables, the more significant are the potential improvements in response time when using MQTs. MQTs can effectively eliminate overlapping work among queries. Computations are performed only once when MQTs are built, and once each time that they are refreshed, and their content can be reused during the execution of many queries.