DB2 Version 9.7 for Linux, UNIX, and Windows

Optimization strategies for partitioned tables

Data partition elimination refers to the database server's ability to determine, based on query predicates, that only a subset of the data partitions in a table need to be accessed to answer a query. Data partition elimination is particularly useful when running decision support queries against a partitioned table.

A partitioned table uses a data organization scheme in which table data is divided across multiple storage objects, called data partitions or ranges, according to values in one or more table partitioning key columns of the table. Data from a table is partitioned into multiple storage objects based on specifications provided in the PARTITION BY clause of the CREATE TABLE statement. These storage objects can be in different table spaces, in the same table space, or a combination of both.

The following example demonstrates the performance benefits of data partition elimination.

   create table custlist(
     subsdate date, province char(2), accountid int)
     partition by range(subsdate) (
       starting from '1/1/1990' in ts1,
       starting from '1/1/1991' in ts1,
       starting from '1/1/1992' in ts1,
       starting from '1/1/1993' in ts2,
       starting from '1/1/1994' in ts2,
       starting from '1/1/1995' in ts2,
       starting from '1/1/1996' in ts3,
       starting from '1/1/1997' in ts3,
       starting from '1/1/1998' in ts3,
       starting from '1/1/1999' in ts4,
       starting from '1/1/2000' in ts4,
       starting from '1/1/2001'
       ending '12/31/2001' in ts4)

Assume that you are only interested in customer information for the year 2000.

   select * from custlist
     where subsdate between '1/1/2000' and '12/31/2000'

As Figure 1 shows, the database server determines that only one data partition in table space TS4 must be accessed to resolve this query.

Figure 1. The performance benefits of data partition elimination

The database server determines that only a subset of data partitions must be accessed to answer the query.

Another example of data partition elimination is based on the following scheme:

   create table multi (
     sale_date date, region char(2))
     partition by (sale_date) (
       starting '01/01/2005'
       ending '12/31/2005'
       every 1 month)

   create index sx on multi(sale_date)

   create index rx on multi(region)

Assume that you issue the following query:

   select * from multi
     where sale_date between '6/1/2005'
       and '7/31/2005' and region = 'NW'

Without table partitioning, one likely plan is index ANDing. Index ANDing performs the following tasks:

Reads all relevant index entries from each index
Saves both sets of row identifiers (RIDs)
Matches RIDs to determine which occur in both indexes
Uses the RIDs to fetch the rows

As Figure 2 demonstrates, with table partitioning, the index is read to find matches for both REGION and SALE_DATE, resulting in the fast retrieval of matching rows.

Figure 2. Optimizer decision path for both table partitioning and index ANDing

Comparison of optimizer decision path with and without table partitioning.

DB2 Explain

You can also use the explain facility to determine the data partition elimination plan that was chosen by the query optimizer. The "DP Elim Predicates" information shows which data partitions are scanned to resolve the following query:

   select * from custlist
     where subsdate between '12/31/1999' and '1/1/2001'

Arguments:
---------
DPESTFLG: (Number of data partitions accessed are Estimated)
 		 FALSE
DPLSTPRT: (List of data partitions accessed)
 		 9-11
DPNUMPRT: (Number of data partitions accessed)
		 3

DP Elim Predicates:
------------------
Range 1)
 	Stop  Predicate: (Q1.A <= '01/01/2001')
	Start Predicate: ('12/31/1999' <= Q1.A)



Objects Used in Access Plan:
 ---------------------------

 	Schema: MRSRINI 
 	Name: 		 CUSTLIST
 	Type: 		 Data Partitioned Table
		 Time of creation: 		 	 2005-11-30-14.21.33.857039
		 Last statistics update: 		 2005-11-30-14.21.34.339392
 		 Number of columns: 		 	 3
 		 Number of rows: 		 	 100000
		 Width of rows: 		 	 19
		 Number of buffer pool pages: 		 1200
 		 Number of data partitions: 		 12
 		 Distinct row values: 		 	 No
 		 Tablespace name: 		 	 <VARIOUS>

Multi-column support

Data partition elimination works in cases where multiple columns are used as the table partitioning key. For example:

   create table sales (
     year int, month int)
	   partition by range(year, month) (
       starting from (2001,1)
       ending at (2001,3) in ts1,
	     ending at (2001,6) in ts2, 
	     ending at (2001,9) in ts3, 
	     ending at (2001,12) in ts4,
	     ending at (2002,3) in ts5, 
	     ending at (2002,6) in ts6, 
	     ending at (2002,9) in ts7,
	     ending at (2002,12) in ts8)

   select * from sales where year = 2001 and month < 8

The query optimizer deduces that only data partitions in TS1, TS2, and TS3 must be accessed to resolve this query.

Note: In the case where multiple columns make up the table partitioning key, data partition elimination is only possible when you have predicates on the leading columns of the composite key, because the non-leading columns that are used for the table partitioning key are not independent.

Multi-range support

It is possible to obtain data partition elimination with data partitions that have multiple ranges (that is, those that are ORed together). Using the SALES table that was created in the previous example, execute the following query:

   select * from sales
     where (year = 2001 and month <= 3)
       or (year = 2002 and month >= 10)

The database server only accesses data for the first quarter of 2001 and the last quarter of 2002.

Generated columns

You can use generated columns as table partitioning keys. For example:

   create table sales (
     a int, b int generated always as (a / 5))
     in ts1,ts2,ts3,ts4,ts5,ts6,ts7,ts8,ts9,ts10
     partition by range(b) (
       starting from (0)
       ending at (1000) every (50))

In this case, predicates on the generated column are used for data partition elimination. In addition, when the expression that is used to generate the columns is monotonic, the database server translates predicates on the source columns into predicates on the generated columns, which enables data partition elimination on the generated columns. For example:

   select * from sales where a > 35

The database server generates an extra predicate on b (b > 7) from a (a > 35), thus allowing data partition elimination.

Join predicates

Join predicates can also be used in data partition elimination, if the join predicate is pushed down to the table access level. The join predicate is only pushed down to the table access level on the inner join of a nested loop join (NLJN).

Consider the following tables:

   create table t1 (a int, b int)
     partition by range(a,b) (
       starting from (1,1)
       ending (1,10) in ts1,
       ending (1,20) in ts2,
       ending (2,10) in ts3,
       ending (2,20) in ts4,
       ending (3,10) in ts5,
       ending (3,20) in ts6,
       ending (4,10) in ts7,
       ending (4,20) in ts8)

   create table t2 (a int, b int)

The following two predicates will be used:

   P1: T1.A = T2.A
   P2: T1.B > 15

In this example, the exact data partitions that will be accessed at compile time cannot be determined, due to unknown outer values of the join. In this case, as well as cases where host variables or parameter markers are used, data partition elimination occurs at run time when the necessary values are bound.

During run time, when T1 is the inner of an NLJN, data partition elimination occurs dynamically, based on the predicates, for every outer value of T2.A. During run time, the predicates T1.A = 3 and T1.B > 15 are applied for the outer value T2.A = 3, which qualifies the data partitions in table space TS6 to be accessed.

Suppose that column A in tables T1 and T2 have the following values:

Outer table T2: column A	Inner table T1: column A	Inner table T1: column B	Inner table T1: data partition location
2	3	20	TS6
3	2	10	TS3
3	2	18	TS4
	3	15	TS6
	1	40	TS3

To perform a nested loop join (assuming a table scan for the inner table), the database manager performs the following steps:

Reads the first row from T2. The value for A is 2.
Binds the T2.A value (which is 2) to the column T2.A in the join predicate T1.A = T2.A. The predicate becomes T1.A = 2.
Applies data partition elimination using the predicates T1.A = 2 and T1.B > 15. This qualifies data partitions in table space TS4.
After applying T1.A = 2 and T1.B > 15, scans the data partitions in table space TS4 of table T1 until a row is found. The first qualifying row found is row 3 of T1.
Joins the matching row.
Scans the data partitions in table space TS4 of table T1 until the next match (T1.A = 2 and T1.B > 15) is found. No more rows are found.
Repeats steps 1 through 6 for the next row of T2 (replacing the value of A with 3) until all the rows of T2 have been processed.

Indexes over XML data

Starting in DB2® Version 9.7 Fix Pack 1, you can create an index over XML data on a partitioned table as either partitioned or nonpartitioned. The default is a partitioned index.

Partitioned and nonpartitioned XML indexes are maintained by the database manager during table insert, update, and delete operations in the same way as any other relational indexes on a partitioned table are maintained. Nonpartitioned indexes over XML data on a partitioned table are used in the same way as indexes over XML data on a nonpartitioned table to speed up query processing. Using the query predicate, it is possible to determine that only a subset of the data partitions in the partitioned table need to be accessed to answer the query.

Data partition elimination and indexes over XML columns can work together to enhance query performance. Consider the following partitioned table:

   create table employee (a int, b xml, c xml)
     index in tbspx
     partition by (a) (
       starting 0 ending 10,
       ending 20,
       ending 30,
       ending 40)

Now consider the following query:

   select * from employee
     where a > 21
     and xmlexist('$doc/Person/Name/First[.="Eric"]'
       passing "EMPLOYEE"."B" as "doc")

The optimizer can immediately eliminate the first two partitions based on the predicate a > 21. If the nonpartitioned index over XML data on column B is chosen by the optimizer in the query plan, an index scan using the index over XML data will be able to take advantage of the data partition elimination result from the optimizer and only return results belonging to partitions that were not eliminated by the relational data partition elimination predicates.