DB2 Version 9.7 for Linux, UNIX, and Windows

Compiler rewrite example: Operation movement - Predicate pushdown for combined SQL/XQuery statements

One fundamental technique for the optimization of relational SQL queries is to move predicates in the WHERE clause of an enclosing query block into an enclosed lower query block (for example, a view), thereby enabling early data filtering and potentially better index usage.

This is even more important in partitioned database environments, because early filtering potentially reduces the amount of data that must be shipped between database partitions.

Similar techniques can be used to move predicates or XPath filters inside of an XQuery. The basic strategy is always to move filtering expressions as close to the data source as possible. This optimization technique is called predicate pushdown in SQL and extraction pushdown (for filters and XPath extractions) in XQuery.

Because the data models employed by SQL and XQuery are different, you must move predicates, filters, or extractions across the boundary between the two languages. Data mapping and casting rules have to be considered when transforming an SQL predicate into a semantically equivalent filter and pushing it down into the XPath extraction. The following examples address the pushdown of relation predicates into XQuery query blocks.

Consider the following two XML documents containing customer information:
Document 1                   Document 2

<customer>                   <customer>
  <name>John</name>            <name>Michael</name>
  <lastname>Doe</lastname>     <lastname>Miller </lastname>
  <date_of_birth>              <date_of_birth>
	  1976-10-10               	  1975-01-01
  </date_of_birth>             </date_of_birth>
  <address>                    <address>
    <zip>95141.0</zip>           <zip>95142.0</zip>
  </address>                   </address>
<volume>80000.0</volume>     <volume>100000.00</volume>
</customer>                  </customer>
<customer>                   <customer>
  <name>Jane</name>            <name>Michaela</name>
  <lastname>Doe</lastname>     <lastname>Miller</lastname>
  <date_of_birth>              <date_of_birth>
    1975-01-01               	  1980-12-23
  </date_of_birth>             </date_of_birth>
  <address>                    <address>
    <zip>95141.4</zip>           <zip>95140.5</zip>
  </address>                   </address>
<volume>50000.00</volume>    <volume>100000</volume>
</customer>                  </customer>

Example - Pushing integer predicates

Consider the following query:
   select temp.name, temp.zip
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       zip integer path 'customer/zip'
       ) as temp
     where zip = 95141
To use possible indexes on T.XMLDOC and to filter unwanted persons early on, the zip = 95141 predicate will be internally converted into the following equivalent XPATH filtering expression:
T.XMLCOL/customer/zip[. >= 95141.0 and . < 95142.0] 

Because schema information for XML fragments is not used by the compiler, it cannot be assumed that ZIP contains integers only. It is possible that there are other numeric values with a fractional part and a corresponding double XML index on this specific XPath extraction. The XML2SQL cast would handle this transformation by truncating the fractional part before casting the value to INTEGER. This behavior must be reflected in the pushdown procedure, and the predicate must be changed to remain semantically correct.

Example - Pushing VARCHAR(n) predicates

Consider the following query:
   select temp.name, temp.lastname
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       lastname varchar(20) path 'customer/lastname'
       ) as temp
     where lastname = 'Miller'
To use possible indexes on T.XMLDOC and to filter unwanted persons early on, the lastname = 'Miller' predicate will be internally converted into the equivalent XPATH filtering expression:
T.XMLCOL/customer/lastname[. > rtrim("Miller") and . < blank_padd("Miller",
  max(20,length("Miller"))]

Trailing blanks are treated differently in SQL than in XPath or XQuery. The original SQL predicate will not distinguish between the two customers whose last name is "Miller", even if one of them (Michael) has a trailing blank. Consequently, both customers are returned, which would not be the case if an unchanged predicate were pushed down.

The solution is to transform the predicate into a range filter.
  • The first boundary is created by truncating all trailing blanks from the comparison value, using the RTRIM() function.
  • The second boundary must be greater than or equal to all possible "Miller" strings that contain trailing blanks. The original string is padded with blanks to the maximum column length, or to the length of the comparison string, if that is longer.

Example - Pushing DECIMAL(x,y) predicates

Consider the following query:
   select temp.name, temp.volume
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       volume decimal(10,2) path 'customer/volume'
       ) as temp
     where volume = 100000.00
To use possible double indexes on T.XMLDOC and to filter unwanted persons early on, the volume = 100000.00 predicate will be internally converted into the following equivalent XPATH filtering expression:
T.XMLCOL/customer/volume[.=100000.00]

The predicate does not have to be transformed into a range filter, because casting restrictions force XML values to have the same precision and length of the fractional part as the target SQL type. Any violation of this constraint returns an error. Precision is not reduced when DOUBLE values are cast to DECIMAL(x,y). Rounding or truncation of the comparative values is not necessary.