DB2 Version 9.7 for Linux, UNIX, and Windows

Example using XQuery transform in a partitioned database environment

The XQuery transform expression is supported in a partitioned database environment and can be used in scenarios that use extract, transform, and load (ETL) operations.

For example, when XML data is extracted from the result of a join between two source tables, ORDERS and PRODUCT, transformed into a wanted format, and inserted as XML data into a target table SALES, an XQuery transform expression can be used. However, a transform expression performs best if the input to the expression references only a single table rather than a join of several source tables.

When using an XQuery transform expression with multiple tables, writing a single statement in which the transform expression is applied directly to the join result generally yields acceptable performance if the join result is small relative to the size of the source tables. However, if the number of rows produced by the join is comparable to or larger than the number of rows in the source tables, you should consider splitting the join and the transform expression into two statements.

The following example demonstrates a general technique for splitting a single statement that uses a join and an XQuery transform operation into two statements. The two statements ensure that the transform operation can be parallelized in a partitioned database environment. The technique uses the following steps:
  1. Create a new intermediate table in the same partition group having the same distribution key as the target table. The intermediate table can be a declared global temporary table.
  2. Extract the data from the multiple source tables and insert it into the intermediate table.
  3. Transform the XML data from the intermediate table and insert the transformed data into the target table.

Both steps 2 and 3 can take advantage of the parallel processing available in a partitioned database environment to be scalable. A nested transform expression should be avoided in Step 3.

Tables and data used in the example

The example uses the source tables ORDERS and PRODUCTS, the target table SALES, and a declared global temporary table TEMPSALES. The example retrieves data from the ORDER table, joins the data with pricing information from the PRODUCTS table, formats the resulting data, and inserts the formatted data into the SALES table.

The ORDERS table contains the daily orders, with order ID (OID) and order information in an XML column (ORDERDETAIL). Each XML document contains a customer ID (Cid) and the products ordered. The information for each product ordered includes the product ID, quantity, and delivery requirements. The following CREATE statement creates the example ORDERS table:
CREATE TABLE ORDERS(OID BIGINT, ORDERDETAIL XML)DISTRIBUTE BY HASH (OID);
The following INSERT statement inserts a sample order into the ORDERS table:
INSERT into ORDERS
values (5003, '<order>
                <cid>1001</cid>
                <product>
                  <pid>2344</pid><qty>10</qty>
                  <delivery>Overnight</delivery>
                </product>
                <product>
                   <pid>537</pid><qty>3</qty>
                   <delivery>Ground</delivery>
                </product>
               </order>');
The PRODUCTS table contains product information, product ID (PID) product price (PRICE), and details about the product in an XML column (PRODDETAIL). The following CREATE statement creates the example PRODUCTS table:
CREATE TABLE PRODUCTS(PID BIGINT, PRICE FLOAT, PRODDETAIL XML);
The following INSERT statements inserts product data into the PRODUCTS table:
INSERT into PRODUCTS
values(2344, 4.99, '<product>
                      <name>10 D-Cell batteries</name>
                      <desc>D Cell battery, 10-pack</desc>
                    </product>')

INSERT into PRODUCTS
values(537, 8.99, '<product>
                     <name>Ice Scraper, small</name>
                     <desc>Basic ice scraper, 4 inches wide</desc>
                   </product>');
The SALES table contains an order ID (OID), a customer ID (CID), product ID information (PID), the total amount of the order (ITEMTOTAL), and details about each order stored in an XML column (SALEDETAIL). Each row in the SALES table contains information about an individual product ordered. The SALES table is distributed on the order ID (OID) column. The following CREATE statement creates the example SALES table:
CREATE TABLE SALES(OID BIGINT, CID BIGINT, PID BIGINT, ITEMTOTAL FLOAT, 
    SALESDETAIL XML) DISTRIBUTE BY HASH (OID); 

A table join and XQuery transform expression in a single statement

The following INSERT statement joins ORDER and PRODUCT data, applies a transform expression to the resulting XML documents, and inserts the updated documents into the SALES table:
INSERT into SALES
select T.OID, T.CID, T.PID, T.ITEMTOTAL,
   XMLQUERY('
     copy $new := $temp
     modify (do delete ($new/info/cid, 
     $new/info/product/pid, 
     $new/info/product/qty),
       do insert <orderdate>{fn:current-date()}</orderdate> 
       as first into $new/info)
    return $new' passing T.SALESDETAIL as "temp")
from(
SELECT O.OID, OX.CID, OX.PID, P.PRICE * OX.QTY, OX.SALESDETAIL
FROM PRODUCTS P,
     ORDERS O,
     XMLTABLE('for $i in $details/order/product
               return document{<info> {$details/order/cid} {$i} </info>}'
               passing O.ORDERDETAIL as "details"
     columns 
        CID bigint path './info/cid',
        PID bigint path './info/product/pid',
        QTY int    path './info/product/qty',
        SALESDETAIL xml path '.') as OX
WHERE P.PID = OX.PID) as T(OID, CID, PID, ITEMTOTAL, SALESDETAIL);

The next section demonstrates how the table join and the XQuery transform performed in the previous statement can be performed in two separate statements.

A table join and XQuery transform expression in separate statements

If the result of the join in the second half of the statement in A table join and XQuery transform expression in a single statement is significant compared to the size of the ORDER and PRODUCT tables, splitting the statement into two statements improves performance. When splitting the single statement into two, the first statement inserts the results of the join between the ORDER and PRODUCT tables into a temporary table. The second statement applies the transform to the XML documents in the temporary table and inserts the updated documents into the SALES table.

The temporary table is like the SALES table and stores intermediate query results. The declared global temporary table TEMPSALES is used to process the daily orders. In order for the DB2® optimizer to properly optimize the queries used to update the temporary table and the SALES table, the temporary table TEMPSALES and the target table SALES must have the same distribution keys and belong to the same partition group. The following DECLARE statement creates the temporary table TEMPSALES:
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPSALES LIKE SALES 
    DISTRIBUTE BY HASH (OID);
The following SELECT statement contains a join between the PRODUCTS and ORDERS tables. The statement calculates the values for the ITEMTOTAL column using information from the PRODUCTS and ORDERS tables, and constructs XML documents that are inserted into the TEMPSALES table. The join is the same as the one in the INSERT statement in A table join and XQuery transform expression in a single statement.
INSERT INTO SESSION.TEMPSALES
SELECT O.OID, OX.CID, OX.PID, P.PRICE * OX.QTY, OX.SALESDETAIL
FROM PRODUCTS P,
     ORDERS O,
     XMLTABLE('for $i in $details/order/product
               return document{<info> {$details/order/cid} {$i} </info>}'
               passing O.ORDERDETAIL as "details"
     columns 
        CID bigint path './info/cid',
        PID bigint path './info/product/pid',
        QTY int    path './info/product/qty',
        SALESDETAIL xml path '.') as OX
WHERE P.PID = OX.PID;

The previous SELECT statement used by the INSERT statement does not contain an XQuery transform expression.

The temporary table contains the relational information needed for the SALES table. In the XML documents in the temporary table, the product information needs to be removed and an order date needs to be added.

The following INSERT statement selects the daily sales information from the temporary table and inserts the information into the table SALES. The INSERT statement contains a SELECT statement that uses the same XQuery transform expression as in the INSERT statement in A table join and XQuery transform expression in a single statement. The expression modifies each XML document from the temporary table before inserting the document into the SALES table. Because the TEMPSALES and SALES tables are located in the same partition group and share a common distribution key, the insert can be parallelized.
INSERT into SALES
select T.OID, T.CID, T.PID, T.ITEMTOTAL,
   XMLQUERY('
     copy $new := $temp
     modify (do delete ($new/info/cid, 
     $new/info/product/pid, 
     $new/info/product/qty),
       do insert <orderdate>{fn:current-date()}</orderdate> 
       as first into $new/info)
    return $new' passing T.SALESDETAIL as "temp")
from SESSION.TEMPSALES T;