Implementation and optimization of a single file query with DB2® Multisystem

To do a single file query, the system where the query was specified, the coordinator node, determines the nodes of the file to which to send the query. Those nodes run the query and return the queried records to the coordinator node.

All of the examples in this topic use the following distributed files: DEPARTMENT and EMPLOYEE. The node group for these files consists of SYSA, SYSB, and SYSC. The data is partitioned on the department number.

The following SQL statement creates the DEPARTMENT distributed file.

CREATE TABLE DEPARTMENT
       (DEPTNO CHAR(3) NOT NULL,
       DEPTNAME VARCHAR(20) NOT NULL,
       MGRNO CHAR(6),
       ADMRDEPT CHAR(3) NOT NULL)
      IN NODGRP1 PARTITIONING KEY(DEPTNO)

Table 1. DEPARTMENT table
Node	Record number	DEPTNO	DEPTNAME	MGRNO	ADMRDEPT
SYSA	1	A00	Support services	000010	A00
SYSB	2	A01	Planning	000010	A00
SYSC	3	B00	Accounting	000050	B00
SYSA	4	B01	Programming	000050	B00

The following SQL statement creates the EMPLOYEE distributed file.

CREATE TABLE EMPLOYEE
       (EMPNO CHAR(6) NOT NULL,
       FIRSTNME VARCHAR(12) NOT NULL,
       LASTNAME VARCHAR(15) NOT NULL,
       WORKDEPT CHAR(3) NOT NULL,
       JOB CHAR(8),
       SALARY DECIMAL(9,2))
      IN NODGRP1 PARTITIONING KEY(WORKDEPT)

Table 2. EMPLOYEE table
Node	Record number	EMPNO	FIRSTNME	LASTNAME	WORK DEPT	JOB	SALARY
SYSA	1	000010	Christine	Haas	A00	Manager	41250
SYSA	2	000020	Sally	Kwan	A00	Clerk	25000
SYSB	3	000030	John	Geyer	A01	Planner	35150
SYSB	4	000040	Irving	Stern	A01	Clerk	32320
SYSC	5	000050	Michael	Thompson	B00	Manager	38440
SYSC	6	000060	Eileen	Henderson	B00	Accountant	33790
SYSA	7	000070	Jennifer	Lutz	B01	Programmer	42325
SYSA	8	000080	David	White	B01	Programmer	36450

The following query uses the defined distributed file EMPLOYEE, with index EMPIDX created over the field SALARY. The query is entered on SYSA.

SQL statement:

      SELECT * FROM EMPLOYEE WHERE SALARY > 40000

OPNQRYF command:

     OPNQRYF FILE((EMPLOYEE)) QRYSLT('SALARY > 40000')

In this case, SYSA sends the query to all the nodes of EMPLOYEE, including SYSA. Each node runs the query and returns the records to SYSA. Because a distributed index exists on field SALARY of file EMPLOYEE, optimization that is done on each node decides whether to use the index.

In the next example, the query is specified on SYSA, but the query is sent to a subset of the nodes where the EMPLOYEE file exists. In this case, the query is run locally on SYSA only.

SQL statement:

    SELECT * FROM EMPLOYEE WHERE WORKDEPT = 'A00'

OPNQRYF command:

    OPNQRYF FILE((EMPLOYEE)) QRYSLT('WORKDEPT = 'A00')

The distributed query optimizer determines that there is an isolatable record selection, WORKDEPT = 'A00', involving the partitioning key, WORKDEPT, for this query. The optimizer hashes the value 'A00' and based on the hash value, finds the node at which all of the records satisfying this condition are located. In this case, all of the records satisfying this condition are on SYSA, thus the query is sent only to that node. Because the query originated on SYSA, the query is run locally on SYSA.

The following conditions subset the number of nodes at which a query runs:

All fields of the partitioning key must be isolatable record selection
All predicates must use the equal (=) operator
All fields of the partitioning key must be compared to a literal

Note: For performance reasons, you should specify record selection predicates that match the partitioning key in order to direct the query to a particular node. Record selection with scalar functions of NODENAME, PARTITION, and NODENUMBER can also direct the query to specific nodes.