Hierarchical queries are a form of recursive query that provides support for retrieving a hierarchy, such as a bill of materials, from relational data using a CONNECT BY clause.
Hierarchical query support is enabled through the setting of the DB2_COMPATIBILITY_VECTOR registry variable. This allows the CONNECT BY syntax to be specified, including the use of pseudocolumns (see Pseudocolumns), unary operators (see Unary operators), and the SYS_CONNECT_BY_PATH scalar function.
Connect-by recursion uses the same subquery for the seed (start) and the recursive step (connect). This combination provides a concise method of representing recursions such as, for example, bills-of-material, reports-to-chains, or email threads.
Connect-by recursion returns an error if a cycle occurs. A cycle occurs when a row produces itself, either directly or indirectly. Using the optional CONNECT BY NOCYCLE clause, the recursion can be directed to ignore the duplicated row, thus avoiding both the cycle and the error. For more information about mapping hierarchical queries to DB2® recursion, see Port CONNECT BY to DB2.
A pseudocolumn is a qualified or unqualified identifier that has meaning in a specific context and shares the same namespace as columns and variables. If an unqualified identifier does not identify a column or a variable, it is checked to see if it identifies a pseudocolumn.
LEVEL is a pseudocolumn for use in hierarchical queries. The LEVEL pseudocolumn returns the recursive step in the hierarchy at which a row was produced. All rows produced by the START WITH clause return the value 1. Rows produced by applying the first iteration of the CONNECT BY clause return 2, and so on. The data type of the column is INTEGER NOT NULL.
LEVEL must be specified in the context of a hierarchical query but cannot be specified in the START WITH clause, as an argument of the CONNECT_BY_ROOT operator, or as an argument of the SYS_CONNECT_BY_PATH function (SQLSTATE 428H4).
A hierarchical query involving joins should be written using explicit joined tables with an ON clause to avoid confusion about the application of WHERE clause predicates.
>>-●--+-----------------------+--●--| connect-by-clause |--●--->< '-| start-with-clause |-' start-with-clause |--START WITH--search-condition---------------------------------| connect-by-clause |--CONNECT BY--+---------+--search-condition--------------------| '-NOCYCLE-'
A subselect that includes a hierarchical-query-clause is called a hierarchical query. After establishing a first intermediate result table H1, subsequent intermediate result tables H2, H3, and so forth are generated by joining Hn with R using the connect-by-clause as a join condition to produce Hn+1. R is the result of the FROM clause of the subselect and any join predicates in the WHERE clause. The process stops when Hn+1 has yielded an empty result table. The result table H of the hierarchical-query-clause is the UNION ALL of every Hi.
The start-with-clause specifies the intermediate result table H1 for the hierarchical query that consists of those rows of R for which the search-condition is true. If the start-with-clause is not specified, H1 is the entire intermediate result table R.
The connect-by-clause produces the intermediate result table Hn+1 from Hn by joining Hn with R, using the search condition.
CONNECT BY MGRID = PRIOR EMPID
MGRID
is resolved with R, and EMPID is resolved within the previous intermediate
result table Hn.The rules for the search-condition within the start-with-clause and the connect-by-clause are the same as those within the WHERE clause, except that OLAP specifications cannot be specified in the connect-by-clause (SQLSTATE 42903).
If the intermediate result table Hn+1 would return a row from R for a hierarchical path that is the same as a row from R that is already in that hierarchical path, an error is returned (SQLSTATE 560CO).
If the NOCYCLE keyword is specified, an error is not returned, but the repeated row is not included in the intermediate result table Hn+1.
DB2 supports a maximum of 64 levels of recursion (SQLSTATE 54066).
A subselect that is a hierarchical query returns the intermediate result set in a partial order, unless that order is destroyed through the use of an explicit ORDER BY clause, a GROUP BY or HAVING clause, or a DISTINCT keyword in the select list. The partial order returns rows, such that rows produced in Hn+1 for a given hierarchy immediately follow the row in Hn that produced them. The ORDER SIBLINGS BY clause can be used to enforce order within a set of rows produced by the same parent.
CREATE TABLE MY_EMP(
EMPID INTEGER NOT NULL PRIMARY KEY,
NAME VARCHAR(10),
SALARY DECIMAL(9, 2),
MGRID INTEGER);
INSERT INTO MY_EMP VALUES ( 1, 'Jones', 30000, 10);
INSERT INTO MY_EMP VALUES ( 2, 'Hall', 35000, 10);
INSERT INTO MY_EMP VALUES ( 3, 'Kim', 40000, 10);
INSERT INTO MY_EMP VALUES ( 4, 'Lindsay', 38000, 10);
INSERT INTO MY_EMP VALUES ( 5, 'McKeough', 42000, 11);
INSERT INTO MY_EMP VALUES ( 6, 'Barnes', 41000, 11);
INSERT INTO MY_EMP VALUES ( 7, 'O''Neil', 36000, 12);
INSERT INTO MY_EMP VALUES ( 8, 'Smith', 34000, 12);
INSERT INTO MY_EMP VALUES ( 9, 'Shoeman', 33000, 12);
INSERT INTO MY_EMP VALUES (10, 'Monroe', 50000, 15);
INSERT INTO MY_EMP VALUES (11, 'Zander', 52000, 16);
INSERT INTO MY_EMP VALUES (12, 'Henry', 51000, 16);
INSERT INTO MY_EMP VALUES (13, 'Aaron', 54000, 15);
INSERT INTO MY_EMP VALUES (14, 'Scott', 53000, 16);
INSERT INTO MY_EMP VALUES (15, 'Mills', 70000, 17);
INSERT INTO MY_EMP VALUES (16, 'Goyal', 80000, 17);
INSERT INTO MY_EMP VALUES (17, 'Urbassek', 95000, NULL);
1 SELECT NAME,
2 LEVEL,
3 SALARY,
4 CONNECT_BY_ROOT NAME AS ROOT,
5 SUBSTR(SYS_CONNECT_BY_PATH(NAME, ':'), 1, 25) AS CHAIN
6 FROM MY_EMP
7 START WITH NAME = 'Goyal'
8 CONNECT BY PRIOR EMPID = MGRID
9 ORDER SIBLINGS BY SALARY;
NAME LEVEL SALARY ROOT CHAIN
---------- ----------- ----------- ----- ---------------
Goyal 1 80000.00 Goyal :Goyal
Henry 2 51000.00 Goyal :Goyal:Henry
Shoeman 3 33000.00 Goyal :Goyal:Henry:Shoeman
Smith 3 34000.00 Goyal :Goyal:Henry:Smith
O'Neil 3 36000.00 Goyal :Goyal:Henry:O'Neil
Zander 2 52000.00 Goyal :Goyal:Zander
Barnes 3 41000.00 Goyal :Goyal:Zander:Barnes
McKeough 3 42000.00 Goyal :Goyal:Zander:McKeough
Scott 2 53000.00 Goyal :Goyal:Scott
Lines 7 and 8 comprise the core of the recursion: The optional START WITH clause describes the WHERE clause that is to be used on the source table to seed the recursion. In this case, only the row for employee Goyal is selected. If the START WITH clause is omitted, the entire source table is used to seed the recursion. The CONNECT BY clause describes how, given the existing rows, the next set of rows is to be found. The unary operator PRIOR is used to distinguish values in the previous step from those in the current step. PRIOR identifies EMPID as the employee ID of the previous recursive step, and MGRID as originating from the current recursive step.
LEVEL in line 2 is a pseudocolumn that describes the current level of recursion.
CONNECT_BY_ROOT is a unary operator that always returns the value of its argument as it was during the first recursive step; that is, the values that are returned by an explicit or implicit START WITH clause.
SYS_CONNECT_BY_PATH() is a binary function that prepends the second argument to the first and then appends the result to the value that it produced in the previous recursive step. The arguments must be character types.
Unless explicitly overridden, connect-by recursion returns a result set in a partial order; that is, the rows that are produced by a recursive step always follow the row that produced them. Siblings at the same level of recursion have no specific order. The ORDER SIBLINGS BY clause in line 9 defines an order for these siblings, which further refines the partial order, potentially into a total order.
SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
START WITH DEPTNO = 'A00'
CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT
The
query returns:LEVEL DEPTNAME
----------- ----------------------------------------
1 /SPIFFY COMPUTER SERVICE DIV.
2 /PLANNING
2 /INFORMATION CENTER
2 /DEVELOPMENT CENTER
3 /MANUFACTURING SYSTEMS
3 /ADMINISTRATION SYSTEMS
2 /SUPPORT SERVICES
3 /OPERATIONS
3 /SOFTWARE SUPPORT
3 /BRANCH OFFICE F2
3 /BRANCH OFFICE G2
3 /BRANCH OFFICE H2
3 /BRANCH OFFICE I2
3 /BRANCH OFFICE J2