DB2 Version 9.7 for Linux, UNIX, and Windows

Hierarchical queries

Hierarchical queries are a form of recursive query that provides support for retrieving a hierarchy, such as a bill of materials, from relational data using a CONNECT BY clause.

Hierarchical query support is enabled through the setting of the DB2_COMPATIBILITY_VECTOR registry variable. This allows the CONNECT BY syntax to be specified, including the use of pseudocolumns (see Pseudocolumns), unary operators (see Unary operators), and the SYS_CONNECT_BY_PATH scalar function.

Connect-by recursion uses the same subquery for the seed (start) and the recursive step (connect). This combination provides a concise method of representing recursions such as, for example, bills-of-material, reports-to-chains, or email threads.

Connect-by recursion returns an error if a cycle occurs. A cycle occurs when a row produces itself, either directly or indirectly. Using the optional CONNECT BY NOCYCLE clause, the recursion can be directed to ignore the duplicated row, thus avoiding both the cycle and the error. For more information about mapping hierarchical queries to DB2® recursion, see Port CONNECT BY to DB2.

Pseudocolumns

A pseudocolumn is a qualified or unqualified identifier that has meaning in a specific context and shares the same namespace as columns and variables. If an unqualified identifier does not identify a column or a variable, it is checked to see if it identifies a pseudocolumn.

LEVEL is a pseudocolumn for use in hierarchical queries. The LEVEL pseudocolumn returns the recursive step in the hierarchy at which a row was produced. All rows produced by the START WITH clause return the value 1. Rows produced by applying the first iteration of the CONNECT BY clause return 2, and so on. The data type of the column is INTEGER NOT NULL.

LEVEL must be specified in the context of a hierarchical query but cannot be specified in the START WITH clause, as an argument of the CONNECT_BY_ROOT operator, or as an argument of the SYS_CONNECT_BY_PATH function (SQLSTATE 428H4).

Unary operators

Unary operators in support of hierarchical queries include:
  • CONNECT_BY_ROOT
  • PRIOR

Functions

New functions in support of hierarchical queries include:
  • SYS_CONNECT_BY_PATH scalar function

Subselect

Hierarchical query support includes the following extensions to the subselect.
  • The subselect includes a new hierarchical-query-clause
  • The clauses of the subselect are processed in the following sequence:
    1. FROM clause
    2. hierarchical-query-clause
    3. WHERE clause
    4. GROUP BY clause
    5. HAVING clause
    6. SELECT clause
    7. ORDER BY clause
    8. FETCH FIRST clause
  • If the subselect includes a hierarchical-query-clause, special rules apply for the order of processing the predicates in the WHERE clause. The search-condition is factored into predicates along its AND conditions (conjunction). If a predicate is an implicit join predicate (that is, it references more than one table in the FROM clause), the predicate is applied before the hierarchical-query-clause. Any predicate referencing at most one table in the FROM clause is applied to the intermediate result table of the hierarchical-query-clause.

    A hierarchical query involving joins should be written using explicit joined tables with an ON clause to avoid confusion about the application of WHERE clause predicates.

  • The new ORDER SIBLINGS BY clause can be specified if the subselect includes a hierarchical-query-clause. This clause specifies that the ordering applies only to siblings within the hierarchies.

hierarchical-query-clause

Read syntax diagramSkip visual syntax diagram
>>-●--+-----------------------+--●--| connect-by-clause |--●---><
      '-| start-with-clause |-'                                

start-with-clause

|--START WITH--search-condition---------------------------------|

connect-by-clause

|--CONNECT BY--+---------+--search-condition--------------------|
               '-NOCYCLE-'                     

A subselect that includes a hierarchical-query-clause is called a hierarchical query. After establishing a first intermediate result table H1, subsequent intermediate result tables H2, H3, and so forth are generated by joining Hn with R using the connect-by-clause as a join condition to produce Hn+1. R is the result of the FROM clause of the subselect and any join predicates in the WHERE clause. The process stops when Hn+1 has yielded an empty result table. The result table H of the hierarchical-query-clause is the UNION ALL of every Hi.

The start-with-clause specifies the intermediate result table H1 for the hierarchical query that consists of those rows of R for which the search-condition is true. If the start-with-clause is not specified, H1 is the entire intermediate result table R.

The connect-by-clause produces the intermediate result table Hn+1 from Hn by joining Hn with R, using the search condition.

The unary operator PRIOR is used to distinguish column references to Hn, the last prior recursive step, from column references to R. For example:
   CONNECT BY MGRID = PRIOR EMPID
MGRID is resolved with R, and EMPID is resolved within the previous intermediate result table Hn.

The rules for the search-condition within the start-with-clause and the connect-by-clause are the same as those within the WHERE clause, except that OLAP specifications cannot be specified in the connect-by-clause (SQLSTATE 42903).

If the intermediate result table Hn+1 would return a row from R for a hierarchical path that is the same as a row from R that is already in that hierarchical path, an error is returned (SQLSTATE 560CO).

If the NOCYCLE keyword is specified, an error is not returned, but the repeated row is not included in the intermediate result table Hn+1.

DB2 supports a maximum of 64 levels of recursion (SQLSTATE 54066).

A subselect that is a hierarchical query returns the intermediate result set in a partial order, unless that order is destroyed through the use of an explicit ORDER BY clause, a GROUP BY or HAVING clause, or a DISTINCT keyword in the select list. The partial order returns rows, such that rows produced in Hn+1 for a given hierarchy immediately follow the row in Hn that produced them. The ORDER SIBLINGS BY clause can be used to enforce order within a set of rows produced by the same parent.

Restrictions on the use of hierarchical queries

Examples