Hierarchical Clause

The Hierarchical clause sets the conditions for recursive queries on a table object in which a hierarchy of parent-child dependencies exists among the rows. SELECT statements that include this clause are called hierarchical queries.

The table object on which the hierarchical query operates must be specified in the FROM clause of the SELECT statement. The table object is typically a self-referencing table in which one or more columns acts as a foreign key constraint for another column (or for a subset of the columns) in the same table.

A hierarchical query operates on rows in which one or more column values correspond to nodes within a logical structure of parent-child relationships. If parent rows have multiple children, sibling relationships exist among child rows of the same parent. These relationships might reflect, for example, the reporting structure among employees and managers within the divisions and management levels of an organization.

The syntax that this clause supports is an extension to the ANSI/ISO standard for SQL. This syntax fragment is part of the SELECT statement.

Read syntax diagramSkip visual syntax diagram
Hierarchical Clause

                                                         (2)   
|--+---------------------------+--|  CONNECT BY Clause |--------|
   |                       (1) |                               
   '-| START WITH Clause |-----'                               

Notes:
  1. See START WITH Clause
  2. See CONNECT BY Clause
The table object on which the hierarchical query operates must be specified in the FROM clause of the SELECT statement. The table object can be any of the following table objects:
  • A table or updateable view
  • A temporary table
  • A table in another database of the same Informix instance to which the session is connected
  • A derived table that is the result of a query
  • A table that is protected by a label-based access control (LBAC) security policy
  • A table with column level encryption or row level encryption
  • A synonym for any of the other table objects.
The following table objects are not supported in the FROM clause of a hierarchical query:
  • A join of two or more tables
  • A view that is not updatable
  • A table in a database of a remote Informix instance
  • An external table that the CREATE EXTERNAL TABLE statement defined
  • A sequence object.

Informix supports sequence objects in the projection list of hierarchical queries, in the WHERE clause, and in other contexts where an expression is valid in SELECT statements, but not in the hierarchical query clause.

The hierarchical clause is valid in correlated subqueries and in uncorrelated subqueries.

Hierarchical queries can include all types of optimizer directives, with these exceptions:
  • Join-order directives
  • Join-method directives

Hierarchical queries do not support the Parallel Database Query (PDQ) feature of Informix.

The Hierarchical clause can specify recursive queries on a table whose rows describe a hierarchy of parent-child relationships.
  • The hierarchy can be a simple hierarchy, such as the reporting structure of an organization in which every node that is not the root reports to a single node at higher level within the hierarchy. (In the LBAC security feature of Informix, a security label component of type TREE has the logical structure of a simple hierarchy.)
  • The Hierarchical clause can query data hierarchies of more complex topologies, in which nodes have many-to-many relationships, and in which a child node can be the ancestor of its parent. For information about using the Hierarchical clause to query a table that has cycles within the data hierarchy, see CONNECT BY Clause.
Important: Hierarchical queries are most efficient for data sets in which parent-child dependencies in the table have the logical topology of a simple graph. If the self-referencing table includes more than one independent hierarchy for the same set of columns, or if any child row is also an ancestor of its parent, see Dependency patterns that are not a simple graph.
Note: The Hierarchical clause is unrelated to table hierarchies, in which a hierarchy of parent-child relationships exist among the schemas of a set of typed tables. Similarly, the hierarchy of a set of DISTINCT data types that all derive from a common base type resembles a data hierarchy, but is unrelated to the Hierarchical clause, where the hierarchy exists in parent-child dependencies between data entities, rather than relationships among data types.

SQL Syntax Specific to Hierarchical Queries

Besides the START WITH, CONNECT BY, and CONNECT NOCYCLE BY keywords that specify the conditions for recursive queries of a table that contains hierarchical data, hierarchical queries also support syntax tokens that are valid only in hierarchical queries, and that cannot be used in SELECT statements that have no CONNECT BY clause. Syntax tokens specific to hierarchical queries include two operators, three pseudocolumns, and a built-in function:
  • CONNECT_BY_ROOT operator

    This operator can return an expression for the root ancestor of its operand.

  • PRIOR operator

    This operator can reference a returned value from the previous recursive step (where "step" refers to an iteration of the recursive query).

  • LEVEL pseudocolumn

    This pseudocolumn returns an integer, indicating which step of the recursive query returned a row within the hierarchy.

  • CONNECT_BY_ISCYCLE pseudocolumn

    This pseudocolumn can indicate whether a row has a child row that is also its ancestor.

  • CONNECT_BY_ISLEAF pseudocolumn

    This pseudocolumn can indicate whether a row has any children among the rows that the query returns.

  • SYS_CONNECT_BY_PATH function

    This function can construct and return a string that represents the path from a specified row to the root of the hierarchy

  • SIBLINGS keyword in the ORDER BY clause

    The ORDER SIBLINGS BY clause can sort returned rows for siblings of the same parent at every level.

A pseudocolumn is a built-in identifier that the SQL parser can recognize in specific contexts, and that shares the same namespace as columns and variables. These pseudocolumns and the SYS_CONNECT_BY_PATH function are typically specified in the Projection clause of the SELECT statement, but the LEVEL pseudocolumn and the PRIOR operator can be specified in the Hierarchical clause.

For details of the syntax and semantics of these tokens that support only hierarchical queries, see Conditions in the CONNECT BY Clause and ORDER SIBLINGS BY Clause.

Overview of Hierarchical Queries

The clauses of a SELECT statement that includes the Hierarchical clause are processed in the following sequence:
  1. FROM clause (for only a single table object in the current database)
  2. Hierarchical clause
  3. WHERE clause (without join predicates)
  4. GROUP BY clause
  5. HAVING clause
  6. Projection clause
  7. ORDER BY clause

The ORDER BY SIBLING option of the ORDER BY clause can order the set of rows that are children of the same parent.

A subquery that includes the Hierarchical clause returns the intermediate result set in a partial order, where the rows produced in iteration (n+1) for a specific hierarchy immediately follow the row in iteration (n) that produced them. However, specifying an ORDER BY clause, a GROUP BY or HAVING clause, or the DISTINCT or UNIQUE keyword in the Projection clause destroys that partial order.

The Hierarchical clause follows the WHERE clause in the lexical sequence of SELECT statement clauses, but the WHERE clause predicates are processed on the result of the Hierarchical clause. The WHERE clause cannot specify join predicates if the SELECT statement includes the Hierarchical clause, but the table object that is specified in the FROM clause can be the result set of a query that joins one or more tables.

Any SELECT statement that includes a hierarchical-query-clause is called a hierarchical query, which performs a recursive series of queries on the table that the FROM clause specifies:
  1. The optional START WITH clause can specify a condition. Any rows that satisfy this condition are returned as the first intermediate result set of the hierarchical query.
  2. The next step applies the condition that is specified in the CONNECT BY clause to the table. Any rows that satisfy that condition are returned as the second intermediate result set.
  3. The next step applies the CONNECT BY condition to the table. Any rows that are returned comprise the third intermediate result set.
  4. The CONNECT BY clause runs queries recursively to produce successive intermediate result sets, until an iteration yields an empty result set.
  5. The hierarchical SELECT statement then combines all of the intermediate result sets of the preceding recursive steps, producing the final result set of the Hierarchical clause.
  6. The predicates of the WHERE clause are then applied to this set of rows that the Hierarchical clause retrieved, and the remaining clauses of the SELECT statement are then applied in the order listed.

After the START WITH and CONNECT BY clauses return all of the intermediate result sets, you can use the ORDER SIBLINGS BY clause to sort the sibling rows that have the same parent for every level within the hierarchy. For more information, see ORDER SIBLINGS BY Clause.

You can use output from the SET EXPLAIN statement to view the execution path of a hierarchical query.

The Hierarchical clause provides an efficient alternative to using the Node database extension to retrieve information from hierarchical data sets

Example of a Hierarchical Data Set

In several topics that follow, SQL code examples that illustrate hierarchical queries are based on hierarchic data in the following employee table, whose rows contains information about employees within an organizational hierarchy. The mgrid column shows the employee identifier (empid) of the manager to whom the employee reports:

CREATE TABLE employee(
                      empid  INTEGER NOT NULL PRIMARY KEY,
                      name   VARCHAR(10),
                      salary DECIMAL(9, 2),
                      mgrid  INTEGER
);

Data values for the 17 rows in the employee table are these.

INSERT INTO employee VALUES  ( 1, 'Jones',    30000, 10);
INSERT INTO employee VALUES  ( 2, 'Hall',     35000, 10);
INSERT INTO employee VALUES  ( 3, 'Kim',      40000, 10);
INSERT INTO employee VALUES  ( 4, 'Lindsay',  38000, 10);
INSERT INTO employee VALUES  ( 5, 'McKeough', 42000, 11);
INSERT INTO employee VALUES  ( 6, 'Barnes',   41000, 11);
INSERT INTO employee VALUES  ( 7, 'O''Neil',  36000, 12);
INSERT INTO employee VALUES  ( 8, 'Smith',    34000, 12);
INSERT INTO employee VALUES  ( 9, 'Shoeman',  33000, 12);
INSERT INTO employee VALUES  (10, 'Monroe',   50000, 15);
INSERT INTO employee VALUES  (11, 'Zander',   52000, 16);
INSERT INTO employee VALUES  (12, 'Henry',    51000, 16);
INSERT INTO employee VALUES  (13, 'Aaron',    54000, 15);
INSERT INTO employee VALUES  (14, 'Scott',    53000, 16);
INSERT INTO employee VALUES  (15, 'Mills',    70000, 17);
INSERT INTO employee VALUES  (16, 'Goyal',    80000, 17);
INSERT INTO employee VALUES  (17, 'Urbassek', 95000, NULL);

Each pair of empid and mgrid values express referential relationships that the recursive iterations of a query with an appropriate CONNECT BY condition can correctly assemble into a hierarchy.

Here the NULL value in the mgrid column in the last row shows that employee Urbassek, whose empid value is 17 is the root node of this reporting hierarchy.

The following diagram illustrates the four levels of the reporting hierarchy (with nodes that show the empid values) for the employee table data:

Figure 1. Relationships of Elements in a Reporting Hierarchy
START ALTEXT: Diagram illustrates the hierarchy of the employees (represented by nodes that show their numeric employee identifiers) and the manager to whom each employee (except the root) reports. END ALTTEXT