Correlated subqueries

A correlated subquery is a subquery that DB2® reevaluates when it examines a new row (in a WHERE clause) or a group of rows (in a HAVING clause) as it executes the outer SELECT statement.

In an uncorrelated subquery, DB2 executes the subquery once, substitutes the result of the subquery in the right side of the search condition, and evaluates the outer SELECT based on the value of the search condition.

User-defined functions in correlated subqueries

Use care when you invoke a user-defined function in a correlated subquery, and that user-defined function uses a scratchpad. DB2 does not refresh the scratchpad between invocations of the subquery. This can cause undesirable results because the scratchpad keeps values across the invocations of the subquery.

An example of a correlated subquery

Suppose that you want a list of all the employees whose education levels are higher than the average education levels in their respective departments. To get this information, DB2 must search the DSN8A10.EMP table. For each employee in the table, DB2 needs to compare the employee's education level to the average education level for that employee's department.

For this example, you need to use a correlated subquery, which differs from an uncorrelated subquery. An uncorrelated subquery compares the employee's education level to the average of the entire company, which requires looking at the entire table. A correlated subquery evaluates only the department that corresponds to the particular employee.

In the subquery, you tell DB2 to compute the average education level for the department number in the current row. The following query performs this action:

SELECT EMPNO, LASTNAME, WORKDEPT, EDLEVEL
  FROM DSN8A10.EMP X
  WHERE EDLEVEL >
    (SELECT AVG(EDLEVEL)
       FROM DSN8A10.EMP
       WHERE WORKDEPT = X.WORKDEPT);

A correlated subquery looks like an uncorrelated one, except for the presence of one or more correlated references. In the example, the single correlated reference is the occurrence of X.WORKDEPT in the WHERE clause of the subselect. In this clause, the qualifier X is the correlation name that is defined in the FROM clause of the outer SELECT statement. X designates rows of the first instance of DSN8A10.EMP. At any time during the execution of the query, X designates the row of DSN8A10.EMP to which the WHERE clause is being applied.

Consider what happens when the subquery executes for a given row of DSN8A10.EMP. Before it executes, X.WORKDEPT receives the value of the WORKDEPT column for that row. Suppose, for example, that the row is for Christine Haas. Her work department is A00, which is the value of WORKDEPT for that row. Therefore, the following is the subquery that is executed for that row:

(SELECT AVG(EDLEVEL)
   FROM DSN8A10.EMP
   WHERE WORKDEPT = 'A00');

The subquery produces the average education level of Christine's department. The outer SELECT then compares this average to Christine's own education level. For some other row for which WORKDEPT has a different value, that value appears in the subquery in place of A00. For example, in the row for Michael L Thompson, this value is B01, and the subquery for his row delivers the average education level for department B01.

The result table that is produced by the query is similar to the following output:

EMPNO    LASTNAME    WORKDEPT  EDLEVEL
======   =========   ========  =======
000010   HASS        A00       18
000030   KWAN        C01       20
000070   PULASKI     D21       16
000090   HENDERSON   E11       16