Implementation and optimization of the UNION and DISTINCT clauses with DB2® Multisystem

If a unioned SELECT statement refers to a distributed file, the statement is processed as a distributed query.

The processing of the statement can occur in parallel. However, the records from each unioned SELECT are brought back to the coordinator node to perform the union operation. In this regard, the union operators are processed serially.

If an ORDER BY clause is specified with a union query, all of the records from each node are received on the coordinator node and are sorted before any records are returned.

When the DISTINCT clause is specified for a distributed query, adding an ORDER BY clause returns records faster than if no ORDER BY clause was specified. DISTINCT with an ORDER BY allows each node to order the records in parallel. A final merge on the coordinator node reads the ordered records from each node, merges the records in the proper order, and eliminates duplicate records without having to do a final sort.

When the DISTINCT clause is specified without an ORDER BY clause, all of the records from each node are sent to the coordinator node where a sort is performed. Duplicate records are eliminated as the sorted records are returned.