Implementation and optimization of join operations with DB2® Multisystem

In addition to the performance considerations for nondistributed join queries, more performance considerations exist for queries involving distributed files.

Joins can be performed only when the data is partition compatible. The distributed query optimizer generates a plan that makes data partition compatible, which might involve moving data between nodes.

Data is partition compatible when the data in the partitioning keys of both files uses the same node group and hashes to the same node. For example, the same numeric value stored in either a large-integer field or a small-integer field hashes to the same value.

The data types that follow are partition compatible:

  • Large integer (4-byte), small integer (2-byte), packed decimal, and zoned numeric.
  • Fixed-length and varying-length SBCS character and DBCS-open, -either, or -only.
  • Fixed-length and varying-length graphic.

Date, time, timestamp, and floating-point numeric data types are not partition compatible because they cannot be partitioning keys.

Joins involving distributed files are classified into four types: collocated, directed, repartitioned, and broadcast. The following sections define the types of joins and give examples of the different join types.