A number of performance improvements have been made to
improve the speed of many queries. These improvements are automatic.
There are no configuration settings or changes to the SQL statements
required.
Partial early distinct (PED)
An efficient
hashing function will now be used to partially remove duplicates early
in the processing of the query. This may not remove all duplicates,
but will reduce the amount of data that must be processed later in
the query evaluation. Removing some of the initial duplicate rows
will speed up the query, and reduce the chance that it will run out
of sort heap memory, thereby eliminating the need to use relatively
slow disk space for temporary storage in these cases. This improvement
is termed partial early distinct (PED).
To determine if this
improvement is being used for a particular query, activate the Explain
facility and run the query. A new value in the EXPLAIN_ARGUMENT table
indicates when this new functionality has been applied to a query:
- ARGUMENT_TYPE column = UNIQUE
- ARGUMENT_VALUE column can now also have the value: HASHED
PARTIAL which indicates that the new feature has been used
The
db2exfmt tool will also show
HASHED
PARTIAL in its output, as shown in the following example:
6) UNIQUE: (Unique)
Cumulative Total Cost: 132.519
Cumulative CPU Cost: 1.98997e+06
...
...
Arguments:
---------
JN INPUT: (Join input leg)
INNER
UNIQKEY : (Unique Key columns)
1: Q1.C22
UNIQKEY : (Unique Key columns)
2: Q1.C21
pUNIQUE : (Uniqueness required flag)
HASHED PARTIAL
Partial early aggregation (PEA)
Similar
to partial early distinct (PED), partial early aggregation (PEA) is
an attempt to do a partial aggregation of data early in the processing
of the query. While it is unlikely that all aggregation can take place
at this point, it will at least reduce the amount of data that must
be processed later in the query evaluation.
To determine if
partial early aggregation is being used for a particular query activate
the Explain facility and run the query. A new value in the EXPLAIN_ARGUMENT
table indicates when this new functionality has been applied to a
query:
- ARGUMENT_TYPE column = AGGMODE
- ARGUMENT_VALUE column can now also have the value: HASHED
PARTIAL which indicates that this new feature has been used
The
db2exfmt tool will also show
HASHED
PARTIAL in its output for
GRPBY sections,
along with a
pGRPBY in the tree view, if this new
functionality has been applied within that part of the query.
Hash join now selected by the query optimizer for
a wider range of SQL queries
The query optimizer chooses
between three basic join strategies when determining how to run an
SQL query that includes a join. In many cases a hash join is the most
efficient method, and with this release it can be used in more situations.
- Data type mismatches
- A hash join will now be considered even if the two columns in
the join are not the same data type. This is the case in all but the
most extreme situations.
- Expressions used in join predicate
- Join predicates that contain an expression no longer restrict
the join method to a nested loop join. In this release a hash join
is considered in cases where the WHERE clause contains an expression,
like: WHERE T1.C1 = UPPER(T1.C3)
In these cases the hash join is considered automatically.
There is no need to change any existing SQL queries to take advantage
of this improved functionality. Note that hash joins make use of sort
heap memory.
Improved cost estimates of network communication traffic
generated by a query
The query optimizer
relies on a range of information to choose an access plan that is
efficient as possible. The estimated communication costs of queries
has now been improved, enabling the optimizer to more accurately consider
and compare all of the CPU, IO, and communication costs. In many cases
this will result in faster query performance.
The estimated
per-node communication costs of a query, as returned by the explain
elements
COMM_COST and
FIRST_COMM_COST,
have been improved. They are now more consistent with the existing
CPU and IO costs per-node calculations. This enables the query optimizer
to effectively balance all three of these cost estimations when evaluating
different access plans. It also helps increase parallelism when possible
by enabling the network traffic to be spread more evenly across multiple
network adapters. In particular:
- If there is more than one network adapter involved, the cumulative
communication cost for the adapter with the highest value is returned.
In previous releases the total number of frames transmitted throughout
the entire network was returned.
- The values only include the costs of network traffic between physical
machines. They do not include the virtual communication costs between
node partitions on the same physical machine in a partitioned database
environment.