IBM Support

Identifying Bufferpool Latch Contention

Troubleshooting


Problem

A bad SQL execution plan usually manifests itself as poor SQL performance. In some cases however, the SQL statement may execute with reasonable performance when executed in isolation, but may be slow due to bufferpool latch contention when executed concurrently by many sessions.

Symptom

Latch contention observed in db2pd -latches or found via EDU stack traces.

Cause

Latches are internal DB2 locks used to serialize access to shared data structures.
There are many types of latches used throughout the DB2 code in order to prevent concurrent users of a certain data structure from changing values at the same time.

When there is heavy, sustained access on a shared resource, the latches are heavily used as well. This can be defined as latch contention, as many users are contending for the same resource. Latch contention can lead to some increase in CPU usage.

DB2 has the ability to dump the currently in-use latches using the db2pd -latches command.
This is an instant snapshot of the in use latches and to get a clear picture , it should be executed a few times to see how latch usage evolves.

e.g. :
db2pd -latches -rep 1 10

to execute this 10 times with a 1 second interval.

When the output shows for instance :

Latches:
Address Holder Waiter Filename LOC LatchType
0x0780000039CE8980 385208 1900704 /view/db2_v91fp8_aix64_s090823/vbs/engn/include/sqlbslat.h 711 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8980 385208 1966278 /view/db2_v91fp8_aix64_s090823/vbs/engn/include/sqlbslat.h 711 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8D80 1548396 1773670 sqlbufix.C 367 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8D80 1548396 1765476 sqlbufix.C 367 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8D80 1548396 1523746 sqlbufix.C 367 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8D80 1548396 1015838 sqlbufix.C 367 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
0x0780000039CE8D80 1548396 1618134 sqlbufix.C 367 SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch
....
( the list should be much longer in order to genuinely

we can see that there are a lot of waiters on the latch with address 0x0780000039CE8D80
( the "groupLatch" ).

We can also see that even though the address of the latch does not change, the "Holder" value changes even during the course of the db2pd -latches execution.
This is evidence that the sessions are all moving forward, but there is just very heavy usage on this latch.

Specifically for the "groupLatch" latch, observed above. This symptom is not a defect with the product, but is likely related to a badly executing SQL statement.
A typical scenario would be a Nested Loop Join, with a large number of rows on the outer leg of it's operator.
The inner leg of a nested loop join operator will be executed for every row on the outer leg.
If on the inner leg of a nested loop join there is an index lookup ( IXSCAN operator ), which only accesses a few index pages,
then that will lead to a lot of accesses to the same few bufferpool pages during the course of the SQL execution.
If there are many sessions executing the same, it can lead to an increased SQL execution time or even a heightened cpu usage due to the latch contention created by this.

Diagnosing The Problem

db2pd -latches as explained above , should suffice in determining there is a latch contention issue.

DB2 Support may request stack traces using db2pd -stack all or db2fodc -perf execution to be taken.

Resolving The Problem

The first step in resolving the issue is verifying the SQL statement that is executing.

A dynamic SQL snapshot will help you identify which statements have clocked up a lot of cpu usage.

Using db2pd -applications and identifying the statements that are being executed via db2pd -dynamic

should help drilling down to the statement(s) causing latch contention.

To resolve the problem, the query execution plan for the statement should then be investigated.

A solution may differ from case to case, and may involve a change to give the query compiler different statistical information to avoid the bad query execution plan, or even a change in ddl for the tables involved.

[{"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Database Objects\/Config - Bufferpools","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"9.8;9.7;9.5;9.1;10.1;10.5;11.1","Edition":"Enterprise Server;Express;Personal;Personal Developer's;Workgroup Server","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018

UID

swg21456306