IBM Support

IC98789: HASH LATCH CONTENTION CAUSES POOR PERFORMANCE


APAR status

  • Closed as program error.

Error description

  • This APAR applies to all platforms when both of the following
    conditions are met:
    1) The LOCKLIST database configuration parameter is set to more
    than 8000 pages.
    2) Many applications access one specific table concurrently
    through SELECT or DELETE queries, or through searched DELETE or
    CLOSE CURSOR operations against a cursor opened FOR READ ONLY.
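    To check the first condition, compare the LOCKLIST value in the
    database configuration with the 8000-page threshold. LOCKLIST is
    expressed in 4 KB pages, so 8000 pages is 32,000 KB (about 31 MB).
    The Python sketch below is illustrative only: the function names
    are hypothetical, and it assumes the LOCKLIST line of
    "db2 get db cfg" output looks like the example in the comment,
    showing either a plain number or AUTOMATIC(n).

```python
import re
from typing import Optional

# Condition 1 of this APAR: LOCKLIST larger than 8000 pages (4 KB each).
LOCKLIST_THRESHOLD_PAGES = 8000

# Assumed line format (not verified against every DB2 level), e.g.:
#   Max storage for lock list (4KB)              (LOCKLIST) = AUTOMATIC(12000)
_LOCKLIST_RE = re.compile(r"\(LOCKLIST\)\s*=\s*(?:AUTOMATIC\()?(\d+)")


def locklist_pages(db_cfg_output: str) -> Optional[int]:
    """Return the LOCKLIST value in 4 KB pages, or None if no LOCKLIST line is found."""
    match = _LOCKLIST_RE.search(db_cfg_output)
    return int(match.group(1)) if match else None


def meets_locklist_condition(db_cfg_output: str) -> bool:
    """True when LOCKLIST is strictly greater than the 8000-page threshold."""
    pages = locklist_pages(db_cfg_output)
    return pages is not None and pages > LOCKLIST_THRESHOLD_PAGES
```

    Note that when LOCKLIST is AUTOMATIC, the number reported is the
    current value, which can change over time.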
    
    If you suspect that you are hitting this problem, collect
    "db2pd -latches" output at several intervals. The relevant
    columns are Holder (the EDU ID of the EDU holding the latch),
    Waiter (the EDU ID of an EDU waiting on the latch) and LatchType.
    If you see many lines with a LatchType of
    SQLO_LT_SQLP_LHSH__hshlatch (the lock manager hash table latch)
    that share the same Holder value but have various Waiter values,
    you might be hitting this issue. More than one distinct holder
    may be present, as in this example:
    Address            Holder     Waiter     Filename           LOC LatchType                   HoldCount
    0x07000046EBBE0B58 798762     213934     sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     129857     sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     140132     sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    ... repeats many times ...
    ... with the same Holder value ...
    ... with varying Waiter values ...
    0x07000046EBBE0B58 798762     1186579    sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     1190691    sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     1200514    sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     1202054    sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     800313     sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    0x07000046EBBE0B58 798762     467072     sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
    ... skip some other entries ...
    0x07000046EBBE0B58 1156509    213934     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    0x07000046EBBE0B58 1156509    129857     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    0x07000046EBBE0B58 1156509    140132     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    ... repeats many times ...
    ... with the same Holder value (different from the Holder value above) ...
    ... with varying Waiter values ...
    0x07000046EBBE0B58 1156509    148094     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    0x07000046EBBE0B58 1156509    149379     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    0x07000046EBBE0B58 1156509    160167     sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
    ... skip other entries ...
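    Scanning many snapshots by eye is tedious. The Python sketch below
    (hypothetical helper names, not an IBM tool) groups
    SQLO_LT_SQLP_LHSH__hshlatch entries by Holder and reports holders
    that have many distinct waiters. It assumes each entry carries the
    seven columns shown above (Address, Holder, Waiter, Filename, LOC,
    LatchType, HoldCount) and reads the text as a stream of tokens, so
    entries wrapped across two lines are parsed as well. The default
    waiter threshold is an arbitrary assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set

HASH_LATCH = "SQLO_LT_SQLP_LHSH__hshlatch"


def waiters_by_holder(latches_output: str) -> Dict[str, Set[str]]:
    """Map each hash latch Holder EDU ID to the set of Waiter EDU IDs seen with it."""
    tokens = latches_output.split()
    result: Dict[str, Set[str]] = defaultdict(set)
    i = 0
    # Each entry is assumed to be 7 tokens:
    # Address Holder Waiter Filename LOC LatchType HoldCount
    while i + 6 < len(tokens):
        addr, holder, waiter, _file, _loc, latch_type, _count = tokens[i:i + 7]
        if (addr.startswith("0x") and holder.isdigit() and waiter.isdigit()
                and latch_type == HASH_LATCH):
            result[holder].add(waiter)
            i += 7  # skip past the whole entry
        else:
            i += 1  # header, "..." annotation or another latch type
    return dict(result)


def suspicious_holders(latches_output: str, min_waiters: int = 10) -> List[str]:
    """Holders of the hash latch with at least min_waiters distinct waiters."""
    counts = waiters_by_holder(latches_output)
    return sorted((h for h, w in counts.items() if len(w) >= min_waiters), key=int)
```

    Run it on each "db2pd -latches" snapshot separately. A holder that
    keeps showing up with many waiters across snapshots is the pattern
    described above and a candidate for the stack collection below.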
    Once the "db2pd -latches" output suggests that your environment
    might be affected, collect EDU stack traces to confirm that this
    specific problem is the cause.
    For each Holder value in the "db2pd -latches" output, collect
    "db2pd -stacks <holder_EDU_ID>" to dump the stack trace of the
    EDU holding the hash latch. You may need to collect this multiple
    times to capture a moment when the EDU is actively holding the
    latch.
    The holder EDU stack that indicates the problem scenario looks
    like this:
    -------Frame------ ------Function + Offset------
    0x09000000000FF858 thread_wait + 0x98
    0x0900000045849C34 getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x3D4
    0x090000004584A258 getConflict__17SQLO_SLATCH_CAS64FCUl + 0xD8
    0x0900000045EEA990 sqlplrl__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x3F0
    0x090000004685F3A0 sqldmclo__FP8sqeAgentPP8SQLD_CCBi + 0x1BA0
    0x09000000468545F0 sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0x550
    0x0900000045B9BA98 sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x2B8
    0x0900000045B9B4C8 sqlricls_simple__FP8sqlrr_cbil + 0x1488
    0x09000000476F15AC sqlrr_process_close_request__FP8sqlrr_cbiN32 + 0x20C
    0x0900000046EFEEE8 sqlrr_close__FP14db2UCinterfaceP15db2UCCursorInfo + 0x208
    
    In addition, for various waiter values in the "db2pd -latches"
    output, collect "db2pd -stacks <waiter_EDU_ID>".  Again, you may
    need to collect this multiple times in order to capture an
    instance when the EDU is actively waiting on the latch.
    The waiter EDU stack that indicates the problem scenario looks
    like this:
    -------Frame------ ------Function + Offset------
    0x09000000000F7A94 thread_wait + 0x94
    0x0900000037F7314C getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x318
    0x0900000037F7392C getConflict__17SQLO_SLATCH_CAS64FCUl + 0x68
    0x0900000037F255A4 getConflict__17SQLO_SLATCH_CAS64FCUl@glueFB@clone1 + 0x74
    0x0900000037F254CC sqlpLatchHashEntryForTableLockExclusive__FP9SQLP_LHSH@glue13EF + 0x20
    0x0900000037F927BC sqlplrq__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x104
    0x0900000037FCBFC8 sqldLockTable__FP8sqeAgentP14SQLP_LOCK_INFOUiUsi + 0xF8
    0x0900000037FCC650 sqldScanOpen__FP8sqeAgentP14SQLD_SCANINFO1P14SQLD_SCANINFO2PPv + 0x57C
    0x0900000037FCB8BC sqlriopn__FP8sqlrr_cbP9sqlri_taoPi + 0x29C
    0x0900000037FCB564 sqlriopn__FP8sqlrr_cbP9sqlri_taoPi@glue271 + 0x74
    0x0900000037FA1D1C sqlrita__FP8sqlrr_cb + 0x6C
    0x0900000037F9FEA4 sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x24
    0x0900000038015A2C sqlrr_process_fetch_request__FP14db2UCinterface + 0x1E4
    0x0900000037FF02E0 sqlrr_open__FP14db2UCinterfaceP15db2UCCursorInfo + 0xB0C
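    When many stack dumps have been collected, a rough first-pass
    filter can help decide which ones to read closely. The Python
    sketch below is a simple heuristic, not an IBM-defined check: it
    labels a stack as matching the holder or waiter pattern when it
    contains all of a few function names taken from the sample stacks
    in this APAR. HOLDER_FUNCTIONS, WAITER_FUNCTIONS and classify_stack
    are hypothetical names, and the stack text is assumed to be
    available as a string, however it was captured. A match only
    suggests this problem; compare the full stack with the samples
    before drawing conclusions.

```python
from typing import Tuple

# Hypothetical signatures: function names (the part before the "__" mangling
# suffix) picked from the sample holder and waiter stacks in this APAR.
HOLDER_FUNCTIONS = ("getConflict", "sqlplrl", "sqldmclo")
WAITER_FUNCTIONS = ("getConflict", "sqlpLatchHashEntryForTableLockExclusive",
                    "sqlplrq", "sqldLockTable")


def _has_all(stack_text: str, functions: Tuple[str, ...]) -> bool:
    # Match "name__" so that "getConflict" does not match "getConflictComplex__".
    return all(f"{name}__" in stack_text for name in functions)


def classify_stack(stack_text: str) -> str:
    """Return 'holder', 'waiter' or 'other' for one EDU's stack trace text."""
    if _has_all(stack_text, HOLDER_FUNCTIONS):
        return "holder"
    if _has_all(stack_text, WAITER_FUNCTIONS):
        return "waiter"
    return "other"
```

    Matching on presence only keeps the sketch short; a stricter
    version could also check the order of the frames.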
    
    If the conditions listed above are met, the "db2pd -latches"
    output shows the pattern described, and the holder and waiter EDU
    stacks match those shown above, you might obtain relief by
    applying the local fix or by upgrading to a DB2 level that
    contains the fix for this APAR.
    Local Fix
    Set the following DB2 registry variable (for example, with the
    db2set command) and restart the DB2 instance:
    DB2_KEEPTABLELOCK=TRANSACTION
    

Local fix

  • DB2_KEEPTABLELOCK=TRANSACTION
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All users                                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to DB2 V9.7 Fix Pack 10 or later.                    *
    ****************************************************************
    

Problem conclusion

  • Fixed in DB2 V9.7 Fix Pack 10 and later.
    

Temporary fix

  • DB2_KEEPTABLELOCK=TRANSACTION
    

Comments

APAR Information

  • APAR number

    IC98789

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    970

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-01-17

  • Closed date

    2014-11-18

  • Last modified date

    2014-11-18

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IT01914 IT01915

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • R970 PSN

       UP



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: 9.7

Reference #: IC98789

Modified date: 18 November 2014