IBM Support

IC71743: DB2 HANGS DUE TO INDIRECT DEAD LATCH BETWEEN THE AGENT EXECUTING A QUERY AND AN INDEPENDENT COORDINATOR AGENT SPAWNED

Subscribe

You can track all active APARs for this component.

APAR status

  • Closed as program error.

Error description

  • When running out of package cache memory space, DB2 will spawn
    an independent coordinator agent to clean up package cache  This
    might cause an indirect dead latch between the agent executing a
    query and the spawned independent coordinator agent. The
    scenario of the dead latch situation can be illustrated as
    below.
    
    1) the agent executing a query spawns an independent coordinator
    agent to clean up package cache and is waiting for the new
    agent's post back while itself is holding a latch.
    
    For example, you might see the agent 8244 waiting for the post
    had the stacks similar to below:
    
      semtimedop + 0x000a
      sqloWaitEDUWaitPost + 0x019d
      sqeIcoordCB18WaitForSanityCheckEP5sqlca + 0x003d
      sqeIcoordCB26DispatchIndependentDBAgent+ 0x0281
    
      ABPIcoordAgent16spawnIcoordAgent + 0x00cb
    
      ABPDispatcher10spawnAgent + 0x006f
    
      ABPDispatcher23activateTaskProContexts + 0x0165
    
      abpActivateTaskProContextsPK8sqeAgent + 0x0069
      sqlra_cache_del_var + 0x0a26
    
      sqlra_csm_drop_var+ 0x01a8
    
      sqlra_csm_clean_lru + 0x0e03
      sqlra_cache_space_mgmtP8sqlrr_cbm + 0x00ec
      sqlra_cache_reserve_memoryP8sqlrr_cbm + 0x02d8
      sqlra_rollup_dyn_stmtPK11sqlmon_stmt + 0x04d6
      sqlmon_conn8stmt_end + 0x0320
    
      sqlmon_acb14agent_stmt_endEjbP8sqlrr_cb + 0x0668
      sqlrr_rds_common_postP14db2UCinterfaceiil + 0x1350
      sqlrr_prepareP14db2UCinterfaceP16db2UCprepareInfo + 0x02af
    
    
    
    and db2pd -latches show one of the latches it's holding was an
    RDS latch for the monitor:
    
    Address            Holder    Waiter    Filename            LOC
         LatchType            HoldCount
    0x0000000200DBBB68 8244       25         Unknown
    681        SQLO_LT_sqlmon_conn__rds_latch 1
    
    
    2) On the other hand, the newly spawned independent coordinator
    agent is waiting for a database latch being hold by a third
    agent, so it never got a chance to post back to the agent which
    has spawned it.
    
    For example, you might see the spawned independent agent 8399
    had the stacks similar to below.
    
    sqloSpinLockConflict + 0x0240
    sqeLocalDatabase29ShouldWeStartBackgroundAgents + 0x042a
    
    sqlm_a_initP8sqeAgent + 0x023a
    sqeApplication20InitEngineComponents+ 0x06c9
    
    sqeApplication13AppStartUsing  + 0x04f2
    
    sqleSubAgentStartUsingP8sqeAgentP16SQLE_CLIENT_INFO + 0x035a
    sqeApplication22AppSecondaryStartUsing + 0x0202
    
    sqleIndCoordProcessRequestP8sqeAgent + 0x059f
    sqleIndCoordProcessRequestP8sqeAgent + 0x02d8
    sqleIndCoordProcessRequestP8sqeAgent + 0x0084
    sqeAgent6RunEDUEv + 0x0381
    
    
    and db2pd -latches show it's waiting for the database latch hold
    by the third agent 25:
    
    Address            Holder    Waiter    Filename            LOC
         LatchType            HoldCount
    0x00000002004D7F48 25         8339       Unknown
    395        SQLO_LT_sqeLocalDatabase__dblatch 1
    
    
    3) The third agent is in turn waiting for the latch being hold
    by the original agent executing the query as in 1).
    
    For example, you might see the third agent 25 had the stacks
    similar to below.
    
    sqloSpinLockConflict + 0x0240
    sqloxult_trackP11sqlo_xlatch14SQLO_LT_VALUES + 0x005f
    sqm_collect_db_bp_data + 0x03b0
    
    sqm_snap_dbase + 0x013e
    
    sqlmonssagnt + 0x0466
    
    sqlmonssbackendP12SQLE_DB2RA_T + 0x0500
    sqlesrvrP14db2UCinterface + 0x05bd
    sqleMappingFnServerP5sqldaP5sqlca + 0x049d
    sqlerKnownProcedure + 0x0270
    
    sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x0492
    
    and db2pd -latches show it's waiting for the RDS latch hold by
    the original agent 8244.
    
    Address            Holder    Waiter    Filename            LOC
         LatchType            HoldCount
    
    0x0000000200DBBB68 8244       25         Unknown
    681        SQLO_LT_sqlmon_conn__rds_latch 1
    
    
    
    As a result, the dead latch occurs. Recycling the instance will
    solve the dead latch.
    
    This issue doesn't occur on DB2 version prior to DB2 UDB V9.7
    for LUW.
    

Local fix

  • 1) turn off the default statement switch (DFT_MON_STMT)  in the
    Database Manager Configuration. Also need to make sure there is
    no application turn on the database manager  statement switch.
    For example, db2 Governor tool will turn on the statement
    switch. If db2 Governor is used, it need to be disabled as well.
    
    2) increasing the package cache size(PCKCACHESZ) in the Database
    Configuration may also reduce the chance of getting the dead
    latch
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * The issue doesn't occur on DB2 versions prior to DB2 UDB     *
    * V9.7 for LUW                                                 *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See above Error Description                                  *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to DB2 Version 9.7 Fix Pack 4                        *
    ****************************************************************
    

Problem conclusion

  • Problem was first fixed in Version 9.7 Fix Pack 4
    

Temporary fix

  • See above Local Fix
    

Comments

APAR Information

  • APAR number

    IC71743

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    970

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2010-10-07

  • Closed date

    2011-05-02

  • Last modified date

    2011-05-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • R970 PSN

       UP



Document information

More support for: DB2 for Linux, UNIX and Windows

Software version: 9.7

Reference #: IC71743

Modified date: 02 May 2011