IBM Support

PI41605: WMQ 7.1 Z/OS: QUEUE MANAGER IS UNRESPONSIVE DUE TO A DEADLOCK.

A fix is available

APAR status

  • Closed as program error.

Error description

  • A customer reported that a queue manager became unresponsive.
    They could not connect through the panels or through an
    administration tool over a SVRCONN channel.
    The THR=* formatter shows that several threads, including the
    command server thread RTSSRV01, are waiting for a latch.
    There is a deadlock (deadly embrace) involving this latch.
    
    
    *PEB 31266B70 ACE 31266B10 CCB 31233808 SRB 312F5C70 PROG 8000
    SYSTEM   201.RAHEAD02
    ROB is Q'd on a Latch waiter chain
    Susp by MQ EBSUS14 3146C5F1 ROBSOT 314613CA
    Latch Waited on 7EF58FE8 DSA 7CF52420 ()
    Suspend issued at 1900/05/18 14:47:26.863393
    Latch 7EF58FE8 is HELD by EB 3127A690        <===
    7CF527D0 7CF52420 7CF52198 7CF51F80 7CF51D40 7CF4EEC8 7CF4E248
     -------- -------- CSQP3GET CSQP1GET CSQP1RAH CSQIRAHP --------
    LOWN 7ED72040  ITHR 7D0291B0  SOFTLOG 00000000
    MTHR 7D03E7A0 Open Handles = 0 LastGETexp 0.0
    
    *PEB 3127A690 ACE 3127A630 CCB 3126D838 SRB 312F6C60 PROG 8000
    SYSTEM   215.DWP_O305
    ROB is Q'd on a Latch waiter chain
    Latch Held Mask 00010000 = (16)BMXL2/RMCRMST/RLMARQC
    Susp by MQ EBSUS14 3146C5F1 ROBSOT 314613CA
    Latch Waited on 7D6AFFC0 DSA  7EEF9288 ()
    Suspend issued at 1970/10/06 20:14:13.556985
    Latch 7D6AFFC0 is HELD by EB 31266B70        <===
    This is a BDSC Dlatch for Psid 00000006 Page 00003BE2
    7EEF9638 7EEF9288 7EEF9108 7EEF7270 7EEF6EC8 7EEF6248
      -------- -------- CSQP4DWP CSQP2DWP CSQP1DWP --------
    
    Tasks suspended waiting for the latches are paused in module
    CSQVSRX. In rare cases, EOM processing requires one of the held
    latches and hangs until the system abnormally terminates the
    hung EOM task with abend S30D, which causes abnormal queue
    manager termination.
    
    Additional Symptom(s) Search Keyword(s):
    S30D ABEND30D ABENDS30D S030D
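
The two PEB entries in the dump can be read as a small wait-for graph: each suspended task waits on one latch, and each latch names the EB that holds it. The following is a minimal sketch of that reading in Python, not a tool shipped with MQ. The EB and latch addresses are copied from the dump; the dictionary layout and the function name find_cycle are assumptions made for illustration.

```python
# Wait-for data transcribed from the two PEB entries in the dump.
# Key: EB address of the suspended task. Value: latch it waits on.
waits_on = {
    "31266B70": "7EF58FE8",  # SYSTEM 201.RAHEAD02 (readahead, CSQP1RAH)
    "3127A690": "7D6AFFC0",  # SYSTEM 215.DWP_O305 (deferred write, CSQP4DWP)
}
# Key: latch address. Value: EB address reported as "HELD by".
held_by = {
    "7EF58FE8": "3127A690",
    "7D6AFFC0": "31266B70",  # BDSC Dlatch for Psid 00000006 Page 00003BE2
}


def find_cycle(start, waits_on, held_by):
    """Follow task -> latch -> holder edges and return the cycle, if any."""
    path = [start]
    seen = {start}
    task = start
    while task in waits_on:
        holder = held_by.get(waits_on[task])
        if holder is None:
            return None          # latch is free, so the chain ends here
        if holder in seen:
            # The chain has come back to a task already visited.
            return path[path.index(holder):] + [holder]
        path.append(holder)
        seen.add(holder)
        task = holder
    return None                  # the last holder is not waiting on anything


cycle = find_cycle("31266B70", waits_on, held_by)
print(" -> ".join(cycle))
```

A returned list that starts and ends with the same EB address indicates a deadly embrace. With the values above, each of the two tasks holds the latch that the other is waiting on.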
    

Local fix

  • Restart the queue manager. If it does not shut down normally,
    it may be necessary to cancel it.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 1 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: Applications hang during MQI calls due  *
    *                      to BDSC latch contention.               *
    *                      Internal MQ processing and queue        *
    *                      manager shutdown also hang.             *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    An application releasing an updated page was unable to latch the
    buffer used for the page, and consequently left the buffer on
    the buffer pool LRU chain. During deferred write processing,
    CSQP4DWP marked the buffer as clean after the update to the
    pageset had completed (making it eligible for page stealing).
    Later, while holding the latch on the LRU chain, it attempted to
    check if the buffer needed to be added back on to the LRU chain,
    and to do this it requested a latch on the buffer.
    However, between the page being marked clean and CSQP4DWP
    obtaining the LRU latch, the readahead task CSQP1RAH stole
    the buffer and obtained the buffer latch. CSQP1RAH then
    attempted to get another buffer, and suspended waiting for the
    LRU latch.
    This resulted in the reported deadlock between the deferred
    write processor (which held the LRU latch and required the BDSC
    latch) and the readahead task (which held the BDSC latch and
    required the LRU latch).
    
    Any application or queue manager task that requires a page from
    the same buffer pool also hangs waiting for the LRU latch,
    unless the requested page is already available in a buffer.
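
The order of events described in this summary can be replayed step by step in a small model. The sketch below is illustrative only and does not reflect MQ internals: the Latch class, the task names "DWP" and "RAH", and the function names are all hypothetical, and the latches record their holder rather than blocking so that the replay is deterministic.

```python
class Latch:
    """Toy exclusive latch that records its holder instead of blocking."""

    def __init__(self, name):
        self.name = name
        self.holder = None

    def request(self, task, waiting):
        """Grant the latch if free; otherwise note that task is suspended on it."""
        if self.holder is None:
            self.holder = task
            return True
        waiting[task] = self
        return False


def replay_interleaving():
    """Replay the order of events given in the problem summary."""
    lru = Latch("LRU chain latch")
    bdsc = Latch("BDSC buffer latch")
    waiting = {}                      # task name -> latch it is suspended on

    # 1. DWP marks the buffer clean, so it is now eligible for stealing.
    bdsc.request("RAH", waiting)      # 2. RAH steals the buffer and latches it
    lru.request("DWP", waiting)       # 3. DWP obtains the LRU chain latch
    lru.request("RAH", waiting)       # 4. RAH wants another buffer: suspends
    bdsc.request("DWP", waiting)      # 5. DWP wants the buffer latch: suspends
    return waiting


def is_deadlocked(waiting):
    """True if some suspended task is, through a chain of holders, waiting on itself."""
    for start in waiting:
        task, seen = start, set()
        while task in waiting and task not in seen:
            seen.add(task)
            task = waiting[task].holder
        if task == start:
            return True
    return False


waiting = replay_interleaving()
for task, latch in waiting.items():
    print(f"{task} waits for {latch.name}, held by {latch.holder}")
print("deadlock:", is_deadlocked(waiting))
```

In this model the deadlock depends on step 2 happening between steps 1 and 3. If the buffer could not be stolen in that window, the readahead task would never hold the buffer latch while the deferred write processor holds the LRU latch.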
    

Problem conclusion

  • CSQP4DWP is updated to prevent the pages it is processing from
    being eligible for stealing until after they have been added
    back to the LRU chain, preventing this deadlock situation from
    occurring.
    100Y
    CSQP3GET
    CSQP4DWP
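
The effect of the change can be sketched with two Python threads, assuming the fix behaves as the conclusion describes: the buffer only becomes eligible for stealing after it is back on the LRU chain. This is a hedged illustration, not the CSQP4DWP code. The lock objects, the stealable event, and the point at which eligibility is granted (after both latches are released) are assumptions.

```python
import threading


def run_fixed_ordering():
    """Run a toy deferred-write task and readahead task; return (finished, log)."""
    lru_latch = threading.Lock()     # stands in for the LRU chain latch
    buffer_latch = threading.Lock()  # stands in for the BDSC latch on one buffer
    stealable = threading.Event()    # set only once the buffer may be stolen
    lru_chain = []                   # toy LRU chain holding buffer names
    log = []

    def deferred_write():
        log.append("DWP: page written to pageset")
        with lru_latch:
            with buffer_latch:
                if "buf" not in lru_chain:
                    lru_chain.append("buf")
                log.append("DWP: buffer back on LRU chain")
        # Assumption: eligibility is granted after both latches are released.
        stealable.set()

    def readahead():
        stealable.wait()             # cannot steal a buffer DWP is still processing
        with buffer_latch:
            log.append("RAH: stole buffer")
            with lru_latch:          # looks for another buffer
                log.append("RAH: obtained LRU latch")

    threads = [
        threading.Thread(target=readahead, daemon=True),
        threading.Thread(target=deferred_write, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    finished = not any(t.is_alive() for t in threads)
    return finished, log


finished, log = run_fixed_ordering()
print("both tasks finished:", finished)
for entry in log:
    print(entry)
```

Because the readahead thread cannot take the buffer latch until the deferred write thread has released both latches, the two tasks never hold one latch each while waiting for the other.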
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI41605

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-05-21

  • Closed date

    2015-08-26

  • Last modified date

    2015-12-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PI47171 UI30530

Modules/Macros

  • CSQP3GET CSQP4DWP
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R100 PSY UI30530

       UP15/10/08 P F510

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
09 December 2015