A fix is available
APAR status
Closed as program error.
Error description
IRLM may receive a Newconn event from XES at about the same time as a Rebuild event. Because of heavy locking activity, the Rebuild is left on the Work-Todo-Q and the Newconn request is left on the ORDER-Q. Because there is more work on the queue, the request handler SRB continues to process the Work-Todo-Q and leaves the ORDER-Q work until it has completed the work swapped out of the Work-Todo-Q. If IRLM then receives a Failconn for the connection of the 'pending' Newconn member while it is processing the rebuild, IRLM can leave the Failconn pending once the rebuild has progressed to the stage where it is suspended waiting for the next rebuild event. This causes the problem. XES also issues the following message:
IXL041E CONNECTOR NAME:DXRPJ0A$$PJ9A009, JOBNAME: HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All HIR2220 (IRLM 2.2) and HIR2230 (IRLM 2.3)
*                 users of data sharing (SYSPLEXDS) who have
*                 PM65217 (PTFs UK79710/UK79709) applied.
****************************************************************
* PROBLEM DESCRIPTION: A sysplex-wide hang may occur when
*                      multiple DB2/IMS members are restarted at
*                      the same time, or during a group restart,
*                      if an IRLM lock structure rebuild is
*                      triggered in between for any reason.
*                      The lock structure rebuild hangs.
*                      Message IXL041E is issued for the delayed
*                      response to the XES Failconn event.
*                      XES also takes an ABEND S026 dump for the
*                      IRLM connector that did not respond.
*                      MVS may terminate the non-responding
*                      IRLM, causing DB2 to terminate as well.
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF
****************************************************************

During restart of multiple DB2/IMS members, an IRLM may trigger a lock structure rebuild for coexistence when its IRLM function level is lower than the function level of the existing IRLMs in the group. The lock structure rebuild may also be started for other reasons (maxuser, restart query, or structure failure).

There is a timing error in the way IRLM handles XES events for a member disconnect that can occur while the lock structure is being quiesced for rebuild. IRLMs receiving the failed connection event may not process it correctly and may not send XES a response for the failed connection event. This can leave XES waiting for the IRLM response while the surviving IRLMs participating in the rebuild wait for the next rebuild event from XES. The result is a deadlock that causes the whole data sharing group to hang. Message IXL041E is issued for IRLM CONNECTOR NAME:xxxxxxxx, which has not responded to the DISCONNECTED/FAILED connection event. MVS also issues ABEND=S026,REASON=08118001,CONNECTOR HANG for the hung connector.
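The deadlock described above can be viewed as a cycle in a wait-for graph: XES will not drive the next rebuild event until it gets the Failconn response, and the IRLM will not produce that response until the next rebuild event arrives. The Python sketch below is a hypothetical toy model for illustration only; the node labels are made-up descriptions, not IRLM or XES control blocks, services, or APIs.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the first wait-for cycle found, or None if every chain ends.

    Each key waits on the party named by its value (hypothetical model).
    """
    for start in waits_for:
        path: List[str] = []
        seen = set()
        node: Optional[str] = start
        # Follow the chain until it ends (None) or revisits a node.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # A node was revisited: the cycle starts at its first occurrence.
            return path[path.index(node):]
    return None


# Before the fix: each side waits on the other (illustrative labels only).
hung = {
    "XES: next rebuild event": "IRLM: Failconn response",
    "IRLM: Failconn response": "XES: next rebuild event",
}

# After the fix: the Failconn response no longer waits on the rebuild.
fixed = {
    "XES: next rebuild event": "IRLM: Failconn response",
}

print(find_wait_cycle(hung))
print(find_wait_cycle(fixed))
```

In the first graph the chain loops back to its starting node, which is the hang; in the second the chain ends, so both sides can make progress.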
Problem conclusion
During a rebuild, if IRLM receives a Failconn event for member A while IRLM serialization (RLMFENCE) is held to process global initialization for member B, IRLM now queues the Failconn event processing (qe0507) for member A on its work-todo-queue instead of putting it on the rebuild pending queue (RLMRBPQE). This allows the Failconn event to be processed before the rebuild. IRLM then issues the Failconn event response that XES is waiting for, so that the connection cleanup for the lost member completes first.
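The routing change can be sketched as a toy Python model. It assumes a much simplified IRLM: only the queue names (work-todo-queue, RLMRBPQE) and the RLMFENCE condition come from the APAR text, while the class, its fields, and the `on_failconn`/`drain_work_todo` methods are hypothetical and do not reflect the actual module code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class ToyIrlm:
    """Hypothetical model of Failconn routing during a lock structure rebuild."""
    fixed: bool                          # True = routing described in the APAR fix
    fence_held: bool = False             # RLMFENCE held for another member's global init
    rebuild_in_progress: bool = False
    work_todo: Deque[str] = field(default_factory=deque)
    rebuild_pending: Deque[str] = field(default_factory=deque)  # stands in for RLMRBPQE
    responses: List[str] = field(default_factory=list)          # responses sent to XES

    def on_failconn(self, member: str) -> None:
        event = f"failconn:{member}"
        if self.rebuild_in_progress and self.fence_held and not self.fixed:
            # Old path: parked until the next rebuild event, which XES will
            # not drive until it receives the response to this very event.
            self.rebuild_pending.append(event)
        else:
            # Fixed path: queued on the work-to-do queue, processed before the rebuild.
            self.work_todo.append(event)

    def drain_work_todo(self) -> None:
        while self.work_todo:
            event = self.work_todo.popleft()
            if event.startswith("failconn:"):
                # Respond to XES so the connection cleanup can complete.
                self.responses.append(event)


def scenario(fixed: bool) -> List[str]:
    irlm = ToyIrlm(fixed=fixed, fence_held=True, rebuild_in_progress=True)
    irlm.on_failconn("A")    # member A fails while member B's global init holds the fence
    irlm.fence_held = False  # member B's global initialization completes
    irlm.drain_work_todo()
    return irlm.responses


print(scenario(fixed=False))
print(scenario(fixed=True))
```

In this model the old routing leaves the Failconn on the rebuild pending queue with no response sent, while the fixed routing answers it as soon as the work-to-do queue is drained.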
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PM94539
Reported component name
IRLM V2
Reported component ID
569516401
Reported release
230
Status
CLOSED PER
PE
Yes
HIPER
Yes
Special Attention
No
Submitted date
2013-08-05
Closed date
2013-09-26
Last modified date
2013-11-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK97980 UK97981
Modules/Macros
DXRRL2R4 DXRRL710 DXRRL752 DXRRS2R4 DXRRS710 DXRRS752
Fix information
Fixed component name
IRLM V2
Fixed component ID
569516401
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.