IBM Support

PM94539: XES ISSUES ABEND026 REASON=08118001 WITH PM65217 (PTFS UK79710 / UK79709) APPLIED 13/08/12 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • IRLM may receive a Newconn event from XES at about the same
    time that a Rebuild request comes in. Because of heavy locking
    activity, the Rebuild is left on the Work-Todo-Q and the
    Newconn request is left on the ORDER-Q.
    Because there is more work on the queue, the request handler
    SRB continues to process the Work-Todo-Q and leaves the
    ORDER-Q work until it has completed the work swapped out of
    the Work-Todo-Q.
    If a Failconn then arrives for the 'pending' Newconn member's
    connection while IRLM is processing the rebuild, IRLM can
    leave the Failconn pending once the rebuild has progressed to
    a stage where it is suspended waiting for the next rebuild
    event. The Failconn is then never answered, which causes this
    problem.
    XES also issues the following message:
    IXL041E CONNECTOR NAME:DXRPJ0A$$PJ9A009, JOBNAME:   HAS NOT
    RESPONDED TO THE  DISCONNECTED/FAILED CONNECTION
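The connector name in the message above identifies the IRLM that did not respond. As a rough sketch, the name can be pulled out of captured log text with a regular expression. The pattern is based only on the message text as quoted in this APAR; the full IXL041E layout on a real system may carry more fields, and the function name `find_hung_connectors` is hypothetical.

```python
import re

# Pattern derived only from the message text quoted in this APAR.
# Assumption: the connector name runs up to the first comma or blank.
IXL041E_CONNECTOR = re.compile(
    r"IXL041E\s+CONNECTOR NAME:\s*(?P<connector>[^,\s]+)"
)


def find_hung_connectors(log_lines):
    """Return connector names found in IXL041E lines, in first-seen order."""
    seen = []
    for line in log_lines:
        match = IXL041E_CONNECTOR.search(line)
        if match and match.group("connector") not in seen:
            seen.append(match.group("connector"))
    return seen


sample = [
    "IXL041E CONNECTOR NAME:DXRPJ0A$$PJ9A009, JOBNAME:   HAS NOT",
    "RESPONDED TO THE  DISCONNECTED/FAILED CONNECTION",
]
print(find_hung_connectors(sample))
```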
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All HIR2220(IRLM 2.2) and HIR2230(IRLM 2.3)  *
    *                 users of data sharing SYSPLEXDS who have     *
    *                 PM65217 (PTFs UK79710/UK79709) applied.      *
    ****************************************************************
    * PROBLEM DESCRIPTION: Sysplex wide hang may occur when        *
    *                      multiple DB2/IMS members are restarted  *
    *                      at the same time, or a group restart,   *
    *                      and if IRLM lock structure rebuild is   *
    *                      triggered in between for any reason.    *
    *                      Lock structure rebuild gets hung.       *
    *                      MSGIXL041E issued for delay in response *
    *                      for XES Failconn event.                 *
    *                      ABENDS026 dump is also taken by XES for *
    *                      IRLM connector which did not respond.   *
    *                      MVS may terminate non-responding IRLM   *
    *                      causing DB2 to terminate as well.       *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    During restart of multiple DB2/IMS members, an IRLM may trigger
    a lock structure rebuild for coexistence when its IRLM function
    level is lower than the function level of the existing IRLMs in
    the group. The lock structure rebuild may also be started for
    other reasons (for example maxuser, restart query, or structure
    failure).
    There is a timing error in how IRLM handles XES member
    disconnect events that arrive while the lock structure is being
    quiesced for rebuild. IRLMs receiving the failed connection
    event may not process it correctly and may not send a response
    to XES. This can leave XES waiting for the IRLM response while
    the surviving IRLMs participating in the rebuild wait for the
    next rebuild event from XES: a deadlock that hangs the whole
    data sharing group.
    MSGIXL041E is issued for IRLM CONNECTOR NAME:xxxxxxxx, which has
    not responded to the DISCONNECTED/FAILED connection event.
    MVS also issues ABEND=S026,REASON=08118001,CONNECTOR HANG for
    the hung connector.
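
The mutual wait described above can be pictured as a cycle in a wait-for graph. The following is a minimal illustrative sketch, not IRLM or XES code: it assumes each party waits on at most one other party, and the node labels and the function name `find_cycle` are chosen here for illustration only.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    wait_for maps each waiter to the single party it is waiting on.
    """
    for start in wait_for:
        path, node = [], start
        # Follow the chain of waits until it ends or revisits a node.
        while node is not None and node not in path:
            path.append(node)
            node = wait_for.get(node)
        if node is not None:
            return path[path.index(node):] + [node]
    return None


# XES waits for the IRLM Failconn response;
# IRLM waits for the next rebuild event from XES.
hang = {"XES": "IRLM", "IRLM": "XES"}
print(find_cycle(hang))
```

Once IRLM answers the Failconn event, the edge from XES to IRLM disappears and the cycle is broken.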
    

Problem conclusion

  • During a rebuild, if a Failconn event arrives for a member A
    while IRLM serialization (RLMFENCE) is held to process global
    initialization for a member B, IRLM will now queue the Failconn
    event processing (qe0507) for member A on its work-todo-queue
    instead of putting it on the rebuild pending queue (RLMRBPQE).
    This enables the Failconn event to be processed before the
    rebuild, so IRLM issues the Failconn event response that XES is
    waiting on in order to first complete the connection cleanup
    for the lost member.
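
The queueing change can be sketched as a small toy model. This is not IRLM's implementation: the class `IrlmModel`, its fields, and the `fixed` flag are hypothetical, and only the two queue roles come from the text above. Combinations the APAR does not describe simply fall through to the work-todo queue here, which is an assumption of the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IrlmModel:
    """Toy model; only the two queue roles are taken from the APAR text."""
    rebuild_in_progress: bool = False
    # Member whose global initialization currently holds the fence, if any.
    fence_held_for: Optional[str] = None
    work_todo_q: deque = field(default_factory=deque)
    rebuild_pending_q: deque = field(default_factory=deque)

    def route_failconn(self, member, fixed=True):
        """Queue a Failconn event and return the name of the queue chosen."""
        fence_busy = self.fence_held_for not in (None, member)
        if self.rebuild_in_progress and fence_busy and not fixed:
            # Behaviour before the fix: the event parks behind the rebuild.
            self.rebuild_pending_q.append(("FAILCONN", member))
            return "rebuild_pending_q"
        # Behaviour with the fix: the event can run before the rebuild.
        self.work_todo_q.append(("FAILCONN", member))
        return "work_todo_q"


irlm = IrlmModel(rebuild_in_progress=True, fence_held_for="B")
print(irlm.route_failconn("A", fixed=False))
print(irlm.route_failconn("A", fixed=True))
```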
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PM94539

  • Reported component name

    IRLM V2

  • Reported component ID

    569516401

  • Reported release

    230

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-08-05

  • Closed date

    2013-09-26

  • Last modified date

    2013-11-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK97980 UK97981

Modules/Macros

  •    DXRRL2R4 DXRRL710 DXRRL752 DXRRS2R4 DXRRS710
    DXRRS752
    

Fix information

  • Fixed component name

    IRLM V2

  • Fixed component ID

    569516401

Applicable component levels

  • R220 PSY UK97980

       UP13/10/13 P F310 ®

  • R230 PSY UK97981

       UP13/10/13 P F310 ®

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
04 November 2013