IBM Support

PI45149: CQS IS HUNG AFTER EMHQ STRUCTURE RECOVERY FAILS ON NON-MASTER WITH MESSAGE CQS0242E RC=43000080

A fix is available

APAR status

  • Closed as program error.

Error description

  • An EMHQ structure recovery that is initiated when the CFRM policy
    EMHQ definition is too small results in the master CQS
    incorrectly reporting that the structure recovery succeeded,
    and the non-master CQS reporting that the recovery failed with
    message CQS0242E RC=43000080.
    

Local fix

  • VERIFY THE CFRM POLICY DEFINITION FOR THE EMHQ STRUCTURE
    CORRECTLY REFLECTS THE CURRENT STRUCTURE SPECIFICATIONS.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IMS and CQS V14 users of shared EMHQ.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * An EMHQ structure recovery that is                           *
    * initiated with a CFRM policy EMHQ                            *
    * definition that is too small results in                      *
    * the master incorrectly reporting the                         *
    * structure recovery worked, and the                           *
    * non-master reporting that the recovery                       *
    * failed with message CQS0242E                                 *
    * RC=43000080 size or ratio mismatch.                          *
    * After this, the non-master CQS appears                       *
    * hung and processes no CQS requests.                          *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * INSTALL CORRECTIVE SERVICE FOR APAR/PTF                      *
    ****************************************************************
    Two IMSs are cold started and their CQSs started after the
    coupling facility is reassigned back to the original CPU.  The
    EMHQ structure definition in the CFRM policy is smaller than it
    was when client data was previously checkpointed to the SRDS
    data set.
    
    The first CQS initializes and determines that the EMHQ structure
    needs to be rebuilt, becoming the structure recovery master.
    The structure recovery master detects a structure size/ratio
    mismatch with the data in the SRDS data set, and aborts the EMHQ
    structure recovery by sending an XCF message to the second CQS
    (non-master).  The master CQS erroneously continues performing
    the EMHQ structure recovery without aborting the rebuild for
    itself, and incorrectly reports that the rebuild completed
    successfully.
    
    The non-master CQS aborts the EMHQ structure recovery and issues
    message CQS0242E RC=43000080, followed by message CQS0244E.  Any
    subsequent structure process (such as structure checkpoint)
    fails because of the EMHQ structure recovery failure on the
    non-master CQS.  CQS appears to be hung and not able to do work,
    because the EMHQ structure recovery did not complete
    successfully.
    
    Depending upon the timing of CQS initialization for both CQSs,
    CQSXCF10 might fail with an ABEND0C4 for the non-master CQS.
    
    The CQS EMHQ structure recovery master should have aborted the
    EMHQ structure recovery for itself, after it detected the
    structure size mismatch error.
    
    Both of the CQSs should have attempted to initiate another EMHQ
    structure recovery, in order to correct the structure size
    mismatch.
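
The expected behavior described above can be illustrated with a small toy model. This is only a sketch and not CQS code: the `Cqs` class, the `master_recover` function, and the simple `policy_size < srds_size` comparison are hypothetical stand-ins for the real size/ratio check and XCF messaging.

```python
from dataclasses import dataclass, field


@dataclass
class Cqs:
    """Toy stand-in for one CQS address space (hypothetical model)."""
    name: str
    rebuild_active: bool = False
    log: list = field(default_factory=list)

    def abort_rebuild(self, reason: str) -> None:
        # Stop the recovery locally and record why.
        self.rebuild_active = False
        self.log.append(f"abort: {reason}")

    def request_new_recovery(self) -> None:
        # After an abort, ask for a fresh structure recovery.
        self.log.append("initiate new recovery")


def master_recover(master: Cqs, others: list, policy_size: int, srds_size: int) -> bool:
    """Return True only if the recovery really completed on the master."""
    master.rebuild_active = True
    for cqs in others:
        cqs.rebuild_active = True

    # Assumption: a smaller policy size stands in for "size/ratio mismatch".
    if policy_size < srds_size:
        reason = "size/ratio mismatch"
        # Notify the non-master CQSs (an XCF message in the real system).
        for cqs in others:
            cqs.abort_rebuild(reason)
        # Expected behavior: the master also aborts for itself,
        # instead of continuing and reporting success.
        master.abort_rebuild(reason)
        # Both sides then try another recovery.
        for cqs in [master, *others]:
            cqs.request_new_recovery()
        return False

    master.rebuild_active = False
    for cqs in others:
        cqs.rebuild_active = False
    master.log.append("recovery complete")
    return True
```

In this model the failure reported in the APAR corresponds to the master skipping its own `abort_rebuild` call and returning `True` while the non-master has already aborted.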
    

Problem conclusion

  • CQSSTR00 is changed in subroutine RBLDABRT, just before it tests
    for a WAITRBLD=NO structure.  Code is added to first check if
    the structure is empty, and if it is, branch to process the XES
    rebuild abort, because all rebuild phases for a rebuild to an
    empty structure are processed within the XES rebuild, even for
    WAITRBLD=NO.
    
    CQSXCF10 is changed after label MSG_0240 to skip processing the
    rebuild XCF message, if this CQS is not aware of a rebuild in
    progress.  This can happen if CQS is initializing and didn't get
    previous rebuild messages.
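
The order of checks introduced by the two changes can be sketched as follows. This is an illustrative model only, not the actual assembler logic: `Structure`, `rbldabrt_path`, `handle_rebuild_message`, and the returned labels are hypothetical names, and the branches taken after the WAITRBLD test are placeholders because the APAR does not describe them.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """Toy stand-in for a CQS structure (hypothetical fields)."""
    is_empty: bool
    waitrbld: bool  # False models WAITRBLD=NO


def rbldabrt_path(structure: Structure) -> str:
    """Pick the abort path, mirroring the described order of checks."""
    # New check, made first: a rebuild to an empty structure runs all of
    # its phases inside the XES rebuild, so abort through XES.
    if structure.is_empty:
        return "xes_rebuild_abort"
    # Pre-existing test for a WAITRBLD=NO structure. What follows it is
    # not described in the APAR; these labels are placeholders.
    if not structure.waitrbld:
        return "waitrbld_no_path"
    return "waitrbld_yes_path"


def handle_rebuild_message(rebuild_in_progress: bool) -> str:
    """Model the CQSXCF10 guard for rebuild XCF messages."""
    # A CQS that is still initializing may have missed earlier rebuild
    # messages; without rebuild state there is nothing to act on.
    if not rebuild_in_progress:
        return "skipped"
    return "processed"
```

The point of the first function is that the empty-structure test now comes before the WAITRBLD=NO test, so an empty WAITRBLD=NO structure still takes the XES rebuild abort path.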
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI45149

  • Reported component name

    IMS V14

  • Reported component ID

    5635A0500

  • Reported release

    400

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-07-16

  • Closed date

    2015-08-20

  • Last modified date

    2015-10-19

  • APAR is sysrouted FROM one or more of the following:

    PI44960

  • APAR is sysrouted TO one or more of the following:

    UI30386

Modules/Macros

  • CQSXCF10 CQSSTR00
    

Fix information

  • Fixed component name

    IMS V14

  • Fixed component ID

    5635A0500

Applicable component levels

  • R400 PSY UI30386

       UP15/08/23 P F508

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
30 November 2023