IBM Support

PI45146: CQS IS HUNG AFTER EMHQ STRUCTURE RECOVERY FAILS ON NON-MASTER WITH MESSAGE CQS0242E RC=4300080

A fix is available

APAR status

  • Closed as program error.

Error description

  • An EMHQ structure recovery that is initiated with a CFRM policy
    EMHQ definition that is too small results in the master
    incorrectly reporting that the structure recovery succeeded,
    and the non-master reporting that the recovery failed with
    message CQS0242E RC=43000080.
    

Local fix

  • VERIFY THE CFRM POLICY DEFINITION FOR THE EMHQ STRUCTURE
    CORRECTLY REFLECTS THE CURRENT STRUCTURE SPECIFICATIONS.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All IMS and CQS V12 users of shared EMHQ.    *
    ****************************************************************
    * PROBLEM DESCRIPTION: An EMHQ structure recovery that is      *
    *                      initiated with a CFRM policy EMHQ       *
    *                      definition that is too small results in *
    *                      the master incorrectly reporting the    *
    *                      structure recovery worked, and the non- *
    *                      master reporting that the recovery      *
    *                      failed with message CQS0242E            *
    *                      RC=43000080 size or ratio mismatch.     *
    *                      After this, the non-master CQS appears  *
    *                      hung and processes no CQS requests.     *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    Two IMSs are cold started and their CQSs started after the
    coupling facility is reassigned back to the original CPU.  The
    EMHQ structure definition in the CFRM policy is smaller than it
    was when client data was previously checkpointed to the SRDS
    data set.
    
    The first CQS initializes and determines that the EMHQ structure
    needs to be rebuilt, becoming the structure recovery master.
    The structure recovery master detects a structure size/ratio
    mismatch with the data in the SRDS data set, and aborts the EMHQ
    structure recovery by sending an XCF message to the second CQS
    (non-master).  The master CQS erroneously continues performing
    the EMHQ structure recovery without aborting the rebuild for
    itself, and incorrectly reports that the rebuild completed
    successfully.
    
    The non-master CQS aborts the EMHQ structure recovery and issues
    message CQS0242E RC=43000080, followed by message CQS0244E.  Any
    subsequent structure process (such as structure checkpoint)
    fails because of the EMHQ structure recovery failure on the non-
    master CQS.  CQS appears to be hung and not able to do work,
    because the EMHQ structure recovery did not complete
    successfully.
    
    Depending upon the timing of CQS initialization for both CQSs,
    CQSXCF10 might fail with an ABEND0C4 for the non-master CQS.
    
    The CQS EMHQ structure recovery master should have aborted the
    EMHQ structure recovery for itself, after it detected the
    structure size mismatch error.
    
    Both of the CQSs should have attempted to initiate another EMHQ
    structure recovery, in order to correct the structure size
    mismatch.
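
The failure and the expected behavior described above can be modeled with a small sketch. This is an illustration only, not CQS code: the `Cqs` class, the `run_recovery` function, the state names, and the `master_aborts_itself` switch are all hypothetical, and the size/ratio check itself is reduced to a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class Cqs:
    """Hypothetical stand-in for one CQS address space."""
    name: str
    rebuild_state: str = "in_progress"  # "in_progress", "complete", or "aborted"


def run_recovery(master, non_masters, mismatch, master_aborts_itself):
    """Model one EMHQ structure recovery attempt.

    Returns True when every CQS ends in the same rebuild state.
    `mismatch` stands in for the size/ratio comparison against the SRDS data.
    """
    participants = [master, *non_masters]
    if not mismatch:
        for cqs in participants:
            cqs.rebuild_state = "complete"
        return True

    # On a mismatch the master tells the other CQSs to abort
    # (an XCF message in the real system).
    for cqs in non_masters:
        cqs.rebuild_state = "aborted"

    # Failing behavior: the master keeps going and reports success.
    # Expected behavior: the master aborts the recovery for itself too.
    master.rebuild_state = "aborted" if master_aborts_itself else "complete"

    return len({cqs.rebuild_state for cqs in participants}) == 1


if __name__ == "__main__":
    m, n = Cqs("CQS1"), Cqs("CQS2")
    consistent = run_recovery(m, [n], mismatch=True, master_aborts_itself=False)
    print(m.rebuild_state, n.rebuild_state, consistent)
```

With `master_aborts_itself=False` the two CQSs disagree about the outcome, which corresponds to the hang on the non-master. With `master_aborts_itself=True` both end in the aborted state and are free to initiate another recovery.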
    

Problem conclusion

  • GEN:
    KEYWORDS:
     SYSPLEXSQ
    
    *** END IMS KEYWORDS ***
    CQSSTR00 is changed in subroutine RBLDABRT, just before it tests
    for a WAITRBLD=NO structure.  Code is added to first check if
    the structure is empty, and if it is, branch to process the XES
    rebuild abort, because all rebuild phases for a rebuild to an
    empty structure are processed within the XES rebuild, even for
    WAITRBLD=NO.
    
    
    CQSXCF10 is changed after label MSG_0240 to skip processing the
    rebuild XCF message, if this CQS is not aware of a rebuild in
    progress.  This can happen if CQS is initializing and didn't get
    previous rebuild messages.
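
The two decisions added by this fix can be sketched as follows. This is a hedged illustration of the logic as described in the text, not the actual module code: the function names and the returned labels are hypothetical, and the pre-existing WAITRBLD=NO handling is not modeled because it is not described here.

```python
def rebuild_abort_path(structure_is_empty: bool) -> str:
    """Sketch of the check added in CQSSTR00 subroutine RBLDABRT.

    The new test runs before the existing WAITRBLD=NO test. An empty
    structure always takes the XES rebuild abort path, regardless of the
    WAITRBLD setting, because all rebuild phases for a rebuild to an
    empty structure are processed within the XES rebuild.
    """
    if structure_is_empty:
        return "xes_rebuild_abort"
    # Otherwise fall through to the pre-existing logic (not modeled).
    return "continue_to_waitrbld_test"


def handle_rebuild_xcf_message(rebuild_in_progress_known: bool) -> str:
    """Sketch of the check added in CQSXCF10 after label MSG_0240.

    A CQS that is still initializing may have missed the earlier rebuild
    messages, so it skips a rebuild XCF message it has no context for.
    """
    if not rebuild_in_progress_known:
        return "skip"
    return "process"


if __name__ == "__main__":
    print(rebuild_abort_path(structure_is_empty=True))
    print(handle_rebuild_xcf_message(rebuild_in_progress_known=False))
```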
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI45146

  • Reported component name

    IMS V12

  • Reported component ID

    5635A0300

  • Reported release

    200

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-07-16

  • Closed date

    2015-08-26

  • Last modified date

    2015-10-02

  • APAR is sysrouted FROM one or more of the following:

    PI44960

  • APAR is sysrouted TO one or more of the following:

    UI30571

Modules/Macros

  •    CQSSTR00 CQSXCF10
    

Fix information

  • Fixed component name

    IMS V12

  • Fixed component ID

    5635A0300

Applicable component levels

  • R200 PSY UI30571

       UP15/09/02 P F509

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
14 December 2020