IBM Support

PI44960: CQS IS HUNG AFTER EMHQ STRUCTURE RECOVERY FAILS ON NON-MASTER WITH MESSAGE CQS0242E RC=43000080 15/07/23 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • An EMHQ structure recovery that is initiated with a CFRM policy
    EMHQ definition that is too small results in the master
    incorrectly reporting that the structure recovery succeeded,
    and the non-master reporting that the recovery failed with
    message CQS0242E RC=43000080.
    

Local fix

  • VERIFY THE CFRM POLICY DEFINITION FOR THE EMHQ STRUCTURE
    CORRECTLY REFLECTS THE CURRENT STRUCTURE SPECIFICATIONS.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All IMS and CQS V10 users of shared EMHQ.    *
    ****************************************************************
    * PROBLEM DESCRIPTION: After PK64986, an EMHQ structure        *
    *                      recovery that is initiated with a CFRM  *
    *                      policy EMHQ definition that is too      *
    *                      small results in the master incorrectly *
    *                      reporting the structure recovery        *
    *                      worked, and the non-master reporting    *
    *                      that the recovery failed with message   *
    *                      CQS0242E RC=43000080 size or ratio      *
    *                      mismatch.  After this, the non-master   *
    *                      CQS appears hung and processes no CQS   *
    *                      requests.                               *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    Two IMSs are cold started and their CQSs started after the
    coupling facility is reassigned back to the original CPU.  The
    EMHQ structure definition in the CFRM policy is smaller than it
    was when client data was previously checkpointed to the SRDS
    data set.
    
    The first CQS initializes and determines that the EMHQ structure
    needs to be rebuilt, becoming the structure recovery master.
    The structure recovery master detects a structure size/ratio
    mismatch with the data in the SRDS data set, and aborts the EMHQ
    structure recovery by sending an XCF message to the second CQS
    (non-master).  The master CQS erroneously continues performing
    the EMHQ structure recovery without aborting the rebuild for
    itself, and incorrectly reports that the rebuild completed
    successfully.
    
    The non-master CQS aborts the EMHQ structure recovery and issues
    message CQS0242E RC=43000080, followed by message CQS0244E.  Any
    subsequent structure process (such as structure checkpoint)
    fails because of the EMHQ structure recovery failure on the
    non-master CQS.  CQS appears to be hung and unable to do work,
    because the EMHQ structure recovery did not complete
    successfully.
    
    Depending upon the timing of CQS initialization for both CQSs,
    CQSXCF10 might fail with an ABEND0C4 for the non-master CQS.
    
    The CQS EMHQ structure recovery master should have aborted the
    EMHQ structure recovery for itself, after it detected the
    structure size mismatch error.
    
    Both of the CQSs should have attempted to initiate another EMHQ
    structure recovery, in order to correct the structure size
    mismatch.
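
The faulty and expected behavior described above can be modeled with a
minimal Python sketch.  This is not CQS code: the class, function, and
state names are hypothetical and exist only to contrast the reported
master behavior with the behavior the APAR says it should have had.

```python
from dataclasses import dataclass


@dataclass
class Cqs:
    # Hypothetical model of one CQS address space; not a real CQS control block.
    name: str
    recovery_state: str = "in_progress"


def on_size_mismatch(master, non_masters, master_aborts_itself):
    """Model the master's reaction to a size/ratio mismatch against the SRDS."""
    for cqs in non_masters:
        # The master tells each non-master to abort (an XCF message in the APAR).
        cqs.recovery_state = "aborted"
    if master_aborts_itself:
        # Expected behavior: the master also aborts its own recovery.
        master.recovery_state = "aborted"
    else:
        # Reported error: the master carries on and reports success.
        master.recovery_state = "reported_complete"
    return {c.name: c.recovery_state for c in [master, *non_masters]}


def states_agree(states):
    # Assumption for this sketch: a retry is only sensible when all CQSs agree.
    return len(set(states.values())) == 1


faulty = on_size_mismatch(Cqs("CQS1"), [Cqs("CQS2")], master_aborts_itself=False)
expected = on_size_mismatch(Cqs("CQS1"), [Cqs("CQS2")], master_aborts_itself=True)
print(faulty, states_agree(faulty))
print(expected, states_agree(expected))
```

In the faulty case the two CQSs end up with different views of the same
recovery, which corresponds to the hang described in this summary.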
    

Problem conclusion

  • GEN:
    KEYWORDS:
     SYSPLEXSQ
    
    *** END IMS KEYWORDS ***
    CQSSTR00 is changed in subroutine RBLDABRT, just before it tests
    for a WAITRBLD=NO structure.  Code is added to first check if
    the structure is empty, and if it is, branch to process the XES
    rebuild abort, because all rebuild phases for a rebuild to an
    empty structure are processed within the XES rebuild, even for
    WAITRBLD=NO.
    
    
    CQSXCF10 is changed after label MSG_0240 to skip processing the
    rebuild XCF message if this CQS is not aware of a rebuild in
    progress.  This can happen if CQS is initializing and did not
    receive previous rebuild messages.
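
The order of the two added checks can be illustrated with a short
Python sketch.  The real modules are not Python, and the function
names and returned labels below are hypothetical.  The sketch assumes
only what this conclusion states: the empty-structure check runs
before the WAITRBLD=NO test, and a rebuild XCF message is skipped when
no rebuild is known to be in progress.

```python
def rbldabrt_path(structure_is_empty: bool, waitrbld: bool) -> str:
    """Illustrative decision order for the RBLDABRT change in CQSSTR00."""
    if structure_is_empty:
        # New check: an empty structure always takes the XES rebuild abort,
        # regardless of the WAITRBLD setting.
        return "xes_rebuild_abort"
    # Otherwise fall through to the pre-existing WAITRBLD test, whose
    # outcomes are not described in this APAR.
    return "pre_existing_waitrbld_yes_path" if waitrbld else "pre_existing_waitrbld_no_path"


def handle_rebuild_xcf_message(rebuild_known_in_progress: bool) -> str:
    """Illustrative guard for the CQSXCF10 change after label MSG_0240."""
    if not rebuild_known_in_progress:
        # For example, this CQS was initializing and missed earlier messages.
        return "skipped"
    return "processed"


print(rbldabrt_path(structure_is_empty=True, waitrbld=False))
print(handle_rebuild_xcf_message(rebuild_known_in_progress=False))
```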
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI44960

  • Reported component name

    IMS V10

  • Reported component ID

    5635A0100

  • Reported release

    010

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-07-14

  • Closed date

    2015-07-29

  • Last modified date

    2015-09-02

Modules/Macros

  • CQSSTR00 CQSXCF10
    

Fix information

  • Fixed component name

    IMS V10

  • Fixed component ID

    5635A0100

Applicable component levels

  • R010 PSY UI29839

       UP15/08/08 P F508

Document Information

Modified date:
02 September 2015