IBM Support

IV35422: XDR PROXY NODES PERFORM AN UNLOCK/LOCK COMMAND WHENEVER THE MASTER PROXY SWITCHES, POTENTIALLY LEADING TO A CLUSTER SPLIT

APAR status

  • Closed as program error.

Error description

  • Support Engineer:  GBH
    Change Team Eng:   EJ
    
    Environment:
    TSAMP users using GDPS/PPRC Multiplatform Resiliency for System
    z (xDR guest Linux on System Z)
    TSAMP 3.2.2.2 and 3.2.2.3
    
    Details:
    A problem shows up when the MASTER role is moved to the backup
    proxy after a planned HyperSwap.
    When running with a dual-node proxy cluster, we wanted to
    minimize the amount of storage that is page-fixed.
    Therefore, only the storage for the master proxy is locked.
    During the xdr.switch logic, the "old master" gets its storage
    unlocked by the erpdmaster script, while the new master gets
    its storage locked.
    Locking the storage might get stuck in CP if too many pages are
    locked due to heavy load. This can lead to a cluster split, in
    which both proxies state "I'm now the master proxy".
    The code resolves the cluster split after a short time, but
    customers might prefer to have both proxy nodes always lock
    their storage instead of seeing a cluster split.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Tivoli System Automation for Multiplatforms
    * 3.2.2.2 and 3.2.2.3 users using GDPS/PPRC Multiplatform
    * Resiliency for System z for management of Linux on z/VM guests
    * (xDR z/VM guest Linux on System z)
    ****************************************************************
    * PROBLEM DESCRIPTION:
    * A problem shows up when the MASTER role is moved to the backup
    * proxy after a planned HyperSwap. When running with a dual-node
    * proxy cluster, the intention was to minimize the amount of
    * storage that is page-fixed. Therefore, only the storage for
    * the master proxy is locked.
    * Within the xdr.switch script logic, the "old master" gets its
    * storage unlocked by the erpdmaster script, while the new
    * master gets its storage locked. Locking the storage might get
    * stuck in CP if too many pages are locked due to heavy load.
    * This can lead to a cluster split, in which both proxies state
    * "I'm now the master proxy". The code resolves the cluster
    * split after a short time, but customers might prefer to have
    * both proxy nodes always lock their storage instead of seeing
    * a cluster split.
    ****************************************************************
    * RECOMMENDATION:
    ****************************************************************
    

Problem conclusion

  • The internal handling within xDR has been modified to address
    this issue. Customers can now choose between two different
    memory locking setups:
    - Setup (1): Only the memory of the current master xDR proxy is
      completely locked. This is the default and matches the
      original behavior.
    - Setup (2): The memory of both xDR proxy nodes is always
      completely locked, but the locking has to be done manually,
      outside of xDR. In this setup, the erpdmaster script does not
      lock any memory.
    Apply the APAR on both proxy cluster nodes and run the script
    enableErpd on one of the nodes. Then choose the memory locking
    setup that you prefer.
    .
    The official fix for this problem is included in fix pack 4 of
    Tivoli System Automation for Multiplatforms 3.2.2
    | 3.2.2-TIV-ITSAMP-FP0004 |
    .
    Additional Search Keywords
    .
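The difference between the two setups can be illustrated with a small model. This is only a sketch of the behavior described above, not the actual xDR implementation: the names ProxyNode and switch_master are hypothetical, and no real xDR script or CP command is invoked. It shows that in setup (1) every master switch involves an unlock and a lock operation (the lock being the step that can get stuck under heavy load), while in setup (2) the switch performs no locking at all.

```python
from dataclasses import dataclass


@dataclass
class ProxyNode:
    """Hypothetical stand-in for one xDR proxy node."""
    name: str
    locked: bool = False  # True when the node's storage is page-fixed


def switch_master(old_master: ProxyNode, new_master: ProxyNode, setup: int):
    """Model the lock/unlock operations done during a master switch.

    setup 1: only the current master is locked (the default).
    setup 2: both nodes were locked manually beforehand, so the
             switch itself does not touch any locks.
    Returns the list of operations the switch performed.
    """
    if setup not in (1, 2):
        raise ValueError("setup must be 1 or 2")

    operations = []
    if setup == 1:
        # The old master gives up its page-fixed storage ...
        old_master.locked = False
        operations.append(("unlock", old_master.name))
        # ... and the new master has to lock its own storage.
        new_master.locked = True
        operations.append(("lock", new_master.name))
    return operations


# Setup (1): the switch performs an unlock followed by a lock.
a, b = ProxyNode("proxy1", locked=True), ProxyNode("proxy2")
ops_setup1 = switch_master(a, b, setup=1)

# Setup (2): both nodes are already locked; the switch does nothing.
c, d = ProxyNode("proxy1", locked=True), ProxyNode("proxy2", locked=True)
ops_setup2 = switch_master(c, d, setup=2)
```

In this model, setup (2) trades a larger amount of permanently page-fixed storage for a switch path that contains no lock operation.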
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV35422

  • Reported component name

    SA MULTIPLATFOR

  • Reported component ID

    5724M0000

  • Reported release

    322

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2013-01-17

  • Closed date

    2013-02-11

  • Last modified date

    2013-02-11

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SA MULTIPLATFOR

  • Fixed component ID

    5724M0000

Applicable component levels

  • Business Unit

    IBM Software w/o TPS (BU059)

  • Product

    Tivoli System Automation for Multiplatforms (SSRM2X)

  • Platform

    Platform Independent (PF025)

  • Version

    322

  • Line of Business

    Automation (LOB45)

Document Information

Modified date:
25 August 2023