IBM Support

IV50788: DOC HA 7.1 HOWTO HANDLE SIMULTANEOUSLY REP DISK AND NODE FAILURE

APAR status

  • Closed as documentation error.

Error description

  • This APAR applies to AIX/PowerHA environments that use CAA
    (Cluster Aware AIX) clustering.
    .
    CAA uses a disk shared by all nodes in the cluster, called the
    repository disk. This disk holds cluster configuration
    information and is also used for health management, for
    example through heartbeating.
    .
    Note that a PowerHA and CAA cluster survives the loss of the
    repository disk. In that scenario the administrator can
    recreate the repository disk through PowerHA SMIT by providing
    a new disk to be used as the repository disk.
    .
    This APAR describes an error scenario in which the repository
    disk is lost and then all nodes in the cluster also reboot.
    The nodes then have no configuration information available to
    activate the cluster, because CAA requires the cluster
    configuration on the repository disk to bring up cluster
    services during boot.
    .
    In this case it is necessary to rebuild the repository disk
    before the cluster services can be activated.
    .
    This APAR provides the PowerHA fixes needed to recover from
    such an error scenario. These PowerHA fixes require associated
    changes in CAA/AIX to operate properly. Obtain the CAA/AIX
    fixes through APARs IV53637 and IV57754 before proceeding with
    the PowerHA fix updates.
    .
    Once the CAA/AIX and PowerHA fixes are applied, use the
    following process to rebuild the repository disk and then
    activate the cluster services.
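    .
    Before starting, it may help to confirm on each node that the
    prerequisite CAA/AIX APARs are present. The following is a
    minimal sketch, not part of the APAR itself. It assumes the AIX
    'instfix -ik' query exits non-zero when an APAR is missing or
    only partly installed; the function name check_apars and the
    INSTFIX override variable are hypothetical helpers introduced
    here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given APAR is installed.
# INSTFIX can be overridden (for example in a test); it defaults to the
# AIX instfix command.
INSTFIX=${INSTFIX:-instfix}

check_apars() {
    missing=0
    for apar in "$@"; do
        # Assumption: 'instfix -ik <APAR>' succeeds only when all
        # filesets for the APAR are found.
        if "$INSTFIX" -ik "$apar" >/dev/null 2>&1; then
            echo "$apar: installed"
        else
            echo "$apar: NOT installed"
            missing=1
        fi
    done
    # Non-zero return if at least one APAR was not found.
    return "$missing"
}

# Example call (run on each cluster node):
#   check_apars IV53637 IV57754
```

    A non-zero return from check_apars would suggest that the
    CAA/AIX fixes should be applied before the PowerHA updates.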
    .
    Rebuild Repository disk on one Node
    -----------------------------------
    To rebuild the repository disk, execute the following on a node
    (say node A) that has rebooted and no longer has access to the
    original repository disk. The cluster will not be active on
    that node because the original repository disk is unavailable.
    A new disk must be used to rebuild the repository:
    .
    $ smitty sysmirror
    - Problem Determination Tools
    -- Replace the Primary Repository Disk
    .
    Once this step is done, cluster services can be brought online
    on node A.
    .
    Starting Cluster Services on the other nodes of the Cluster
    -----------------------------------------------------------
    Once cluster services have been brought up on node A, the other
    nodes in the cluster can be made to join the cluster of node A
    (which is using the new repository disk) with the following
    steps.
    .
    Note that nodes other than node A will continue to access the
    old (original) repository disk, if they still have access to it
    and the disk has become available again, and need to be pointed
    to the new repository disk.
    .
    To force these nodes to use the new repository disk, execute
    the following steps on these nodes:
    .
    1.      On each node in the cluster other than node A, stop CAA
    from using the original repository disk:
    $ export CAA_FORCE_ENABLED=true
    $ clusterconf -fu
    .
    Check with 'lscluster -c' that CAA cluster services are inactive.
    .
    2.      From node A, request other nodes to join the CAA
    cluster by executing:
    .
    $ clusterconf -p
    .
    .
    Execute 'lscluster -c' and 'lscluster -m' commands
    to verify that CAA is successfully restarted.
    .
    .
    3.      From node A, synchronize the PowerHA configuration
    information for the new repository disk to the other nodes:
    .
    $ smitty sysmirror
    - Cluster Nodes and Networks
    -- Verify and Synchronize Cluster Configuration
    .
    .
    4.      On all nodes other than node A, start the PowerHA
    cluster services:
    .
    $ smitty sysmirror
    - System Management (C-SPOC)
    -- PowerHA SystemMirror Services
    --- Start Cluster Services
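    .
    The command-line parts of the steps above can be sketched as a
    small dry-run script. This is an illustration under stated
    assumptions, not an IBM-provided tool: the RUN variable and both
    function names are hypothetical, and only commands that appear
    in the procedure (clusterconf -fu, lscluster -c, lscluster -m)
    are used. By default each command is printed rather than
    executed. The SMIT-driven steps and the join request issued from
    node A in step 2 are not covered.

```shell
#!/bin/sh
# RUN is a hypothetical helper variable. Left unset it defaults to "echo",
# so commands are only printed (dry run). Set RUN to an empty string to
# execute the commands for real on an AIX node.
RUN=${RUN-echo}

# Step 1, on each node other than node A: stop CAA from using the
# original repository disk, then display the cluster configuration.
stop_caa_on_old_repository() {
    CAA_FORCE_ENABLED=true
    export CAA_FORCE_ENABLED
    $RUN clusterconf -fu
    $RUN lscluster -c
}

# After step 2: display the cluster and node status so the operator can
# confirm that CAA has restarted.
verify_caa_restarted() {
    $RUN lscluster -c
    $RUN lscluster -m
}
```

    Reviewing the printed commands first, and only then setting RUN
    to an empty string, keeps the forced cleanup from running by
    accident on the wrong node.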
    

Local fix

Problem summary

  • Documentation update on how to handle a simultaneous
    repository disk and node failure.
    

Problem conclusion

  • Documented the steps to handle a repository disk and node
    failure.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV50788

  • Reported component name

    POWERHA SYSMIR

  • Reported component ID

    5765H3900

  • Reported release

    712

  • Status

    CLOSED DOC

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2013-10-15

  • Closed date

    2013-10-25

  • Last modified date

    2014-07-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels


Document Information

Modified date:
20 October 2021