IBM Support

IV97148: CAA:SLOW GOSSIP RECEIPT ON BOOT MAY CAUSE PARTITIONED CLUSTER

A fix is available

APAR status

  • Closed as program error.

Error description

  • **************************************************************
    * USERS AFFECTED:
    * Systems running the AIX 6100-09 Technology Level
    * or VIOS 2.2.x.x
    * with bos.cluster.rte at the 6.1.9.200 or 6.1.9.201 level.
      **************************************************************
    * ERROR DESCRIPTION:
    * After rebooting a node in either a PowerHA or VIOS SSP cluster
    * using CAA, there is a chance that the node may create its own
    * cluster, causing a split-brain / partitioned cluster in the
    * CAA environment.
    *
    * This is more likely to be seen if the network is slow and
    * there is a delay in gossip packets being received by the
    * rebooted node.
    *
    * The effect of a split-brain / partitioned cluster can vary,
    * but in the worst cases: PowerHA may react by bringing
    * resources online at the same time on multiple nodes, and
    * VIOS SSP can experience the pool going down on one or more
    * nodes.
      **************************************************************
    * RECOMMENDATION:
    * Install APAR IV97148.
    * Prior to fix availability, an interim fix (ifix) is available
    * from either of the following locations:
    * ftp://aix.software.ibm.com/aix/ifixes/iv97148/
    * https://aix.software.ibm.com/aix/ifixes/iv97148/
    * Installation of the ifix requires a reboot.
      **************************************************************
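
    As a quick aid for the "USERS AFFECTED" check above, the following
    minimal Python sketch tests whether a bos.cluster.rte level string is
    one of the two levels this APAR names. The function name is_affected
    and the constant AFFECTED_LEVELS are illustrative, not part of any IBM
    tool. The level string is assumed to be supplied by the administrator,
    for example as read from the output of `lslpp -L bos.cluster.rte`.

    ```python
    # Levels of bos.cluster.rte listed under "USERS AFFECTED" in this APAR.
    AFFECTED_LEVELS = {"6.1.9.200", "6.1.9.201"}


    def is_affected(fileset_level: str) -> bool:
        """Return True if the given bos.cluster.rte level is listed as affected.

        Illustrative helper only: it compares the level string against the
        two levels named in this APAR and does not query the system itself.
        """
        return fileset_level.strip() in AFFECTED_LEVELS


    if __name__ == "__main__":
        for level in ("6.1.9.200", "6.1.9.201", "6.1.9.300"):
            print(level, "affected" if is_affected(level) else "not listed")
    ```

    A level that is not listed here is simply outside the scope of this
    APAR's "USERS AFFECTED" statement. The sketch does not determine
    whether the fix or interim fix is already installed.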
    

Local fix

  • n/a
    

Problem summary

  • PROBLEM SUMMARY:
    After rebooting a node in either a PowerHA or VIOS SSP
    cluster using CAA, there is a chance that the node may
    create its own cluster, causing a split-brain / partitioned
    cluster in the CAA environment.
    This is more likely to be seen if the network is slow and
    there is a delay in gossip packets being received by the
    rebooted node.
    The effect of a split-brain / partitioned cluster can vary,
    but in the worst cases: PowerHA may react by bringing
    resources online at the same time on multiple nodes, and
    VIOS SSP can experience the pool going down on one or more
    nodes.
    

Problem conclusion

  • All initial clusterwide lock requests pass through a gate that
    should consider the count of nodes heartbeating to the
    repository in addition to those gossiping over the network.
    There was a hole in this gate, and the fix closes it.
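
    The idea behind the gate can be illustrated with a toy model. This is
    a sketch under stated assumptions, not CAA's actual implementation:
    the function names, the use of node-name sets, and the rule "proceed
    alone only when no peer is visible on either channel" are all
    hypothetical and chosen only to show why counting repository
    heartbeats matters when gossip packets arrive late.

    ```python
    def visible_peers(gossip_peers: set, repo_peers: set) -> set:
        """Peers seen on either channel (hypothetical model, not CAA code)."""
        return gossip_peers | repo_peers


    def may_proceed_alone(gossip_peers: set, repo_peers: set) -> bool:
        """Toy gate: a freshly booted node treats itself as alone only when
        it sees no peer via network gossip AND none heartbeating to the
        repository."""
        return len(visible_peers(gossip_peers, repo_peers)) == 0


    if __name__ == "__main__":
        # Slow network after reboot: no gossip received yet, but another
        # node (hypothetical name "nodeB") is heartbeating to the repository.
        print(may_proceed_alone(set(), {"nodeB"}))  # False: a peer exists
        # No peer visible on either channel.
        print(may_proceed_alone(set(), set()))      # True
    ```

    In this model, a gate that looked only at gossip_peers would let the
    rebooted node proceed alone in the first case, which corresponds to
    the partitioned-cluster symptom described in this APAR.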
    

Temporary fix

  •   *********
      * HIPER *
      *********
    

Comments

APAR Information

  • APAR number

    IV97148

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Submitted date

    2017-06-12

  • Closed date

    2017-06-13

  • Last modified date

    2017-11-07

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U870060

       UP17/10/17 I 1000

Document Information

Modified date:
17 December 2021