IBM Support

IV82627: CAA: A NODE MAY NOT SEE A REBOOTED NODE AS UP APPLIES TO AIX 6100-09

A fix is available


APAR status

  • Closed as program error.

Error description

  • **************************************************************
    * USERS AFFECTED:
      * Systems running the AIX 6100-09 Technology Level
      * with bos.cluster.rte below the 6.1.9.101 level.
      **************************************************************
      * PROBLEM DESCRIPTION:
      *     After one node is rebooted, the CAA cluster state
      *     may be inconsistent in a cluster that uses multicast
      *     communication mode when multicast communication has
      *     a problem but unicast communication is working.
      *     'lscluster -m' of node1:
      *     ------------------------
      *     Calling node query for all nodes...
      *     Node query number of nodes examined: 2
      *
      *             Node name: node1
      *             Cluster shorthand id for node: 1
      *             ...
      *             State of node: UP  NODE_LOCAL
      *             ...
      *             Node name: node2
      *             Cluster shorthand id for node: 2
      *             ...
      *             State of node: DOWN
      *             ...
      *     'lscluster -m' of node2:
      *     ------------------------
      *     Calling node query for all nodes...
      *     Node query number of nodes examined: 2
      *
      *             Node name: node1
      *             Cluster shorthand id for node: 1
      *             ...
      *             State of node: UP
      *             ...
      *             Node name: node2
      *             Cluster shorthand id for node: 2
      *             ...
      *             State of node: UP  NODE_LOCAL
      *             ...
      *     In the above example, node2 is the node that was
      *     rebooted last.
      *     syslog.caa of node1 looks like:
      *     -------------------------------
      *     ...
      *     <timestamp> node1 caa:info unix: kcluster_lock.c
      *      count_active_nodes      200      num_nodes_active 2
      *      *up_node_cnt 1 db_node_cnt 1
      *     <timestamp> node1  caa:err|error unix:
      *      kcluster_clusterwide.c
      *      kcluster_clusterwide    841     clusterwide query
      *      node timeout: cmd = 0x20, from node id = 2
      *     ...
      *     <timestamp> node1 caa:err|error unix:
      *      kcluster_clusterwide.c
      *      kcluster_clusterwide    841     clusterwide query
      *      node timeout: cmd = 0x20, from node id = 2
      *     ...
      *     syslog.caa of node2 looks like:
      *     -------------------------------
      *     ...
      *     <timestamp> node2  caa:info unix: kcluster_syscalls.c
      *      _xcluster_create        2614
      *      Clusterwide locking services are starting.
      *     ...
      *     <timestamp> node2 caa:info unix: kcluster_lock.c
      *      count_active_nodes      200      num_nodes_active 2
      *      *up_node_cnt 0 db_node_cnt 1
      *     <timestamp> node2 caa:info unix: kcluster_lock.c
      *      wait_on_node_bringup    255     All nodes are active.
      *     ...
      *     <timestamp> node2  caa:info unix: kcluster_lock.c
      *      count_active_nodes      200      num_nodes_active 2
      *      *up_node_cnt 0 db_node_cnt 1
      *     <timestamp> node2  caa:info unix: kcluster_lock.c
      *      xcluster_lock   607     xcluster_lock: lock
      *      2 acquired, num_nodes_active: 2
      *     <timestamp> node2  caa:info unix: kcluster_lock.c
      *      xcluster_lock   608     xcluster_lock: nodes
      *      which responded: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      *      0
      *     ...
      *     <timestamp> node2 caa:info cluster[2490836]:
      *      caa_config.c
      *      cl_th_sock      5317    258     Node node1
      *      is DOWN, and we are not trying to JOIN it or STOP it.
      *      Skipping.
      *     ...
      **************************************************************
      * RECOMMENDATION:
      * Install APAR IV82627.
      **************************************************************
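The symptom above is an asymmetric view: node1 reports node2 as DOWN while node2 reports both nodes as UP. A minimal sketch for spotting this from saved `lscluster -m` output, assuming the "Node name:" / "State of node:" line layout shown in the excerpt and that the output has been collected from every node. The helpers `parse_node_states` and `find_asymmetric_views` are hypothetical illustrations, not part of CAA; they only detect the symptom, the fix is still APAR IV82627.

```python
import re
from typing import Dict, List, Optional, Tuple

# Sample outputs copied from the APAR excerpt ("..." lines omitted).
NODE1_OUTPUT = """
Calling node query for all nodes...
Node query number of nodes examined: 2

        Node name: node1
        Cluster shorthand id for node: 1
        State of node: UP  NODE_LOCAL
        Node name: node2
        Cluster shorthand id for node: 2
        State of node: DOWN
"""

NODE2_OUTPUT = """
Calling node query for all nodes...
Node query number of nodes examined: 2

        Node name: node1
        Cluster shorthand id for node: 1
        State of node: UP
        Node name: node2
        Cluster shorthand id for node: 2
        State of node: UP  NODE_LOCAL
"""


def parse_node_states(lscluster_output: str) -> Dict[str, str]:
    """Map each node name to the first word of its 'State of node:' line."""
    states: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lscluster_output.splitlines():
        line = raw.strip()
        name = re.match(r"Node name:\s*(\S+)", line)
        if name:
            current = name.group(1)
            continue
        state = re.match(r"State of node:\s*(\S+)", line)
        if state and current is not None:
            # Keep only the state word; flags such as NODE_LOCAL are dropped.
            states[current] = state.group(1)
            current = None
    return states


def find_asymmetric_views(
    views: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str, str]]:
    """List (observer, peer, state) for every peer an observer does not see as UP."""
    problems: List[Tuple[str, str, str]] = []
    for observer, states in sorted(views.items()):
        for peer, state in sorted(states.items()):
            if peer != observer and state != "UP":
                problems.append((observer, peer, state))
    return problems


if __name__ == "__main__":
    views = {
        "node1": parse_node_states(NODE1_OUTPUT),
        "node2": parse_node_states(NODE2_OUTPUT),
    }
    print(find_asymmetric_views(views))  # [('node1', 'node2', 'DOWN')]
```

Run against the APAR example, the sketch reports only that node1 still sees node2 as DOWN, which is the inconsistent state described above; a healthy two-node cluster would produce an empty list.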
    

Local fix

  • Use unicast communication mode.
    

Problem summary

  •   **************************************************************
      * USERS AFFECTED:
      * Systems running the AIX 6100-09 Technology Level
      * with bos.cluster.rte below the 6.1.9.101 level.
      **************************************************************
      * PROBLEM DESCRIPTION:
      *     After one node is rebooted, the CAA cluster state
      *     may be inconsistent in a cluster that uses multicast
      *     communication mode when multicast communication has
      *     a problem but unicast communication is working.
      **************************************************************
    

Problem conclusion

  • If the number of nodes heartbeating to the repository is known,
    do not attempt to acquire clusterwide locks until the number of
    nodes gossiping equals that number.
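The conclusion describes a gating rule: clusterwide lock acquisition waits until the gossiping node count has caught up with the number of nodes known to be heartbeating to the repository. A minimal sketch of that rule in Python, assuming both counts are available as plain integers (how CAA obtains them internally is not described in this APAR). The function names, the timeout and the polling interval are hypothetical; the sketch treats "equals" as "has reached" so a transiently higher gossip count cannot block the wait.

```python
import time
from typing import Callable


def wait_for_gossiping_nodes(
    repo_heartbeat_nodes: int,
    gossiping_nodes: Callable[[], int],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> bool:
    """Block until the gossiping node count reaches the repository count.

    Returns True when it is safe to try a clusterwide lock, or False if the
    counts did not converge before timeout_s elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if gossiping_nodes() >= repo_heartbeat_nodes:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def acquire_clusterwide_lock_when_ready(
    repo_heartbeat_nodes: int,
    gossiping_nodes: Callable[[], int],
    acquire: Callable[[], None],
    timeout_s: float = 30.0,
) -> bool:
    """Call acquire() only after every repository-heartbeating node is gossiping."""
    if not wait_for_gossiping_nodes(repo_heartbeat_nodes, gossiping_nodes, timeout_s):
        return False
    acquire()
    return True


if __name__ == "__main__":
    # Hypothetical sequence: the rebooted node starts gossiping on the 3rd poll.
    samples = iter([1, 1, 2])
    acquired = []
    ok = acquire_clusterwide_lock_when_ready(
        2, lambda: next(samples), lambda: acquired.append("lock"), timeout_s=5.0
    )
    print(ok, acquired)  # True ['lock']
```

The timeout is an added assumption: the APAR does not say what CAA does if the counts never converge, and here the caller simply gets False instead of proceeding with a partial view of the cluster.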
    

Temporary fix

  •   *********
      * HIPER *
      *********
    

Comments

APAR Information

  • APAR number

    IV82627

  • Reported component name

    AIX 610 STD EDI

  • Reported component ID

    5765G6200

  • Reported release

    610

  • Status

    CLOSED PER

  • PE

    No

  • HIPER

    Yes

  • Submitted date

    2016-03-11

  • Closed date

    2016-03-11

  • Last modified date

    2016-11-09

  • APAR is sysrouted FROM one or more of the following:

    IV82494

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX 610 STD EDI

  • Fixed component ID

    5765G6200

Applicable component levels

  • R610 PSY U868845

       UP16/10/25 I 1000


Document Information

Modified date:
17 December 2021