Changing partitioned nodes to failed

Sometimes a partition condition is reported when a node outage has actually occurred. This can happen when cluster resource services loses communication with one or more nodes but cannot detect whether those nodes are still operational. When this condition occurs, you can indicate that the node has failed.

Attention: Telling cluster resource services that a node has failed makes recovery from the partition state simpler. However, do not change the node status to Failed when the node is in fact still active and a true partition has occurred. Doing so can cause nodes in more than one partition to assume the primary role for a cluster resource group. When two nodes each act as the primary node, data such as files or databases can become disjoint or corrupted, because each node independently changes its own copy. In addition, the two partitions cannot be merged back together once a node in each partition has been assigned the primary role.

When the status of a node is changed to Failed, the roles of the nodes in the recovery domain for each cluster resource group (CRG) in the partition might be reordered. The node being set to Failed is assigned as the last backup. If multiple nodes have failed and their status needs to be changed, the order in which you change them affects the final order of the backup nodes in the recovery domain. If the failed node was the primary node for a CRG, the first active backup is reassigned as the new primary node.
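The reordering rule described above can be illustrated with a small model. This is only a sketch, not how cluster resource services is implemented: the names Node, mark_failed, and names are hypothetical, and the recovery domain is assumed to be a simple ordered list in which the first entry is the primary and the remaining entries are backups in order.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Node:
    # Hypothetical model of one recovery domain entry.
    name: str
    failed: bool = False


def mark_failed(recovery_domain: List[Node], name: str) -> List[Node]:
    """Return a new recovery domain with `name` set to failed and moved last.

    Assumption: index 0 is the primary, later entries are ordered backups.
    """
    if not any(n.name == name for n in recovery_domain):
        raise ValueError(f"node {name!r} is not in the recovery domain")
    others = [n for n in recovery_domain if n.name != name]
    target = next(n for n in recovery_domain if n.name == name)
    # The failed node becomes the last backup. Because every failed node is
    # moved to the end, the first entry is the first active node whenever
    # any active node remains, so a failed primary is replaced by the first
    # active backup without a separate promotion step.
    return others + [replace(target, failed=True)]


def names(recovery_domain: List[Node]) -> List[str]:
    """Return the node names in recovery domain order."""
    return [n.name for n in recovery_domain]


domain = [Node("A"), Node("B"), Node("C"), Node("D")]

# Failing the primary promotes the first active backup.
print(names(mark_failed(domain, "A")))  # ['B', 'C', 'D', 'A']

# The order in which failed nodes are changed affects the final order.
print(names(mark_failed(mark_failed(domain, "B"), "C")))  # ['A', 'D', 'B', 'C']
print(names(mark_failed(mark_failed(domain, "C"), "B")))  # ['A', 'D', 'C', 'B']
```

In this model, changing B before C leaves B ahead of C among the backups, while changing C before B reverses them, which is why the sequence of status changes matters when several nodes have failed.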

When cluster resource services has lost communication with a node but cannot detect whether the node is still operational, the node has a status of Not communicating. You might need to change the status of the node from Not communicating to Failed. You can then restart the node.

To change the status of a node from Not communicating to Failed, follow these steps:

  1. In a Web browser, enter http://mysystem:2001, where mysystem is the host name of the system.
  2. Log on to the system with your user profile and password.
  3. Select Cluster Resource Services from the IBM Systems Director Navigator for i window.
  4. On the Cluster Resource Services page, select the Work with Cluster Nodes task to show a list of nodes in the cluster.
  5. Click the Select Action menu and select Change Status. Change the status of the node to Failed.