IBM Support

Proper configuration of PowerHA cluster rhosts files

Troubleshooting


Problem

Configuring the rhosts file on an HACMP cluster node with the correct entries to allow node-to-node communication during cluster creation, synchronization and verification, and other cluster actions

Symptom

The initial cluster cannot be created, or synchronization and verification fail with communication errors

Cause

rhosts file not properly configured

Environment

HACMP v6.1 or earlier

Diagnosing The Problem

If smit reports errors or warnings about communication problems while you configure a new cluster or run a verification, one place to look is the rhosts file on all cluster nodes.

Some typical smit errors seen during verification:
1) “WARNING: Unable to communicate with the remote node:”

2) If two cluster nodes named testbox1 and testbox2 have a communication problem, you might see something like one of the following:

“ERROR: Unable to use communication interface 'testbox1'.
Communication Path testbox1 could not be resolved.
Verify that the communication path can be found in /etc/hosts.

ERROR: Unable to use communication interface 'testbox2'.
Communication Path testbox2 could not be resolved.
Verify that the communication path can be found in /etc/hosts.

For each of the above invalid communication interfaces listed,
please check that the owning node is running the clcomdES subsystem,
and that the /usr/es/sbin/cluster/etc/rhosts file is properly
configured.
Also, check /var/hacmp/clcomd/clcomd.log logfile on remote node for
possible errors.”

OR

“Communication Path testbox2 could not be resolved.”

There may be other indications of communication trouble.

First, ensure the services needed for communication between the nodes are enabled:
# vi /etc/services
Confirm that the following entries in the 'services' file are uncommented (no '#' sign in front of them):
clcomd 6191/tcp
clsmuxpd 6270/tcp
topsvcs 6178/udp
grpsvcs 6179/udp

If any of them are commented out, remove the '#' sign to uncomment them, and then refresh the daemons:
refresh -s inetd
stopsrc -s clcomdES
startsrc -s clcomdES
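
The /etc/services check described above can also be scripted instead of done by eye. The following is a minimal sketch for a POSIX shell; the function name check_cluster_services is hypothetical and is not part of HACMP. It only reads the file and reports which of the four entries are present and uncommented.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HACMP): report whether each required
# service entry is present and uncommented in a services file.
# Usage: check_cluster_services [path-to-services-file]
check_cluster_services() {
    services_file="${1:-/etc/services}"
    missing=0
    for entry in "clcomd 6191/tcp" "clsmuxpd 6270/tcp" \
                 "topsvcs 6178/udp" "grpsvcs 6179/udp"; do
        name=${entry%% *}   # service name, e.g. clcomd
        port=${entry##* }   # port/protocol, e.g. 6191/tcp
        # An uncommented entry starts with the name (no leading '#'),
        # followed by whitespace and the port/protocol.
        if grep -E "^[[:space:]]*${name}[[:space:]]+${port}([[:space:]]|\$)" \
                "$services_file" >/dev/null 2>&1; then
            echo "OK      $name $port"
        else
            echo "MISSING $name $port"
            missing=1
        fi
    done
    return $missing
}
```

Running check_cluster_services with no argument checks /etc/services; a nonzero return code means at least one entry is missing or commented out and needs to be fixed by hand.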

Now, address the rhosts access:
All the cluster nodes should have the same entries in /usr/es/sbin/cluster/etc/rhosts to allow each node to accept communications from the other nodes.

The /usr/es/sbin/cluster/etc/rhosts file should NOT contain comments of any kind. Simply list the boot IPs or hostnames of all network adapters participating in the cluster configuration, one entry on each line.

For example:
Suppose the node named testbox1 has two adapters whose boot IPs are 10.1.1.1 and 10.1.2.1.
Likewise, the node named testbox2 has two adapters with IPs of 10.1.1.8 and 10.1.2.8.

The /usr/es/sbin/cluster/etc/rhosts file on both hosts should look like this:
10.1.1.1
10.1.2.1
10.1.1.8
10.1.2.8

(NOTE: The IPs can appear in the file in any order)
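
The two rules for the rhosts file (no comments, one entry per line) can be checked mechanically. The sketch below assumes a POSIX shell with awk; the function name check_rhosts is hypothetical and is not an HACMP utility. It does not verify that the addresses themselves are the right ones for your cluster.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HACMP): flag lines in the rhosts file
# that contain a '#' comment or more than one entry.
# Usage: check_rhosts [path-to-rhosts-file]
check_rhosts() {
    rhosts_file="${1:-/usr/es/sbin/cluster/etc/rhosts}"
    # /#/    : the line contains a comment character
    # NF > 1 : the line holds more than one whitespace-separated entry
    bad=$(awk '/#/ || NF > 1 { printf "line %d: %s\n", NR, $0 }' "$rhosts_file")
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    return 0
}
```

Running the same check on every node, and then comparing the files between nodes, helps confirm that all nodes carry identical entries.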

After placing the IPs in the rhosts file and saving the file, restart the clcomdES service as follows:
# stopsrc -s clcomdES
Note: It may take a minute or two for the service to stop
(you can check the status with # lssrc -a | grep clcomd )
# startsrc -s clcomdES
This starts the service again.

This can be done on a running cluster and will not cause a fallover, but no other configuration changes should be made while the service is being restarted.
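
Because the stop can take a minute or two, the restart can be wrapped in a small loop that waits for the subsystem to stop before starting it again. This is a sketch only: the function name restart_clcomd, the 120-second default, and the 5-second polling interval are assumptions, and the loop relies on lssrc reporting the status "inoperative" for a stopped subsystem.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HACMP): stop clcomdES, wait until
# lssrc shows it as stopped, then start it again.
# Usage: restart_clcomd [max-seconds-to-wait]
restart_clcomd() {
    max_wait="${1:-120}"    # assumed upper bound for the stop, in seconds
    stopsrc -s clcomdES
    waited=0
    # Poll until the subsystem status reads "inoperative".
    until lssrc -s clcomdES | grep -q inoperative; do
        if [ "$waited" -ge "$max_wait" ]; then
            echo "clcomdES did not stop within ${max_wait}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    startsrc -s clcomdES
}
```

Run it as root on each node after editing the rhosts file. If the function returns nonzero, the service has not stopped and was not started again, so check its status manually.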

The communication can then be tested by running the following commands on both nodes, testbox1 and testbox2:

On testbox1 do:
testbox1 /# /usr/es/sbin/cluster/utilities/cl_rsh testbox1 date <enter>
example output -> Fri Dec 30 13:39:40 CST 2011
testbox1 /# /usr/es/sbin/cluster/utilities/cl_rsh testbox2 date <enter>
example output -> Fri Dec 30 13:39:47 CST 2011

On testbox2 do:
testbox2 /# /usr/es/sbin/cluster/utilities/cl_rsh testbox1 date <enter>
example output -> Fri Dec 30 13:41:50 CST 2011
testbox2 /# /usr/es/sbin/cluster/utilities/cl_rsh testbox2 date <enter>
example output -> Fri Dec 30 13:41:52 CST 2011

The above cl_rsh (remote shell) commands test the ability of each node to request and receive the output of the date command. If cl_rsh gives an error or returns nothing, there is still a communication problem. If the rhosts files are correct, you will need to troubleshoot elsewhere on the system or on the TCP network.
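
The cl_rsh test can be repeated for every node with a short loop. A minimal sketch, assuming a POSIX shell; the function name test_cl_rsh and the CL_RSH override variable are hypothetical, and only the cl_rsh path and the "node date" invocation come from the steps above. A node is reported as failed if cl_rsh returns an error or prints nothing.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HACMP): run "cl_rsh <node> date"
# for each node given and report which nodes answer.
# Usage: test_cl_rsh testbox1 testbox2
CL_RSH="${CL_RSH:-/usr/es/sbin/cluster/utilities/cl_rsh}"

test_cl_rsh() {
    failed=0
    for node in "$@"; do
        # Success requires both a zero exit status and some output.
        if out=$("$CL_RSH" "$node" date 2>/dev/null) && [ -n "$out" ]; then
            echo "OK   $node: $out"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return $failed
}
```

Run test_cl_rsh testbox1 testbox2 on each node in turn, since each node has to be able to reach every node, including itself.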

For additional information on the rhosts file, see page 6 of the manual High Availability Cluster Multi-Processing for AIX Administration Guide, ID: SC23-4862-12

Resolving The Problem

Make sure the rhosts file is correctly configured on all cluster nodes


Document Information

Modified date:
24 July 2020

UID

isg3T1012978