Resolve connectivity-related failures to troubleshoot failed server start attempts.
Connectivity failures due to network or configuration issues are the most common cause of a
CWRLS0030W transaction message. Connectivity failures can prevent a server from joining a view and
prevent the server from starting.
Before you begin
Check for the following CWRLS0030W message in the log file:
RLSHAGroupCal W CWRLS0030W: Waiting for HAManager to activate recovery processing for local WebSphere server
If you find the CWRLS0030W message, a DCSV8030I message is most likely present either
before or after the CWRLS0030W message:
RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member myCell\myNode1\myAppServer1: Failed to join or establish a view with member [myCell\myNode2\myAppServer2]. The reason is Not all candidates are connected ConnectedSetMissing= [myCell\myNode1\myAppServer5 myCell\myNode1\myAppServer6] ConnectedSetAdditional [myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4].
If you have these messages in your log file, complete the following steps.
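The log check above can be scripted. The following is a minimal sketch, not an IBM-provided tool: it assumes that the log is a plain-text file and that each message ID is followed by a colon, as in the sample lines above. The function names find_messages and scan_log are hypothetical.

```python
from collections import defaultdict

# Message IDs named in this topic.
MESSAGE_IDS = ("CWRLS0030W", "DCSV8030I")


def find_messages(lines):
    """Return {message_id: [(line_number, text), ...]} for the IDs above."""
    hits = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for message_id in MESSAGE_IDS:
            # Match "ID:" so that a stray mention without a colon is ignored.
            if message_id + ":" in line:
                hits[message_id].append((number, line.rstrip("\n")))
    return dict(hits)


def scan_log(path):
    """Scan a log file on disk; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_messages(handle)
```

Calling scan_log with the path to your server log returns the line numbers of both message types, which makes it easier to see whether a DCSV8030I message sits near each CWRLS0030W message.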
About this task
The High Availability Manager logs DCSV8030I messages when the server fails to join the
view of another member of the core group due to connectivity issues. A server can join an existing
view only when it is connected to all current members of the view and not connected to servers
outside of the view. The DCSV8030I message contains two sets of servers: ConnectedSetAdditional and
ConnectedSetMissing. Use these sets to determine potential connectivity issues within the core
group.
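The relationship between the two sets can be modeled with set differences. The following sketch illustrates only the definitions that are used in this topic; it is not the High Availability Manager implementation, and the function names and short server names are hypothetical.

```python
def compare_connected_sets(failed_connected, target_connected):
    """Model the two sets that a DCSV8030I message reports.

    failed_connected: servers that the failed server is connected to.
    target_connected: servers that the target server is connected to.
    """
    failed = set(failed_connected)
    target = set(target_connected)
    return {
        # Connected only to the failed server.
        "ConnectedSetMissing": sorted(failed - target),
        # Connected only to the target server.
        "ConnectedSetAdditional": sorted(target - failed),
    }


def can_join(failed_connected, target_connected):
    """In this model, a join is possible only when both sets are empty."""
    result = compare_connected_sets(failed_connected, target_connected)
    return not result["ConnectedSetMissing"] and not result["ConnectedSetAdditional"]
```

In this model, a server can join the view only when the two connected sets are identical, which corresponds to both reported sets being empty.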
Procedure
- Understand the four components of the DCSV8030I message:
- Failed server: This server logs the DCSV8030I message and fails to join a view.
- Target server: This server cannot join a view with the failed server.
- ConnectedSetAdditional: The servers in this set are connected only to the target server.
- ConnectedSetMissing: The servers in this set are connected only to the failed server.
- Determine which servers cannot connect to each other.
- Find servers that are listed in the ConnectedSetAdditional list.
The servers in this set are not connected to the failed server. If many servers are listed,
the failed server most likely has an issue.
- Find servers that are listed in the ConnectedSetMissing list.
The servers in this set are not connected to the target server. If many servers are listed,
the target server most likely has an issue.
- Find and resolve configuration issues.
- Find and resolve port conflicts.
- Check for port conflicts with the DCS_UNICAST_ADDRESS ports of the servers that you found in step 2.
- Resolve any conflicts and ensure that ephemeral ports are not used.
- Save and sync any configuration changes.
- Restart all servers with configuration changes.
- Restart the deployment manager to sync the core group configuration.
- Find and resolve issues that are caused by multiple network interface controllers (NICs).
- If you use multiple NICs, specify IP addresses for the DCS_UNICAST_ADDRESS hosts of the servers that you found in step 2. Do not use host names or an asterisk (*).
- Save and sync configuration changes.
- Restart all servers with configuration changes.
- Restart the deployment manager to sync the core group configuration.
If configuration issues caused the connectivity failure, the failed servers
automatically establish new connections to each other, and the view stabilizes. If the view did not
stabilize, proceed to step 4.
- Find and resolve network issues. Check to make sure that firewalls do not prevent incoming or outgoing connections on the DCS_UNICAST_ADDRESS IP addresses and ports of the servers that you found in step 2.
Firewalls are often the main cause of network issues. Fixing the firewall can cause the
servers to automatically establish new connections. If the issue is not resolved, proceed to step
5.
- Find and resolve OutOfMemory errors.
OutOfMemory errors can cause unpredictable behavior within servers, such as failure to connect
to other core group members. If you find OutOfMemory errors, you can set a custom core group
property to isolate the problem server for future OutOfMemory errors and prevent view
instability.
- From the Admin console, go to .
- Enter IBM_CS_OOM_ACTION in the Name field and enter Isolate in the Value field.
- Save and sync the changes.
- Restart the servers.
If you find OutOfMemory errors within the core group, setting the IBM_CS_OOM_ACTION custom
property might prevent future view instability. If you do not find OutOfMemory errors,
proceed to step 6.
- Enable trace to find the exact Java™ ConnectException that causes the failed connection.
- Stop one of the servers with connectivity issues that you found in step 2.
- Enable the following trace specification on the server that you stopped:
HAManager=finest:DCS=finest:RMM=finest:TCPChannel=finest
- Restart the server.
- Search the server logs for the following string:
RMM Info: Failed to Establish a new TCP connection to
- Verify that the IP address and port of the target server in the message belong to one of the
other servers that you found in step 2.
- Use the Java ConnectException information from the message to continue debugging network issues in the environment.
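Alongside the trace, a basic TCP probe can show whether a DCS_UNICAST_ADDRESS endpoint is reachable from the host of the failed server. The following is a minimal sketch that uses only the Python standard library. The function name can_connect is hypothetical, and you must supply the host and port values from your own configuration. A successful connection shows only that the port accepts TCP connections; it does not confirm that the core group member is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try to open a TCP connection; return (success, detail)."""
    try:
        # create_connection resolves the address and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as error:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(error).__name__}: {error}"
```

Run the probe from the machine that hosts the failed server against the IP address and port of each server that you found in step 2. A refused connection or a timeout points to a port, firewall, or routing problem rather than to the High Availability Manager itself.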
Example
The following example shows a typical DCSV8030I message and the servers that are unable to
connect to each other:
RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member myCell\myNode1\myAppServer1: Failed to join or establish a view with member [myCell\myNode2\myAppServer2]. The reason is Not all candidates are connected ConnectedSetMissing= [myCell\myNode1\myAppServer5 myCell\myNode1\myAppServer6] ConnectedSetAdditional [myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4].
The following list shows the four components of the example DCSV8030I message:
- Failed server: myCell\myNode1\myAppServer1
- Target server: myCell\myNode2\myAppServer2
- ConnectedSetMissing: myCell\myNode1\myAppServer5 and myCell\myNode1\myAppServer6
- ConnectedSetAdditional: myCell\myNode1\myAppServer3 and myCell\myNode1\myAppServer4
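The same breakdown can be extracted from the message text with a regular expression. The following sketch assumes that every DCSV8030I message follows the layout of the example above, including the equal sign after ConnectedSetMissing and its absence after ConnectedSetAdditional; both are treated as optional. The names PATTERN, EXAMPLE, and parse_dcsv8030i are hypothetical.

```python
import re

# Layout assumed from the single example message in this topic.
PATTERN = re.compile(
    r"DCSV8030I: DCS Stack (?P<stack>\S+) at Member (?P<failed>[^:\s]+): "
    r"Failed to join or establish a view with member \[(?P<target>[^\]]*)\]\."
    r".*?ConnectedSetMissing=?\s*\[(?P<missing>[^\]]*)\]"
    r"\s*ConnectedSetAdditional=?\s*\[(?P<additional>[^\]]*)\]"
)

EXAMPLE = (
    r"RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member "
    r"myCell\myNode1\myAppServer1: Failed to join or establish a view with "
    r"member [myCell\myNode2\myAppServer2]. The reason is Not all candidates "
    r"are connected ConnectedSetMissing= [myCell\myNode1\myAppServer5 "
    r"myCell\myNode1\myAppServer6] ConnectedSetAdditional "
    r"[myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4]."
)


def parse_dcsv8030i(line):
    """Return the four components as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "failed_server": match.group("failed"),
        "target_server": match.group("target"),
        # Server names inside the brackets are separated by whitespace.
        "connected_set_missing": match.group("missing").split(),
        "connected_set_additional": match.group("additional").split(),
    }


if __name__ == "__main__":
    for key, value in parse_dcsv8030i(EXAMPLE).items():
        print(key, value)
```

If your logs format the message differently, adjust the pattern. A return value of None means that the line did not match the assumed layout.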
The following list shows the servers that are unable to connect to each other: