Troubleshooting a failed server start that is associated with the CWRLS0030W message

Resolve connectivity failures to troubleshoot failed server start attempts. Connectivity failures that are caused by network or configuration issues are the most common cause of the CWRLS0030W transaction message. A connectivity failure can prevent a server from joining a core group view, which in turn prevents the server from starting.

Before you begin

Check for the following CWRLS0030W message in the log file:

RLSHAGroupCal W CWRLS0030W: Waiting for HAManager to activate recovery processing for local WebSphere server

If you find the CWRLS0030W message, a DCSV8030I message is most likely also present, either before or after it:

RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member myCell\myNode1\myAppServer1: Failed to join or establish a view with member [myCell\myNode2\myAppServer2]. The reason is Not all candidates are connected ConnectedSetMissing= [myCell\myNode1\myAppServer5 myCell\myNode1\myAppServer6] ConnectedSetAdditional [myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4].
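To check a log for both messages, you can scan it for the two message IDs. The following Python sketch is a minimal example, not an IBM-provided tool: it assumes the log is a plain-text file, such as the server's SystemOut.log, and it only looks for each message ID as a whole word. The function names are hypothetical.

```python
import re
from typing import Iterable

# Message IDs described in this topic.
MESSAGE_IDS = ("CWRLS0030W", "DCSV8030I")
_PATTERNS = {msg_id: re.compile(rf"\b{msg_id}\b") for msg_id in MESSAGE_IDS}


def find_messages(lines: Iterable[str]) -> dict[str, list[int]]:
    """Return the 1-based line numbers on which each message ID appears."""
    hits: dict[str, list[int]] = {msg_id: [] for msg_id in MESSAGE_IDS}
    for number, line in enumerate(lines, start=1):
        for msg_id, pattern in _PATTERNS.items():
            if pattern.search(line):
                hits[msg_id].append(number)
    return hits


def needs_troubleshooting(hits: dict[str, list[int]]) -> bool:
    """True when both the CWRLS0030W and the DCSV8030I messages were found."""
    return all(hits[msg_id] for msg_id in MESSAGE_IDS)


# Usage with a hypothetical log path:
# with open("SystemOut.log", encoding="utf-8", errors="replace") as log:
#     hits = find_messages(log)
# print(hits, needs_troubleshooting(hits))
```

Because find_messages accepts any iterable of lines, you can pass an open file object directly, or a list of lines that you copied from a log viewer.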

If you have these messages in your log file, complete the following steps.

About this task

The High Availability Manager logs DCSV8030I messages when a server fails to join the view of another core group member because of connectivity issues. A server can join an existing view only when it is connected to all current members of the view and is not connected to servers outside of the view. The DCSV8030I message contains two sets of servers, ConnectedSetAdditional and ConnectedSetMissing, that you can use to locate potential connectivity issues within the core group.
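The set logic behind the two lists can be sketched in a few lines of Python. This is a simplified model, not the High Availability Manager implementation: it assumes that you already know which members each server is connected to, and the class and function names are hypothetical. It follows the definitions used in the procedure of this topic, where ConnectedSetMissing holds servers that are connected to the failed server but not to the target server, and ConnectedSetAdditional holds the reverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewComparison:
    # Connected to the failed server but not to the target server.
    connected_set_missing: frozenset[str]
    # Connected to the target server but not to the failed server.
    connected_set_additional: frozenset[str]

    @property
    def can_share_view(self) -> bool:
        # In this model, both servers must see exactly the same other members.
        return not self.connected_set_missing and not self.connected_set_additional


def compare_connected_sets(failed: str, failed_sees: set[str],
                           target: str, target_sees: set[str]) -> ViewComparison:
    """Compare the members that the failed and target servers are connected to.

    The two servers being compared are left out of both sets.
    """
    pair = {failed, target}
    failed_others = set(failed_sees) - pair
    target_others = set(target_sees) - pair
    return ViewComparison(
        connected_set_missing=frozenset(failed_others - target_others),
        connected_set_additional=frozenset(target_others - failed_others),
    )
```

For example, if the failed server sees s5, s6, and s7 while the target server sees s3, s4, and s7, the model reports s5 and s6 as missing and s3 and s4 as additional, and the two servers cannot share a view until those connections are fixed.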

Procedure

  1. Understand the four components of the DCSV8030I message:
    Failed server
    This server logs the DCSV8030I message and fails to join a view.
    Target server
    The failed server cannot join or establish a view with this server.
    ConnectedSetAdditional
    The servers in this set are connected to the target server but not to the failed server.
    ConnectedSetMissing
    The servers in this set are connected to the failed server but not to the target server.
  2. Determine which servers cannot connect to each other.
    1. Find servers that are listed in the ConnectedSetAdditional list.
      The servers in this set are not connected to the failed server. If many servers are listed, the failed server most likely has an issue.
    2. Find servers that are listed in the ConnectedSetMissing list.
      The servers in this set are not connected to the target server. If many servers are listed, the target server most likely has an issue.
  3. Find and resolve configuration issues.
    • Find and resolve port conflicts.
      1. Check for port conflicts with the DCS_UNICAST_ADDRESS ports of the servers that you found in step 2.
      2. Resolve any conflicts and ensure that ephemeral ports are not used.
      3. Save and sync any configuration changes.
      4. Restart all servers with configuration changes.
      5. Restart the deployment manager to sync the core group configuration.
    • Find and resolve issues with multiple network interface controllers (NICs).
      1. If you use multiple NICs, specify IP addresses for the DCS_UNICAST_ADDRESS hosts of the servers that you found in step 2. Do not use host names or an asterisk (*).
      2. Save and sync configuration changes.
      3. Restart all servers with configuration changes.
      4. Restart the deployment manager to sync the core group configuration.
    If configuration issues caused the connectivity failure, the servers automatically establish new connections to each other after you resolve the issues, and the view stabilizes. If the view does not stabilize, proceed to step 4.
  4. Find and resolve network issues. Make sure that firewalls do not block incoming or outgoing connections on the DCS_UNICAST_ADDRESS IP addresses and ports of the servers that you found in step 2.
    Firewalls are often the main cause of network issues. After you correct the firewall configuration, the servers can automatically establish new connections. If the issue is not resolved, proceed to step 5.
  5. Find and resolve OutOfMemory errors.
    OutOfMemory errors can cause unpredictable behavior within servers, such as failure to connect to other core group members. If you find OutOfMemory errors, you can set a core group custom property so that any server that encounters a future OutOfMemory error is isolated from the core group, which prevents view instability.
    1. From the administrative console, go to Servers > Core Groups > Core group settings > CORE_GROUP_NAME > Custom properties > New.
    2. Enter IBM_CS_OOM_ACTION in the Name field and enter Isolate in the Value field.
    3. Save and sync the changes.
    4. Restart the servers.
    If you do not find OutOfMemory errors, proceed to step 6.
  6. Enable trace to find the exact Java™ ConnectException that causes the failed connection.
    1. Stop one of the servers with connectivity issues that you found in step 2.
    2. Enable the following trace specification on the server that you stopped: HAManager=finest:DCS=finest:RMM=finest:TCPChannel=finest
    3. Restart the server.
    4. Search the server logs for the following string: RMM Info: Failed to Establish a new TCP connection to
    5. Verify that the target IP address and port in the message belong to one of the other servers that you found in step 2.
    6. Use the Java ConnectException details from the message to continue debugging network issues in the environment.
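The configuration checks in step 3 can be partly automated after you collect the DCS_UNICAST_ADDRESS host and port of each server from step 2. The following Python sketch assumes a hypothetical dictionary that maps each server name to a (host, port) pair. It only compares the listed servers with each other, so it does not detect conflicts with other processes on the same host. The ephemeral range is also an assumption (the IANA dynamic port range); substitute the range that your operating system uses.

```python
import ipaddress
from collections import defaultdict

# Hypothetical input: server name -> (host, port) of its DCS_UNICAST_ADDRESS.
Endpoints = dict[str, tuple[str, int]]

# Assumption: the IANA dynamic/private port range. Operating systems can use a
# different ephemeral range, so replace this with the range your hosts use.
EPHEMERAL_RANGE = range(49152, 65536)


def find_port_conflicts(endpoints: Endpoints) -> list[tuple[str, int, list[str]]]:
    """Return (host, port, servers) for every host and port used by more than one server.

    Hosts are compared exactly as written, so a host name and an IP address for
    the same machine are treated as different hosts.
    """
    users: dict[tuple[str, int], list[str]] = defaultdict(list)
    for server, address in endpoints.items():
        users[address].append(server)
    return [(host, port, sorted(servers))
            for (host, port), servers in sorted(users.items())
            if len(servers) > 1]


def find_ephemeral_ports(endpoints: Endpoints,
                         ephemeral: range = EPHEMERAL_RANGE) -> list[str]:
    """Return the servers whose DCS_UNICAST_ADDRESS port lies in the ephemeral range."""
    return sorted(server for server, (_, port) in endpoints.items() if port in ephemeral)


def find_non_ip_hosts(endpoints: Endpoints) -> list[str]:
    """Return the servers whose host is a host name or '*' rather than an IP address."""
    flagged = []
    for server, (host, _) in endpoints.items():
        try:
            ipaddress.ip_address(host)
        except ValueError:  # raised for host names and for '*'
            flagged.append(server)
    return sorted(flagged)
```

For example, two servers that both use 192.0.2.10 and the same port are reported by find_port_conflicts, and a server whose host is a name such as myhost.example.com or an asterisk is reported by find_non_ip_hosts, which matters when the hosts have multiple NICs. These addresses are illustrative only.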
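For the firewall check in step 4, a plain TCP connection attempt to another server's DCS_UNICAST_ADDRESS can show whether the path is blocked. The following Python sketch is a hypothetical helper, not part of WebSphere: it only tests that a TCP handshake succeeds, and the result applies only to the machine that runs it, so run it on the hosts of the servers that you found in step 2. The input dictionary of server names to (host, port) pairs is an assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Attempt one TCP connection to host:port and describe the outcome.

    Only the TCP handshake is tested; no DCS traffic is exchanged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers refused connections, timeouts, unreachable hosts
        return False, f"{type(exc).__name__}: {exc}"


def check_endpoints(endpoints: dict[str, tuple[str, int]],
                    timeout: float = 3.0) -> dict[str, str]:
    """Map each server name (hypothetical input) to the outcome of one connection attempt."""
    return {server: can_connect(host, port, timeout)[1]
            for server, (host, port) in endpoints.items()}
```

A refused connection usually means that nothing is listening on that port, while a timeout often points to packets that a firewall silently drops.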
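The log search in step 6 can be sketched as a simple text filter. The following Python sketch assumes that the search string and the target address appear on the same trace line and that the address text contains the IP address and port literally; trace entries that wrap across lines are not handled. The function names and the list of expected endpoints are hypothetical.

```python
from typing import Iterable

# Search string from step 6 of the procedure.
MARKER = "RMM Info: Failed to Establish a new TCP connection to"


def failed_connection_details(lines: Iterable[str]) -> list[str]:
    """Return the text that follows MARKER on every line that contains it."""
    return [line.split(MARKER, 1)[1].strip() for line in lines if MARKER in line]


def matches_known_server(detail: str, endpoints: Iterable[tuple[str, int]]) -> bool:
    """Assumption: the detail text contains the target IP address and port literally.

    This is a substring test, so a short port number can match inside a longer one.
    """
    return any(host in detail and str(port) in detail for host, port in endpoints)
```

Pass the lines of the trace log to failed_connection_details, then check each result against the DCS_UNICAST_ADDRESS endpoints of the servers from step 2 before you look at the ConnectException details.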

Example

The following example shows a typical DCSV8030I message and the servers that are unable to connect to each other:

RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member myCell\myNode1\myAppServer1: Failed to join or establish a view with member [myCell\myNode2\myAppServer2]. The reason is Not all candidates are connected ConnectedSetMissing= [myCell\myNode1\myAppServer5 myCell\myNode1\myAppServer6] ConnectedSetAdditional [myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4].

The following list shows the four components of the example DCSV8030I message:

  • Failed server: myCell\myNode1\myAppServer1
  • Target server: myCell\myNode2\myAppServer2
  • ConnectedSetMissing: myCell\myNode1\myAppServer5 and myCell\myNode1\myAppServer6
  • ConnectedSetAdditional: myCell\myNode1\myAppServer3 and myCell\myNode1\myAppServer4

The following list shows the servers that are unable to connect to each other:

  • The failed myCell\myNode1\myAppServer1 server cannot connect to the myCell\myNode1\myAppServer3 and myCell\myNode1\myAppServer4 servers.
  • The target myCell\myNode2\myAppServer2 server cannot connect to the myCell\myNode1\myAppServer5 and myCell\myNode1\myAppServer6 servers.
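The analysis in this example can be sketched in Python. The regular expressions below are written against the message text shown in this topic and are assumptions; other releases might format the message differently. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Patterns written against the DCSV8030I text shown in this topic (assumption).
_FAILED = re.compile(r"Member\s+([^\s:]+):\s+Failed\s+to\s+join")
_TARGET = re.compile(r"view\s+with\s+member\s+\[([^\]]*)\]")
_MISSING = re.compile(r"ConnectedSetMissing=?\s*\[([^\]]*)\]")
_ADDITIONAL = re.compile(r"ConnectedSetAdditional=?\s*\[([^\]]*)\]")


@dataclass
class Dcsv8030iMessage:
    failed: str
    target: str
    connected_set_missing: list[str]
    connected_set_additional: list[str]

    def unreachable_pairs(self) -> list[tuple[str, str]]:
        """Pairs of servers that cannot connect, following this topic's definitions."""
        pairs = [(self.failed, other) for other in self.connected_set_additional]
        pairs += [(self.target, other) for other in self.connected_set_missing]
        return pairs


def _first_group(pattern: re.Pattern, message: str) -> str:
    match = pattern.search(message)
    if match is None:
        raise ValueError(f"DCSV8030I text not recognized: {pattern.pattern}")
    return match.group(1)


def parse_dcsv8030i(message: str) -> Dcsv8030iMessage:
    """Split a DCSV8030I message into its four components."""
    return Dcsv8030iMessage(
        failed=_first_group(_FAILED, message),
        target=_first_group(_TARGET, message).strip(),
        connected_set_missing=_first_group(_MISSING, message).split(),
        connected_set_additional=_first_group(_ADDITIONAL, message).split(),
    )


example = parse_dcsv8030i(
    r"RoleViewLeade I DCSV8030I: DCS Stack DefaultCoregroup at Member "
    r"myCell\myNode1\myAppServer1: Failed to join or establish a view with member "
    r"[myCell\myNode2\myAppServer2]. The reason is Not all candidates are connected "
    r"ConnectedSetMissing= [myCell\myNode1\myAppServer5 myCell\myNode1\myAppServer6] "
    r"ConnectedSetAdditional [myCell\myNode1\myAppServer3 myCell\myNode1\myAppServer4]."
)
for server, unreachable in example.unreachable_pairs():
    print(f"{server} cannot connect to {unreachable}")
```

Running the sketch prints one line for each of the four server pairs in the preceding list. Raw string literals keep the backslashes in the cell, node, and server names intact.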