Troubleshooting client exceptions

[Version 8.6.0.5 and later] You can troubleshoot exceptions that occur when a client attempts to complete a transaction.

Procedure

  • Debug TargetNotAvailableException exceptions.
    Problem
    A TargetNotAvailableException exception occurs on the client when the client tries to run a transaction against a data grid and the attempt fails. For example, ClientA wants to insert key29 into the data grid, MyGrid. The client code hashes key29 to partition 1 of MyGrid and attempts to route the request to that partition. Partition 1 of MyGrid cannot be accessed, so ClientA receives a TargetNotAvailableException exception.
    Solution
    Complete the following steps to debug the TargetNotAvailableException exception.
    1. Determine whether the exception indicates a normal failover condition. For example, when a container server fails, there is a window of time during which the failure is detected and shards are promoted and replaced. A TargetNotAvailableException exception can occur during that window.
    2. A TargetNotAvailableException exception is an umbrella exception for many situations. To debug a TargetNotAvailableException exception, you must review the exception message and chained exceptions. Identify the following parts of a TargetNotAvailableException exception:
      • The shard identification string (shard ID). The client routes a transaction to a shard, either the primary or a replica shard. A shard ID consists of the domain name, the ObjectGrid name, the map set name, and the partition number. The format is domainName:ObjectGridName:MapSetName:Partition. You can use the shard ID when you review the output of xscmd commands such as routetable, showPlacement, and showMapSizes. You can also use it to identify messages in the SystemOut or FFDC logs of the catalog and container servers.
      • The exception message. This message can be another exception, such as an ORB or an XIO transport exception.
      • The chained or caused-by exceptions. These exceptions can be ORB or XIO exceptions, exceptions that are thrown by WebSphere® eXtreme Scale, or Java IO exceptions.
    3. If the client-side log is available, review the log for CWOBJ113 series messages with the same shard ID as the TargetNotAvailableException exception. For example, a CWOBJ1130 message shows the first exception that is encountered by the client. If the request recovers after the request is resent, a CWOBJ1133 message follows. If the request is made again and fails, then a TargetNotAvailableException exception occurs.
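    The review in step 2 can be sketched in plain Java. The following is a minimal sketch, not product code: the class and method names (TnaDiagnostics, ShardId, parseShardId, describeCauseChain) are hypothetical, and the parser assumes that the domain, ObjectGrid, and map set names do not contain a colon. It splits a shard ID string that you copied from an exception message into its four parts and lists every exception in a cause chain so that the chained ORB, XIO, or Java IO exception is easy to spot.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of WebSphere eXtreme Scale. */
final class TnaDiagnostics {

    /** The four parts of a shard ID: domainName:ObjectGridName:MapSetName:Partition. */
    static final class ShardId {
        final String domain;
        final String grid;
        final String mapSet;
        final int partition;

        ShardId(String domain, String grid, String mapSet, int partition) {
            this.domain = domain;
            this.grid = grid;
            this.mapSet = mapSet;
            this.partition = partition;
        }
    }

    /** Parses a shard ID string. Assumption: none of the names contain a colon. */
    static ShardId parseShardId(String text) {
        String[] parts = text.trim().split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "Expected domainName:ObjectGridName:MapSetName:Partition but got: " + text);
        }
        // Integer.parseInt throws NumberFormatException if the partition is not a number.
        return new ShardId(parts[0], parts[1], parts[2], Integer.parseInt(parts[3]));
    }

    /** Returns one "className: message" entry per exception in the chain, outermost first. */
    static List<String> describeCauseChain(Throwable top) {
        List<String> lines = new ArrayList<>();
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = top; t != null && depth < 32; t = t.getCause(), depth++) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
        }
        return lines;
    }
}
```

    For example, pass the exception that the client caught to describeCauseChain and log the returned lines next to the parsed shard ID. The partition number and names can then be matched against the xscmd output and the server logs.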
    Problem
    Exception org.omg.CORBA.NO_RESPONSE (ORB)

    com.ibm.ws.xsspi.xio.exception.MessageTimeOutException (XIO)

    Solution
    These messages indicate that the transport layer could not determine whether a connection was made successfully, and that no response was received within the configured timeout. Answer the following questions to debug this problem:
    • Are there any network problems that prevent connections? For example, the network is intermittently down; the firewall blocks ports; the DNS service has intermittent problems.
    • Was there a hard failure such as an entire system failure? If the TCP layer did not detect the loss of a system, a NO_RESPONSE or MessageTimeOutException exception can occur.
    • Are there CWOBJ7853W messages that indicate hung threads? Review the log of the container server for which the following actions occurred:
      • Where the CWOBJ1130 message was displayed.
      • Where the xscmd routetable command was run.
    • Do transaction timeouts indicate long running or failing transactions?
    • Is the ORB RequestTimeout or XIO xioTimeout set too low? For example, a timeout of 5 seconds or less increases the chances of NO_RESPONSE or MessageTimeOutException exceptions, especially during server stop or start events.
  • Use the following solutions to troubleshoot the exceptions that are chained in a TargetNotAvailableException exception when the client cannot connect to the data grid:
    Table 1. Client exceptions and solutions
    Exception:
      com.ibm.ws.xsspi.xio.exception.ConnectionRefusedException (XIO)
      org.omg.CORBA.TRANSIENT (ORB)
      org.omg.CORBA.COMM_FAILURE (ORB)
    Solution:
      These messages indicate that the container server was not contacted and that the JVM process is gone. This exception is normally temporary: the remote primary shard fails over to a new location, and the clients are updated. If the clients do not recover, check whether either domain has quorum enabled and whether the system is out of quorum. Use the xscmd showQuorumStatus command. If the domain is out of quorum, placement changes are not done.

      If quorum is not the issue, review the xscmd showPlacement and routetable output for the shard IDs that are getting TargetNotAvailableException exceptions. If the primary shard is not placed or is marked as not reachable in the routetable output, use the xscmd triggerPlacement command.

    Exception:
      org.omg.CORBA.OBJECT_NOT_EXIST (ORB)
      com.ibm.ws.xsspi.xio.exception.ActorNotFoundException (XIO)
      com.ibm.ws.xsspi.xio.exception.InvalidXIORefException (XIO)
    Solution:
      These messages indicate that the container server was contacted, but the primary shard was not found. This exception is normally temporary: the primary shard moves to a new location, and the clients are updated. If the clients do not recover, check whether either domain has quorum enabled and whether the system is out of quorum. Use the xscmd showQuorumStatus command. If the domain is out of quorum, placement changes are not done.

      If quorum is not the issue, review the xscmd showPlacement and routetable output for the shard IDs that are getting TargetNotAvailableException exceptions. If the primary shard is not placed or is marked as not reachable in the routetable output, use the xscmd triggerPlacement command.

    Exception:
      Cannot find primary for domain1:grid:mapSet:0 (ORB or XIO)
    Solution:
      This message indicates that the client did not find route information for the shard ID. Check whether either domain has quorum enabled and whether the system is out of quorum. Use the xscmd showQuorumStatus command. If the domain is out of quorum, placement changes are not done.

      If quorum is not the issue, review the xscmd showPlacement and routetable output for the shard IDs that are producing the TargetNotAvailableException exceptions. If the primary shard is not placed, check whether placement is suspended by using the xscmd suspendStatus command. If the ObjectGrid is suspended, use the xscmd resumeBalancing command. If the primary shard is marked as not reachable, use the xscmd triggerPlacement command.

    Exception:
      org.omg.CORBA.TRANSIENT: Partition 1 for grid in map set mapSet is temporarily not taking new transactions on server0_C-0
      vmcid: 0x49424000  minor code: D84  completed: No
      com.ibm.ws.xsspi.xio.exception.TransportException$Transient:
      Partition 1 for grid in map set mapSet is temporarily not taking new transactions on server0_C-0
    Solution:
      This message indicates that the client reached the shard, but the shard is in a transitional state and cannot serve the request. The shard might not have completed its promotion to primary shard, might be in the middle of a demotion, or might be in the process of stopping. Review the xscmd showPlacement and routetable output for the shard IDs that are producing the TargetNotAvailableException exceptions. Also, review the log for the container server that is listed in the exception.
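
    As a rough illustration, the exception-to-solution mapping above can be turned into a small triage helper. The following is a minimal sketch under stated assumptions, not product code: the TnaCauseClassifier class and its Category names are hypothetical, the exception class names and message fragments are copied from this topic, and matching on message text is an assumption that can break if the message wording differs in your version. The helper matches on class names as strings, so it compiles without the product libraries on the class path.

```java
/** Hypothetical triage helper for illustration; not part of WebSphere eXtreme Scale. */
final class TnaCauseClassifier {

    /** Hypothetical categories that mirror the groups of exceptions in this topic. */
    enum Category {
        TIMED_OUT,            // no response within the configured timeout
        SERVER_NOT_CONTACTED, // container server was not contacted
        SHARD_NOT_FOUND,      // server was contacted, primary shard was not found
        NO_ROUTE,             // client has no route information for the shard ID
        SHARD_IN_TRANSITION,  // shard is temporarily not taking new transactions
        UNKNOWN
    }

    /** Classifies one exception by its class name and message text. */
    static Category classify(String className, String message) {
        String msg = (message == null) ? "" : message;
        // Check message text first: TRANSIENT appears in more than one group.
        if (msg.contains("temporarily not taking new transactions")) {
            return Category.SHARD_IN_TRANSITION;
        }
        if (msg.contains("Cannot find primary for")) {
            return Category.NO_ROUTE;
        }
        switch (className) {
            case "org.omg.CORBA.NO_RESPONSE":
            case "com.ibm.ws.xsspi.xio.exception.MessageTimeOutException":
                return Category.TIMED_OUT;
            case "com.ibm.ws.xsspi.xio.exception.ConnectionRefusedException":
            case "org.omg.CORBA.TRANSIENT":
            case "org.omg.CORBA.COMM_FAILURE":
                return Category.SERVER_NOT_CONTACTED;
            case "org.omg.CORBA.OBJECT_NOT_EXIST":
            case "com.ibm.ws.xsspi.xio.exception.ActorNotFoundException":
            case "com.ibm.ws.xsspi.xio.exception.InvalidXIORefException":
                return Category.SHARD_NOT_FOUND;
            default:
                return Category.UNKNOWN;
        }
    }

    /** Walks the cause chain and returns the first category that is not UNKNOWN. */
    static Category classify(Throwable top) {
        int depth = 0; // bound the walk in case the cause chain is cyclic
        for (Throwable t = top; t != null && depth < 32; t = t.getCause(), depth++) {
            Category c = classify(t.getClass().getName(), t.getMessage());
            if (c != Category.UNKNOWN) {
                return c;
            }
        }
        return Category.UNKNOWN;
    }
}
```

    A client could log the returned category together with the shard ID when it catches a TargetNotAvailableException exception, so that an operator knows which of the solutions above to start with. The category is only a starting point; the xscmd output and the server logs remain the source of truth.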