Troubleshooting: Object Request Broker (ORB) problems

Troubleshooting

Problem

This page covers troubleshooting for Object Request Broker (ORB) problems in IBM WebSphere Application Server. It should help you address common ORB issues before engaging IBM support, which can save you time in resolving the issue.

Resolving The Problem

Overview

This topic discusses common ORB issues that can lead to deadlocks or hangs.

Topics Covered:

socketWrite Hangs

Blocked Reader Threads

CORBA.NO_RESPONSE

socketWrite Hangs

socketWrite issues can occur on either the client or server ORB, while writing request data (client side) or reply data (server side). Because socketWrite() is a blocking call, it completes only once the data has been accepted by the socket, which ultimately depends on the remote side reading data out of its incoming buffer. If the receiving side stops reading from the socket for any reason, the sending thread can hang in socketWrite. The problem is rarely confined to a single thread; it usually affects many or all of the ORB threads trying to write data to that socket.

Typically only one thread is actually stuck inside socketWrite. Because threads share a single ORB connection, all other ORB threads that need the same socket (WebContainer threads included) are blocked in a slightly different stack, waiting for access to it. The following examples show each of these stack conditions:

Client-side socketWrite stack example (SSL stack will include com.ibm.jsse2 calls):

3XMTHREADINFO      "ORB.thread.pool : 1023" (TID:0x30554760, sys_thread_t:0x57323028, state:R, native ID:0x14C33) prio=5
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
at java.net.SocketOutputStream.write(SocketOutputStream.java:147)
at com.ibm.rmi.util.buffer.SequentialByteBuffer.flushTo(SequentialByteBuffer.java:410)
at com.ibm.rmi.util.buffer.SequentialByteBuffer.flushTo(SequentialByteBuffer.java:439)
at com.ibm.rmi.iiop.IIOPOutputStream.writeTo(IIOPOutputStream.java:541)
at com.ibm.rmi.iiop.Connection.write(Connection.java:2225)
at com.ibm.rmi.iiop.Connection.send(Connection.java:2267)
at com.ibm.rmi.iiop.ClientRequestImpl.invoke(ClientRequestImpl.java:338)
at com.ibm.rmi.corba.ClientDelegate.invoke(ClientDelegate.java:424)

Client-side stack waiting to get into socketWrite:

3XMTHREADINFO     "ORB.thread.pool : 1011" (TID:0x30543190, sys_thread_t:0x55E2E1A8, state:CW, native ID:0x14426) prio=5
4XESTACKTRACE          at com.ibm.rmi.util.buffer.SequentialByteBuffer.flushTo(SequentialByteBuffer.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.IIOPOutputStream.writeTo(IIOPOutputStream.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.write(Compiled Code)
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.send(Connection.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.ClientRequestImpl.invoke(ClientRequestImpl.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.corba.ClientDelegate.invoke(ClientDelegate.java(Compiled Code))

How to determine if a JVM is experiencing socketWrite issues:

1. Check SystemOut.log for hung thread messages (WSVR0605W), which will show socketWrite stacks.

2. Take a javacore or thread dump and examine the ORB threads.
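The first check can be scripted. The following is a minimal sketch, not an IBM-provided tool: it scans log lines for WSVR0605W warnings and extracts the thread name and the reported active time. The function name find_hung_threads is hypothetical, and the pattern assumes the message wording shown in this document; the optional "(hex id)" after the thread name and the thousands separators are allowed for because formatting can vary.

```python
import re

# Assumed message shape, based on the WSVR0605W text quoted in this document.
# The thread id in parentheses and the comma separators are optional.
HUNG_THREAD = re.compile(
    r'WSVR0605W: Thread ["\u201c](?P<name>[^"\u201d]+)["\u201d]'
    r'(?: \([0-9a-fA-F]+\))? has been active for '
    r'(?P<ms>[\d,]+) milliseconds'
)


def find_hung_threads(log_lines):
    """Return a list of (thread_name, active_ms) for each WSVR0605W line."""
    hits = []
    for line in log_lines:
        match = HUNG_THREAD.search(line)
        if match:
            active_ms = int(match.group("ms").replace(",", ""))
            hits.append((match.group("name"), active_ms))
    return hits
```

Passing an open SystemOut.log file object (which iterates line by line) returns one tuple per warning; the stack that follows each warning in the log still has to be read to confirm it is a socketWrite stack.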

Possible Causes:

1. Network issues causing slow traffic or dropped packets.

2. An OutOfMemoryError (OOM) on the receiving side, which kills the receiving Reader or Worker thread (in the case of reading fragments).

3. The server's ORB Worker threads are all hung. This in turn causes its Reader threads to hang while waiting for an available Worker thread, which prevents them from reading the incoming request message or fragments from the client. In this case, javacores are needed on the server to determine the state of its threads and why they are hung.

Solution:

As seen above, socketWrite hangs can occur for a number of reasons. Addressing the root problem (network, OOM, etc.) helps prevent socketWrite issues, but the best measure is a preventative one that uses a number of ORB properties. The main property to set is com.ibm.CORBA.SocketWriteTimeout. This property causes the writing thread to time out, at which point a com.ibm.rmi.iiop.IIOPOutputStream$SocketWriteTimeOutException is thrown and the request fails. Normal retry logic applies: the ORB retries once by default.

Property: com.ibm.CORBA.SocketWriteTimeout
Settings:

Set in seconds
Default: 0
Set on client and/or server. In most cases, the problem is experienced on the client.
Recommended values. 2 options:

1. Set to a smaller value such as 5 or 10 seconds
If a server is truly hung and not just slow, this allows client threads to clean up quickly. However, this approach could terminate requests that are not actually hung in socketWrite but are only taking longer than normal (due to socket contention, network slowdown, CPU spike, extra-long server processing time, etc.).

2. Set to a larger value such as 20 or 30 seconds
This is more forgiving of slow server response times, CPU spikes, etc. by not giving up too quickly on the socketWrites.

NOTE: The ORB com.ibm.CORBA.RequestTimeout timer does not start until the request message is successfully written to the server (i.e. socketWrite completes). Therefore, the SocketWriteTimeout (SWTO) property can be set to any value irrespective of the ORB RequestTimeout value.
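Because the two timers run one after the other, their values add up from the caller's point of view. The sketch below works through that arithmetic for a single request attempt. The function name is hypothetical, retries are not modeled, a non-zero RequestTimeout is assumed, and treating a SocketWriteTimeout of 0 as "the write never times out" is an assumption inferred from the default described above.

```python
def worst_case_wait_seconds(socket_write_timeout_s, request_timeout_s):
    """Upper bound on how long one request attempt can block the caller.

    Assumes the RequestTimeout timer starts only after the write completes,
    as the note above describes. Retries are not modeled.
    """
    if socket_write_timeout_s == 0:
        # Assumption: 0 (the default) means the write itself never times out.
        return float("inf")
    return socket_write_timeout_s + request_timeout_s
```

For example, a SocketWriteTimeout of 30 seconds combined with a RequestTimeout of 180 seconds bounds a single attempt at 210 seconds.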

Additional properties that can be set:

Property: com.ibm.CORBA.FragmentSize
Settings:

Default: 1024
Recommended values. 2 options:

1. Set to 0 (no fragmentation). This means the sending thread can write the entire message once it obtains the socketWrite lock.

2. Set to a larger FragmentSize, such as 10240, if the average message size is large (over 50 KB). This minimizes the number of times a particular thread has to obtain the socketWrite lock.

Set on all clients and/or servers.
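The effect of FragmentSize on lock contention can be estimated with simple arithmetic. The following sketch is an approximation only: it ignores per-fragment header overhead and assumes one lock acquisition per fragment written, and the function name is hypothetical.

```python
import math


def fragment_count(message_bytes, fragment_size):
    """Approximate number of fragments (and socketWrite lock acquisitions)
    needed to send one message. Header overhead is ignored."""
    if fragment_size == 0:
        # 0 disables fragmentation: the whole message goes out in one write.
        return 1
    return math.ceil(message_bytes / fragment_size)
```

Under these assumptions, a 51200-byte message needs about 50 fragments at the default size of 1024, about 5 at 10240, and a single write with fragmentation disabled.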

Property: com.ibm.CORBA.ConnectionMultiplicity
Settings:

Default: 1
Recommended values: 5 – 10

This property designates how many connections a client will have to a given server. Increasing this number gives the client more connections (and hence more sockets) to each server which sending threads can share, thereby reducing contention over a single socket.

NOTE: Additional resources (more reader threads and sockets) will be required on both the client and server(s) so be cautious of setting too large a value.

Usually set on the client side (but can be set on a server).

Property: com.ibm.websphere.orb.threadPoolTimeout
Settings:

Set in ms
Default: 0
Recommended value: 10000

This property governs how long a server ReaderThread will wait for an available WorkerThread to handle a new incoming request. By default, if there are no free WorkerThreads, the ReaderThread waits indefinitely until one becomes available, which can lead to deadlocks or hangs. Setting this property to a value greater than 0 allows a ReaderThread to time out after that period, which in turn releases the client-side sending thread stuck in socketWrite.

NOTE: Tuning this property is similar to SocketWriteTimeout in that there are 2 basic approaches: small or large timeout setting.

Set on the server
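Since SocketWriteTimeout is set in seconds while threadPoolTimeout is set in milliseconds, a unit mix-up is an easy mistake to make. The sketch below is a hypothetical sanity check, not part of WebSphere: the function name and the threshold of 1000 are illustrative assumptions, and it only flags values that look as if the wrong unit was used.

```python
SWTO = "com.ibm.CORBA.SocketWriteTimeout"          # seconds
TPTO = "com.ibm.websphere.orb.threadPoolTimeout"   # milliseconds


def check_orb_settings(settings):
    """Return warnings for values that suggest a seconds/ms mix-up.

    The 1000 threshold is an illustrative heuristic, not an IBM rule.
    """
    warnings = []
    swto = settings.get(SWTO, 0)
    if swto >= 1000:
        warnings.append(
            "%s is set in seconds; %d looks like milliseconds" % (SWTO, swto)
        )
    tpto = settings.get(TPTO, 0)
    if 0 < tpto < 1000:
        warnings.append(
            "%s is set in ms; %d looks like seconds" % (TPTO, tpto)
        )
    return warnings
```

With the values suggested in this document (for example 30 for SocketWriteTimeout and 10000 for threadPoolTimeout) the check returns no warnings.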

Example of hung ReaderThread:

3XMTHREADINFO     "RT=51:P=946784:O=0:WSTCPTransportConnection[addr=1.2.3.4,port=47847,local=35293]" (TID:0x7000000000B21E0, sys_thread_t:0x11AF3D548, state:CW, native ID:0x27B3D) prio=5
4XESTACKTRACE          at java.lang.Object.wait(Native Method)
4XESTACKTRACE          at java.lang.Object.wait(Object.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.BoundedBuffer.put(BoundedBuffer.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ejs.oa.pool.ThreadPool.startWorkerThread(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.processInput(Connection.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.doReaderWorkOnce(Connection.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.transport.ReaderThread.run(ReaderPoolImpl.java:137)

APARs needed:

IX90076 – FIX TO INCORPORATE SOCKET_WRITE_TIMEOUT IN ORB

Shipped in JDK60 SR10, and JDK626 and above

IX90112 - NULLPOINTEREXCEPTION IN SOCKETWRITETIMER THREAD

Shipped in JDK60 SR14, 626SR6, 70SR5
The NPE is logged in SystemErr.log, so check that log to confirm an issue with the SocketTimer thread.

PM83349 - NullPointerException in SocketTimerThread due to improper initialization

Shipped in WAS 7.0.0.29, 8.0.0.7, 8.5.5.0

For more information on threads stuck in socketWrite see:
Hung thread behavior with SocketWriteTimerThread

Blocked Reader Threads

There are 3 basic kinds of ORB threads on the server side:

Listener Threads -- Accept new incoming ORB connections
Reader Threads -- Read incoming request or reply messages
Worker Threads -- Execute the work of an incoming request

Blocked Readers are generally a server-only issue, but since clients can also handle incoming requests (like meta callbacks), blocked Readers can also occur on client JVMs. Here is the typical ORB request flow:

ORB Incoming Request Flow

1. Listener thread accepts a new incoming connection.
2. A new Reader thread is created for this connection.
3. Reader thread reads the incoming request message data. (Any subsequent requests on this connection will be read by this existing Reader thread.)
4. Reader thread passes the request to a free Worker thread to execute the request. << Blocked Readers occur here >>
5. Worker thread invokes the remote method called by the client and sends a reply message back. The Worker is now free to take on a new request.

Common Root Cause(s):

Ultimately Readers get blocked because of problems with the Worker threads. By default, the number of ORB Worker threads will not grow past the configured number of threads in the ORB.thread.pool. So, if there are NO free Worker threads (i.e. all the Workers are busy handling other requests), then a Reader will block waiting for a free Worker.
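The hand-off from Reader to Worker can be pictured as a put into a bounded buffer. The following is a toy model only, not the WebSphere implementation: a Python queue.Queue with a fixed capacity stands in for the Worker pool, and the hypothetical hand_off function plays the role of the Reader. With no timeout the put blocks indefinitely once the buffer is full; with a timeout the Reader gives up, which is the behavior com.ibm.websphere.orb.threadPoolTimeout enables.

```python
import queue


def hand_off(request, work_queue, timeout_s=None):
    """Try to pass a request to the bounded work queue.

    Returns True if the request was accepted, or False if no slot became
    free within timeout_s. With timeout_s=None the call blocks until a
    slot is free (the default, untimed behavior in this toy model).
    """
    try:
        work_queue.put(request, timeout=timeout_s)
        return True
    except queue.Full:
        return False
```

With a queue of capacity 1 and nothing consuming from it, the first hand_off succeeds and a second hand_off with a short timeout returns False instead of hanging.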

There are 2 basic types of problems which can lead to Worker threads hanging or taking longer than normal to complete (examples given below):

Operations being performed directly by the Workers
- Waiting to get a connection (e.g. database, webservice, etc)
- Database read/writes
- Secondary ORB calls to other servers
- ORB socketWrite hangs/delays
System environment constraints/problems
- CPU spikes
- Network issues
- Insufficient threads in the ORB.thread.pool to properly handle the server load
- Too much load on the server

In short, troubleshoot what is causing problems for the Worker threads. Since many of these Worker problems cannot be controlled at all times, the ORB property described below can be used to minimize the ill effects of those problems.

Symptoms:

A javacore or thread dump will show whether any Reader threads are in a blocked state. In the example below, the com.ibm.ejs.oa.pool.ThreadPool.startWorkerThread() frame in the stack indicates that this Reader (RT=51) is blocked waiting for a Worker.

Server-side Reader blocked waiting for Worker stack example:

3XMTHREADINFO     "RT=51:P=946784:O=0:WSTCPTransportConnection[addr=1.2.3.4,port=47847,local=35293]" (TID:0x7000000000B21E0, sys_thread_t:0x11AF3D548, state:CW, native ID:0x27B3D) prio=5
4XESTACKTRACE          at java.lang.Object.wait(Native Method)
4XESTACKTRACE          at java.lang.Object.wait(Object.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.BoundedBuffer.put(BoundedBuffer.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.ejs.oa.pool.ThreadPool.startWorkerThread(ThreadPool.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.processInput(Connection.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.iiop.Connection.doReaderWorkOnce(Connection.java(Compiled Code))
4XESTACKTRACE          at com.ibm.rmi.transport.ReaderThread.run(ReaderPoolImpl.java:137)

Likewise, the remote side (client) will likely show threads stuck in socketWrite():

Client-side socketWrite stack example:

3XMTHREADINFO      "ORB.thread.pool : 1023" (TID:0x30554760, sys_thread_t:0x57323028, state:R, native ID:0x14C33) prio=5
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
at java.net.SocketOutputStream.write(SocketOutputStream.java:147)
at com.ibm.rmi.util.buffer.SequentialByteBuffer.flushTo(SequentialByteBuffer.java:410)
at com.ibm.rmi.util.buffer.SequentialByteBuffer.flushTo(SequentialByteBuffer.java:439)
at com.ibm.rmi.iiop.IIOPOutputStream.writeTo(IIOPOutputStream.java:541)
at com.ibm.rmi.iiop.Connection.write(Connection.java:2225)
at com.ibm.rmi.iiop.Connection.send(Connection.java:2267)
at com.ibm.rmi.iiop.ClientRequestImpl.invoke(ClientRequestImpl.java:338)
at com.ibm.rmi.corba.ClientDelegate.invoke(ClientDelegate.java:424)
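The stack signatures shown in this document can be searched for mechanically. The following is a minimal sketch that assumes the javacore layout seen in the examples above (a 3XMTHREADINFO header line followed by frames that begin with "at", optionally prefixed by 4XESTACKTRACE). The function names and category labels are hypothetical, and real javacores contain other sections that this parser simply ignores.

```python
import re

THREAD_HEADER = re.compile(r'^3XMTHREADINFO\s+"([^"]*)"')


def parse_threads(javacore_lines):
    """Map thread name -> list of stack frames, top frame first."""
    threads = {}
    frames = None
    for raw in javacore_lines:
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            frames = threads.setdefault(header.group(1), [])
            continue
        if line.startswith("4XESTACKTRACE"):
            line = line[len("4XESTACKTRACE"):].strip()
        if frames is not None and line.startswith("at "):
            frames.append(line[3:])
    return threads


def classify(frames):
    """Label a stack using the signatures described in this document."""
    if not frames:
        return "other"
    if "SocketOutputStream.socketWrite0" in frames[0]:
        return "in socketWrite"
    if "SequentialByteBuffer.flushTo" in frames[0]:
        return "waiting for socket"
    if any("oa.pool.ThreadPool.startWorkerThread" in f for f in frames):
        return "Reader blocked waiting for Worker"
    return "other"
```

Running classify over every entry returned by parse_threads gives a quick count of threads in each state; any match should still be confirmed by reading the full stack.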

Solution:

The property com.ibm.websphere.orb.threadPoolTimeout controls how long a Reader will wait for a free Worker. Basic facts:

This property will only help clear up a blocked Reader. It will not directly address issues with the Workers.
Set in ms. Default is 0, which means the Reader will never time out.
Recommended setting considerations. Choose a setting that is:
- Not too small, where Readers time out too soon, causing unnecessary failures/retries.
- Not too large, where Readers (and remote sending threads) have to wait for an inordinate amount of time before giving up.
- Loosely based on the average server request processing time. For example, if the server usually takes 1 second to process a typical request during heavy load, then a Worker should become free roughly every second as long as everything is running smoothly, so a timeout of 10000 ms leaves ample margin and is a good starting value.

After the threadPoolTimeout period, the Reader will throw a RuntimeException and the following message will be printed to the server SystemOut.log:

WSVR0627W: A thread could not be acquired from the ORB thread pool after the maximum wait time of {0} milliseconds was exceeded.

Explanation:  The ORB was unable to acquire a thread to process a request.

The client will receive an IOException.
If the client retries the failed request, the Reader may simply become blocked again until the Worker threads have completed.
For more information on this property, review the Knowledge Center article "Object Request Broker custom properties".

CORBA.NO_RESPONSE

When a client sends an ORB request message to a server, it will wait for a specified time period (ORB RequestTimeout) for a reply message to be received from the server. If no such reply message is received within that time period, the client ORB will throw a CORBA.NO_RESPONSE exception.

Symptoms

Typical stack trace found on the client in either ORB trace or SystemOut:

[10/18/17 18:48:35:090 PDT] 00000001 ORBRas        3 com.ibm.rmi.iiop.Connection getCallStream:2436 P=534912:O=0:CT The following exception was logged
                                 org.omg.CORBA.NO_RESPONSE: Request 6 timed out  vmcid: IBM  minor code: B01 completed: Maybe
        at com.ibm.rmi.iiop.Connection.getCallStream(Connection.java:2423)
        at com.ibm.rmi.iiop.Connection.send(Connection.java:2350)
        at com/ibm/rmi/iiop/Connection._locate(Connection.java:453(Compiled Code))
        at com/ibm/rmi/iiop/Connection.locate(Connection.java:429(Compiled Code))
        at com/ibm/rmi/iiop/GIOPImpl.locate(GIOPImpl.java:205(Compiled Code))
        at com/ibm/rmi/corba/ClientDelegate.locate(ClientDelegate.java:1966(Compiled Code))
        at com/ibm/rmi/corba/ClientDelegate._createRequest(ClientDelegate.java:1991(Compiled Code))
        at com/ibm/rmi/corba/ClientDelegate.createRequest(ClientDelegate.java:1155(Compiled Code))
        ...
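When many such entries need to be reviewed, the exception line can be pulled apart with a small parser. The sketch below assumes the exact wording of the NO_RESPONSE line in the sample above; the function name is hypothetical. The completion status is worth extracting because, in CORBA, a status of Maybe means the client cannot tell whether the server executed the request.

```python
import re

# Pattern based on the sample trace line shown above.
NO_RESPONSE = re.compile(
    r"org\.omg\.CORBA\.NO_RESPONSE: Request (?P<request_id>\d+) timed out\s+"
    r"vmcid: (?P<vmcid>\S+)\s+minor code: (?P<minor>\S+)\s+"
    r"completed: (?P<completed>\w+)"
)


def parse_no_response(line):
    """Return the fields of a NO_RESPONSE line as a dict, or None."""
    match = NO_RESPONSE.search(line)
    return match.groupdict() if match else None
```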

When the client is going through WLM, the following exception will also be thrown at the WLM layer:

org.omg.CORBA.TRANSIENT: SIGNAL_RETRY vmcid: 0x49421000 minor code: 42

Common Causes

In almost all cases, the root cause is a server-side issue in one of the categories below:

The server thread handling a request is delayed due to the following types of operations:
- Waiting to get a connection (e.g. database, webservice, etc)
- Database read/writes
- Secondary ORB calls to other servers
- ORB socketWrite hangs/delays
- Hostname resolution calls which can be delayed by DNS server issues
System environment constraints/problems
- CPU spikes
- Network issues
- Insufficient threads in the ORB.thread.pool to properly handle the server load
- Too much load on the server (insufficient resources)
Another process has hijacked the WAS Server’s ORB port (see Port Hijacking section)

Diagnostic Data

Collect the following data for initial analysis:

ORB trace on the client side. This will identify which server(s) are causing the NO_RESPONSE exceptions.
Server ORB trace, only if port hijacking is suspected.

Server-side SystemOut.logs. These may contain hung-thread messages (which include a stack trace) which can help identify the root cause.

   WSVR0605W: Thread "ORB.thread.pool : 1" has been active for 612,000 milliseconds and may be hung. There are 3 threads in total in the server that may be hung.

Server-side Javacores or thread dumps at the time of the problem.
- Collect 3 javacores/thread dumps ~30 seconds apart.
- Examine the ORB.thread.pool threads for bottleneck or delay points (e.g. waiting for a DB connection) to further narrow down the root cause.
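The reason for collecting several dumps is to separate threads that are genuinely stuck from threads that merely happened to be busy at one instant. The following is a minimal sketch of that comparison. It assumes each dump has already been reduced to a dictionary of thread name to top stack frame (how that reduction is done is left open), and the function name is hypothetical.

```python
def persistently_stuck(dumps):
    """Return {thread_name: top_frame} for threads whose top frame is
    identical in every dump.

    dumps is a list of {thread_name: top_frame} dicts, one per javacore,
    in the order they were taken. At least two dumps are needed for the
    result to be meaningful.
    """
    if not dumps:
        return {}
    stuck = {}
    for name, top_frame in dumps[0].items():
        if all(later.get(name) == top_frame for later in dumps[1:]):
            stuck[name] = top_frame
    return stuck
```

A thread that shows the same top frame in all three dumps taken about 30 seconds apart is a strong candidate for the bottleneck, although a thread can also legitimately return to the same frame between dumps.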

If network and CPU issues are suspected, the following data will also be necessary:

Network trace must be collected on both the client and server sides to determine network latency and other possible network problems.
CPU performance data. Use the following MustGather links for instructions:
- AIX
- Linux
- Windows
- Solaris

Solution

After collecting the necessary diagnostic information, review that data in light of the “Common Causes” section to see if one of those scenarios matches the diagnostics, then correct the underlying issue.

Additional Notes:

NO_RESPONSE exceptions on the client are NOT automatically retried by the ORB. The client application needs to determine if the request can be safely retried or not.
NO_RESPONSE exceptions cause the WLM client to mark servers as unavailable, and in some cases, ALL servers can be removed from WLM’s selection pool, resulting in a CORBA.NO_IMPLEMENT NoAvailableTargets exception (see Troubleshooting: Workload Management Problems for more information).

Applies to: WebSphere Application Server, Object Request Broker (ORB) component. Versions: 9.0, 8.5.5, 8.5, 8.0, 7.0. Editions: Base, Express, Network Deployment. Platforms: AIX, HP-UX, IBM i, Linux, Solaris, Windows.
