A fix is available
APAR status
Closed as program error.
Error description
A WebSphere MQ messaging provider activation specification in WebSphere Application Server may hang after receiving a WorkRejectedException from the application server. When this occurs, an FFDC file is written containing the following information:

FFDC Exception:java.lang.RuntimeException SourceId:com.ibm.ws.util.ThreadPool$Worker ProbeId:1624 Reporter:com.ibm.ws.util.ThreadPool$Worker@xxxxxxxx
java.lang.RuntimeException: javax.resource.spi.work.WorkRejectedException: errorCode: 1
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:331)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1659)
Caused by: javax.resource.spi.work.WorkRejectedException: errorCode: 1
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:328)
    ... 1 more

Some time after the error, WebSphere Application Server may report a message about a hung thread, for example:

"WMQJCAResourceAdapter : 43" (xxxxxxxx) has been active for 746655 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:167)
    at com.ibm.mq.connector.inbound.ServerSessionImpl.close(ServerSessionImpl.java:298)
    at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.closeInternal(ServerSessionPoolImpl.java:648)
Local fix
Uncheck the "Stop endpoint if message delivery fails" option on the WebSphere MQ activation specification, or change the activation specification's Connection Consumer "Start timeout" from its default of 10000 ms (10 seconds) to a much larger value, like 1000000 ms (1000 seconds).
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects users of:
- The WebSphere MQ V7 Resource Adapter.
- The WebSphere Application Server V7 WebSphere MQ messaging provider.
- The WebSphere Application Server V8 WebSphere MQ messaging provider.
who are using an Activation Specification to drive a Message Driven Bean (MDB) that consumes messages from a WebSphere MQ queue.

Platforms affected:
All Distributed (iSeries, all Unix and Windows) +Java +Java zOS
****************************************************************
PROBLEM SUMMARY:
When the WebSphere MQ Resource Adapter (WMQ-RA) is used within a JEE environment, such as WebSphere Application Server, to drive an MDB through an Activation Specification, the WMQ-RA consumes messages from the WebSphere MQ queue using the following procedure:

(1) Browse a message on the queue to find a suitable message.
(2) Obtain a ServerSession from the application server's ServerSession pool, or create a new one if the configured maximum number has not been reached.
(3) Load the JMS Session contained within the ServerSession with the information needed to uniquely identify the message on the queue.
(4) Queue the work with the application server's work manager, which provides a thread to drive the MDB message consumption, followed by the MDB application code.

To execute this piece of work after step (4), the application server requires resources which may not be immediately available. The most common reason for a delay after step (4) is a lack of threads associated with the WMQ-RA, which in the WebSphere Application Server environment can be configured using the Administration Console path:

Application servers -> [server] -> Thread pools -> WMQJCAResourceAdapter

The "Maximum Size" value on this panel indicates the maximum number of threads available to the WMQ-RA for inbound communications, for example for driving MDB threads.
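The four-step procedure above can be modelled in a single JVM to show where the thread pool enters the picture. This is a minimal sketch under stated assumptions, not the WMQ-RA implementation: `DispatchSketch`, its `ServerSession` class and the `dispatch` method are hypothetical names; a `BlockingQueue` stands in for the ServerSession pool (all sessions are created up front rather than on demand, for brevity); a fixed-size `ExecutorService` stands in for the work manager backed by the WMQJCAResourceAdapter thread pool; and "browsing" is simply iterating over a list of message IDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatchSketch {
    /** Hypothetical ServerSession: it only remembers which message it has been loaded with. */
    static final class ServerSession {
        volatile String messageId;
    }

    /** Runs steps (1) to (4) for every message on an in-memory "queue"; returns consumed IDs. */
    static List<String> dispatch(List<String> queue, int maxSessions, int poolThreads) {
        // Hypothetical ServerSession pool, pre-filled with maxSessions sessions.
        BlockingQueue<ServerSession> sessionPool = new ArrayBlockingQueue<>(maxSessions);
        for (int i = 0; i < maxSessions; i++) sessionPool.add(new ServerSession());
        // Stands in for the WMQJCAResourceAdapter thread pool behind the work manager.
        ExecutorService workManager = Executors.newFixedThreadPool(poolThreads);
        List<String> consumed = Collections.synchronizedList(new ArrayList<>());
        try {
            for (String browsed : queue) {                   // (1) browse the next message
                ServerSession session = sessionPool.take();  // (2) obtain a session; blocks if all are busy
                session.messageId = browsed;                 // (3) load it with the message identity
                workManager.execute(() -> {                  // (4) queue the work for a pool thread
                    try {
                        // In the real adapter the message would be consumed and onMessage() driven here.
                        consumed.add(session.messageId);
                    } finally {
                        sessionPool.add(session);            // return the session to the pool
                    }
                });
            }
            workManager.shutdown();
            workManager.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            workManager.shutdownNow();
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> done = dispatch(List.of("msg-1", "msg-2", "msg-3", "msg-4", "msg-5"), 2, 2);
        System.out.println(done.size() + " messages consumed");
    }
}
```

Because a session is only returned to the pool once its work has run on a pool thread, a shortage of pool threads holds sessions back and delays step (4), which is the situation the rest of this summary is concerned with.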
This value defaults to 50 threads, which means that with the default ServerSession pool size of 10 per activation specification, all the threads could be exhausted by 5 activation specification applications (5 x 10 = 50 threads) running on the server simultaneously.

Other resources which the application server requires in order to start the queued MDB unit of work include:
(a) Threads available to the application server.
(b) Threads available to the JVM, which may be limited by the operating system.
(c) Available system CPU resource to drive the thread.

Because of these resource requirements, the MDB unit of work may not begin immediately. By default, the WMQ-RA uses a 10 second timeout for this work to begin, configurable for the activation specification in the Administration Console using the path:

Activation specifications -> [the actSpec] -> Advanced properties -> "Start timeout"

The "Start timeout" value is in milliseconds and defaults to 10,000 ms (10 seconds). If this time to start is exceeded, then when the resources are eventually allocated to run the MDB, the work is immediately rejected. When WebSphere Application Server rejects work in this fashion, it also produces a diagnostic FFDC report to indicate that the rejection event has occurred, beginning with data of the form:

----------------------------------------------------------------
[8/22/13 16:02:52:805 BST] FFDC Exception:java.lang.RuntimeException SourceId:com.ibm.ws.util.ThreadPool$Worker ProbeId:1624 Reporter:com.ibm.ws.util.ThreadPool$Worker@8dbb4cec
java.lang.RuntimeException: javax.resource.spi.work.WorkRejectedException: errorCode: 1
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:331)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1659)
Caused by: javax.resource.spi.work.WorkRejectedException: errorCode: 1
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:328)
    ... 1 more
----------------------------------------------------------------

The JEE specification states that the errorCode value of '1' on the 'javax.resource.spi.work.WorkRejectedException' corresponds to the event javax.resource.spi.work.WorkRejectedException.START_TIMED_OUT, meaning that the expiration time for the work has been exceeded.

The default configuration of an activation specification within the WMQ-RA is to pause the endpoint on the first failure to deliver a message, and this WorkRejectedException event counts as such a failure. To pause the endpoint, the activation specification must wait for all current activity (including queued-up work) to complete, either by failing or by finishing successfully.

When the activation specification attempted to pause, it was observed to hang, with a Java thread showing the following stack trace (line numbers are from WMQ-RA 7.0.1.7 running within the WSAS 8.0.0.4 environment):

----------------------------------------------------------------
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:167)
    at com.ibm.mq.connector.inbound.ServerSessionImpl.close(ServerSessionImpl.java:298)
    at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.closeInternal(ServerSessionPoolImpl.java:648)
    at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.close(ServerSessionPoolImpl.java:557)
    at com.ibm.mq.connector.inbound.MessageEndpointDeployment.stop(MessageEndpointDeployment.java:473)
    at com.ibm.mq.connector.ResourceAdapterImpl.endpointDeactivation(ResourceAdapterImpl.java:523)
    at com.ibm.ejs.j2c.ActivationSpecWrapperImpl.deactivateUnderRAClassLoaderContext(ActivationSpecWrapperImpl.java:513)
    at com.ibm.ejs.j2c.ActivationSpecWrapperImpl.deactivateEndPoint(ActivationSpecWrapperImpl.java:420)
    at com.ibm.ejs.j2c.mbeans.MessageEndpointMBeanImpl.pause(MessageEndpointMBeanImpl.java:162)
----------------------------------------------------------------

After the default reporting time of 10 minutes, WSAS reports in the SystemOut.log file that there is a potential hung thread which has been stuck for a period of time. Once the pausing thread has become stuck in this manner, the application server cannot be shut down cleanly, and the JVM process must be ended forcibly.
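How thread-pool capacity and "Start timeout" interact can be illustrated with a small deterministic model. In this sketch the `StartTimeoutSketch` class and `rejectedCount` method are hypothetical; every work item is assumed to be queued at time 0 and to hold a pool thread for a fixed time; an item that has waited longer than the start timeout is assumed to be rejected as soon as it reaches a thread (mirroring START_TIMED_OUT); and the endpoint pause that the first rejection would trigger is ignored. The 6 activation specifications and the 15 second MDB run time are illustrative values, not measurements.

```java
import java.util.PriorityQueue;

public class StartTimeoutSketch {
    /**
     * Counts work items that would be rejected because they could not start within
     * startTimeoutMs, given poolThreads threads and workItems items queued at time 0.
     */
    static int rejectedCount(int poolThreads, int workItems, long durationMs, long startTimeoutMs) {
        // Earliest time (ms) at which each pool thread becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < poolThreads; i++) freeAt.add(0L);

        int rejected = 0;
        for (int i = 0; i < workItems; i++) {
            long start = freeAt.poll();          // queued at t=0, so the item waited 'start' ms
            if (start > startTimeoutMs) {
                rejected++;                      // rejected on arrival; the thread is free again
                freeAt.add(start);
            } else {
                freeAt.add(start + durationMs);  // the thread is busy running the MDB
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        int threads = 50;                        // default WMQJCAResourceAdapter Maximum Size
        int sessionsPerActSpec = 10;             // default ServerSessions per activation specification
        int actSpecs = 6;                        // hypothetical: one more than 5 x 10 = 50 allows
        int demand = actSpecs * sessionsPerActSpec;
        long slowMdbMs = 15_000;                 // hypothetical MDB run time

        System.out.println(rejectedCount(threads, demand, slowMdbMs, 10_000));     // prints 10
        System.out.println(rejectedCount(threads, demand, slowMdbMs, 1_000_000));  // prints 0
        System.out.println(rejectedCount(demand, demand, slowMdbMs, 10_000));      // prints 0
    }
}
```

With 6 x 10 = 60 concurrent work items competing for 50 threads, the model rejects the 10 items that cannot start within the default 10 seconds; raising the start timeout, as in the local fix, or providing enough threads removes those rejections.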
Problem conclusion
The hang of the pausing thread was caused by the endpoint waiting for all the ServerSessions within the pool to be free, that is, returned to the pool, including the one which had received the WorkRejectedException. As a result, the endpoint waited for a ServerSession to be returned to the pool while that same ServerSession was busy pausing the endpoint, so the wait could never end.

The code has been changed so that, in this scenario, pausing the endpoint does not wait for work associated with the same ServerSession to complete before pausing.
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v7.0       7.0.1.12

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
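The self-wait described above, and the shape of the fix, can be reproduced with a toy pool. This is a hypothetical model that only loosely mirrors the wait in ServerSessionImpl.close; it is not the WMQ-RA source. `PauseSketch`, `SessionPool`, `markBusy`, `release` and `awaitIdle` are invented names, sessions are identified by strings, and `awaitIdle` takes a timeout only so that the pre-fix behaviour returns `false` instead of hanging forever as the real wait did.

```java
import java.util.HashSet;
import java.util.Set;

public class PauseSketch {
    /** Hypothetical ServerSession pool model; strings stand in for ServerSession objects. */
    static final class SessionPool {
        private final Set<String> busy = new HashSet<>();

        synchronized void markBusy(String session) {
            busy.add(session);
        }

        synchronized void release(String session) {
            busy.remove(session);
            notifyAll();                             // wake any thread waiting in awaitIdle
        }

        /**
         * Waits until every busy session except 'ignored' (may be null) has been returned.
         * Returns false if that has not happened within timeoutMs.
         */
        synchronized boolean awaitIdle(String ignored, long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
                Set<String> pending = new HashSet<>(busy);
                if (ignored != null) pending.remove(ignored);
                if (pending.isEmpty()) return true;
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) return false;
                try {
                    wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SessionPool pool = new SessionPool();
        pool.markBusy("session-7");  // its work was rejected, and this session is now pausing the endpoint
        pool.markBusy("session-8");  // ordinary in-flight MDB work

        // Another thread finishes session-8's MDB work shortly afterwards.
        Thread worker = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignoredEx) { return; }
            pool.release("session-8");
        });
        worker.start();

        // Pre-fix: waiting for every session, including the caller's own, cannot complete.
        System.out.println("pre-fix drained:  " + pool.awaitIdle(null, 500));          // expected: false
        // Post-fix: the caller's own session is skipped, other sessions are still awaited.
        System.out.println("post-fix drained: " + pool.awaitIdle("session-7", 500));   // expected: true
        worker.join();
    }
}
```

Note that the fixed variant still waits for other in-flight work (session-8 here); it only stops waiting on the ServerSession that is itself performing the pause.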
Temporary fix
Comments
APAR Information
APAR number
IV47458
Reported component name
WMQ AIX V7
Reported component ID
5724H7221
Reported release
701
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-08-23
Closed date
2013-09-12
Last modified date
2013-09-12
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WMQ AIX V7
Fixed component ID
5724H7221
Applicable component levels
R701 PSY UP
Document Information
Modified date:
31 March 2023