IV47458: A WEBSPHERE MQ MESSAGING PROVIDER ACTIVATION SPECIFICATION IN WSAS HANGS AFTER A JAVAX.RESOURCE.SPI.WORK.WORKREJECTEDEXCEPTION

A fix is available

APAR status

Closed as program error.

Error description

A WebSphere MQ messaging provider activation specification in
WebSphere Application Server may hang after receiving a
WorkRejectedException from the application server.  When this
occurs, the activation specification creates an ffdc file with
the following information:

FFDC Exception:java.lang.RuntimeException
SourceId:com.ibm.ws.util.ThreadPool$Worker
ProbeId:1624
Reporter:com.ibm.ws.util.ThreadPool$Worker@xxxxxxxx
java.lang.RuntimeException:
javax.resource.spi.work.WorkRejectedException: errorCode: 1
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:331)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1659)
Caused by: javax.resource.spi.work.WorkRejectedException:
errorCode: 1
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:328)
... 1 more


Some time after the error, WebSphere Application Server may
report a message about a hanging thread, for example:

"WMQJCAResourceAdapter : 43" (xxxxxxxx) has been active for
746655 milliseconds and may be hung.  There is/are 2 thread(s)
in total in the server that may be hung.
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:167)
at com.ibm.mq.connector.inbound.ServerSessionImpl.close(ServerSessionImpl.java:298)
at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.closeInternal(ServerSessionPoolImpl.java:648)

Local fix

Uncheck the "Stop endpoint if message delivery fails" option on
the WebSphere MQ activation specification, or change the
activation specification's Connection Consumer "Start timeout"
from its default of 10000 ms (10 seconds) to a much larger
value, such as 1000000 ms (1000 seconds).

Problem summary

****************************************************************
USERS AFFECTED:
This issue affects users of:

- The WebSphere MQ V7 Resource Adapter.
- The WebSphere Application Server V7 WebSphere MQ messaging
provider.
- The WebSphere Application Server V8 WebSphere MQ messaging
provider.

who are using an Activation Specification to drive a Message
Driven Bean (MDB) which consumes messages from a WebSphere MQ
queue.

Platforms affected:
All Distributed (iSeries, all Unix and Windows) +Java +Java zOS
****************************************************************
PROBLEM SUMMARY:
When using the WebSphere MQ Resource Adapter (WMQ-RA) within a
JEE environment, such as WebSphere Application Server, to
drive an MDB using an Activation Specification, the WMQ-RA
consumes messages from the WebSphere MQ queue using the
following procedure:

(1) Browse the queue to find a suitable message.

(2) Obtain a ServerSession from the application server's
ServerSession pool, or create a new one if the configured
maximum number has not been reached.

(3) Load the JMS Session contained within the ServerSession with
the necessary information to be able to uniquely identify
the message on the queue.

(4) Queue the work with the application server's work manager
to provide a thread and drive the MDB message consumption,
followed by the MDB application code.

In order to execute this piece of work after step (4), the
application server requires resources which may not be
immediately available.

For example, the most common reason for a delay after step (4)
is a lack of threads associated with the WMQ-RA, which in the
WebSphere Application Server environment can be configured
using the Administration Console path:

Application servers -> [server] -> Thread pools ->
WMQJCAResourceAdapter

The "Maximum Size" value on this panel indicates the maximum
number of threads which are available to the WMQ-RA for inbound
communications - for example for driving MDB threads.

This value defaults to 50 threads, which means that with the
default ServerSession size of 10 per activation specification,
all the threads could be exhausted with 5 activation
specification applications (5 x 10 = 50 threads) running on the
server simultaneously.
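
The arithmetic above can be checked with a short sketch. The
class and method names here are hypothetical and exist only for
illustration; the values 50 and 10 are the defaults quoted in
the text.

```java
// Illustrative arithmetic only. ThreadPoolSizing and its methods are
// hypothetical names, not part of WebSphere MQ or WebSphere Application Server.
final class ThreadPoolSizing {

    /** Smallest number of activation specifications that, each using all of
     *  its ServerSessions at once, needs every thread in the shared pool. */
    static int specsToExhaustPool(int poolMaxSize, int sessionsPerSpec) {
        if (poolMaxSize <= 0 || sessionsPerSpec <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division, so a partly used last block of threads still counts.
        return (poolMaxSize + sessionsPerSpec - 1) / sessionsPerSpec;
    }

    /** Threads demanded if every specification is fully busy. */
    static int peakThreadDemand(int activationSpecs, int sessionsPerSpec) {
        return activationSpecs * sessionsPerSpec;
    }

    public static void main(String[] args) {
        int poolMax = 50; // default "Maximum Size" quoted in the text
        int perSpec = 10; // default ServerSession pool size quoted in the text
        System.out.println("Specs needed to exhaust pool: "
                + specsToExhaustPool(poolMax, perSpec));
        System.out.println("Demand with 6 specs exceeds pool: "
                + (peakThreadDemand(6, perSpec) > poolMax));
    }
}
```

Any sixth fully busy activation specification would therefore
have to queue its work until a thread becomes free, which is
where the start timeout comes into play.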


Other resources which are required by the application server in
order to start the queued MDB unit of work include:

(a) Threads available to the application server
(b) Threads available to the JVM, which may be limited by the
operating system.
(c) Available system CPU resource to drive the thread.


Due to these resource requirements, the MDB unit of work may not
begin immediately.  By default, the WMQ-RA uses a 10 second
timeout for this work to begin, configurable for the activation
specification in the Administration Console using the path:

Activation specifications -> [the actSpec] ->
Advanced properties -> "Start timeout" in milliseconds

defaulting to 10,000ms, or 10 seconds.


If this start timeout is exceeded, then when the resources are
eventually allocated to run the MDB, the work will be
immediately rejected.  When WebSphere Application Server
rejects work in this fashion, it also produces a diagnostic
FFDC report to indicate that the rejection event has occurred,
which opens with data of the form:

----------------------------------------------------------------
[8/22/13 16:02:52:805 BST]     FFDC Exception:java.lang.RuntimeException
SourceId:com.ibm.ws.util.ThreadPool$Worker ProbeId:1624
Reporter:com.ibm.ws.util.ThreadPool$Worker@8dbb4cec
java.lang.RuntimeException:
javax.resource.spi.work.WorkRejectedException: errorCode: 1
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:331)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1659)
Caused by: javax.resource.spi.work.WorkRejectedException: errorCode: 1
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:328)
... 1 more
----------------------------------------------------------------

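The Java EE Connector Architecture (JCA) specification, which
is part of JEE, states that the errorCode value of '1' on the
'javax.resource.spi.work.WorkRejectedException' corresponds to
the constant (inherited from
javax.resource.spi.work.WorkException):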

javax.resource.spi.work.WorkRejectedException.START_TIMED_OUT

meaning that the expiration time for the work has been exceeded.
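
The start-timeout check described above can be sketched as
follows. This is a minimal illustration under stated
assumptions, not the application server's implementation: the
class StartTimeoutSketch, the RejectedWork exception and the
checkStart method are hypothetical stand-ins so that the
example runs without the JCA API on the classpath. Only the
error code value "1" is taken from the WorkException constant.

```java
// Minimal sketch of a start-timeout check. All names are hypothetical.
final class StartTimeoutSketch {

    // Mirrors the value of javax.resource.spi.work.WorkException.START_TIMED_OUT,
    // which the JCA API defines as the String "1".
    static final String START_TIMED_OUT = "1";

    /** Hypothetical stand-in for WorkRejectedException. It is unchecked here
     *  only to keep the sketch short; the real JCA exception is checked. */
    static final class RejectedWork extends RuntimeException {
        final String errorCode;

        RejectedWork(String errorCode) {
            super("errorCode: " + errorCode);
            this.errorCode = errorCode;
        }
    }

    /** Rejects work whose wait between queuing and starting exceeded the
     *  configured start timeout. All times are in milliseconds. */
    static void checkStart(long queuedAtMs, long startedAtMs, long startTimeoutMs) {
        if (startedAtMs - queuedAtMs > startTimeoutMs) {
            throw new RejectedWork(START_TIMED_OUT);
        }
    }

    public static void main(String[] args) {
        long startTimeoutMs = 10_000L; // default "Start timeout" quoted in the text
        checkStart(0L, 9_000L, startTimeoutMs); // started in time: no exception
        try {
            checkStart(0L, 12_000L, startTimeoutMs); // a thread arrived too late
        } catch (RejectedWork e) {
            System.out.println("Work rejected, " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the rejection happens when a
thread finally picks the work up, not at the moment the timeout
elapses, which matches the behaviour described in the summary.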


The default configuration of an activation specification within
the WMQ-RA is to pause the endpoint on the first failure to
deliver a message, for which this WorkRejectedException event
qualifies.

In order to pause the endpoint, the activation specification
must wait for all current activity (including queued work) to
complete, whether it fails or finishes successfully.



It was observed that when the activation specification attempted
to pause, it ended up hanging with a Java thread with the
following stack trace:

(stack trace code line numbers are from WMQ-RA 7.0.1.7 running
within the WSAS 8.0.0.4 environment)

----------------------------------------------------------------
java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:167)
at com.ibm.mq.connector.inbound.ServerSessionImpl.close(ServerSessionImpl.java:298)
at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.closeInternal(ServerSessionPoolImpl.java:648)
at com.ibm.mq.connector.inbound.ServerSessionPoolImpl.close(ServerSessionPoolImpl.java:557)
at com.ibm.mq.connector.inbound.MessageEndpointDeployment.stop(MessageEndpointDeployment.java:473)
at com.ibm.mq.connector.ResourceAdapterImpl.endpointDeactivation(ResourceAdapterImpl.java:523)
at com.ibm.ejs.j2c.ActivationSpecWrapperImpl.deactivateUnderRAClassLoaderContext(ActivationSpecWrapperImpl.java:513)
at com.ibm.ejs.j2c.ActivationSpecWrapperImpl.deactivateEndPoint(ActivationSpecWrapperImpl.java:420)
at com.ibm.ejs.j2c.mbeans.MessageEndpointMBeanImpl.pause(MessageEndpointMBeanImpl.java:162)
----------------------------------------------------------------

After the default reporting time of 10 minutes, WSAS will report
in the SystemOut.log file that there is a potentially hung
thread within the environment which has been stuck for a period
of time.

Once the pausing thread is stuck in this manner, the
application server cannot be shut down cleanly, and the JVM
process must be forcibly ended.

Problem conclusion

The problem of the pausing thread hanging was caused by the
endpoint waiting for all the ServerSessions within the pool to
be free, or to be returned to the pool - including the one which
resulted in the WorkRejectedException.

This meant that the endpoint ended up waiting for a
ServerSession to be returned to the pool while the thread using
that same ServerSession was itself busy pausing the endpoint,
so the wait could never complete.

The code has been changed to ensure that in this scenario,
pausing the endpoint does not wait for work associated with the
same ServerSession to complete before pausing.
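
The self-wait and the shape of the fix can be sketched as
follows. This is a simplified model under stated assumptions,
not the WMQ-RA code: SessionPoolSketch, acquire, release and
awaitIdle are hypothetical names, sessions are plain objects,
and the wait is given a timeout so that the example terminates
instead of hanging.

```java
// Simplified model of a session pool whose close path waits for sessions
// to be returned. All names are hypothetical; this is not the WMQ-RA code.
final class SessionPoolSketch {

    private final java.util.Set<Object> inUse = new java.util.HashSet<>();

    /** Hands out a new session and records it as in use. */
    synchronized Object acquire() {
        Object session = new Object();
        inUse.add(session);
        return session;
    }

    /** Returns a session to the pool and wakes any waiting closer. */
    synchronized void release(Object session) {
        inUse.remove(session);
        notifyAll();
    }

    /**
     * Waits until every in-use session, other than callerSession, has been
     * returned. Passing null models the pre-fix behaviour (wait for all
     * sessions, including the caller's own). Returns true if the pool became
     * idle, false if the timeout expired or the thread was interrupted.
     */
    synchronized boolean awaitIdle(Object callerSession, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (busyCount(callerSession) > 0) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // still waiting: the real code had no timeout here
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Counts in-use sessions, ignoring the session owned by the caller.
    private int busyCount(Object callerSession) {
        int busy = inUse.size();
        if (callerSession != null && inUse.contains(callerSession)) {
            busy--;
        }
        return busy;
    }

    public static void main(String[] args) {
        SessionPoolSketch pool = new SessionPoolSketch();
        Object mine = pool.acquire(); // the session whose work was rejected
        System.out.println("Wait for all sessions: " + pool.awaitIdle(null, 50L));
        System.out.println("Wait excluding own session: " + pool.awaitIdle(mine, 50L));
    }
}
```

In this model the thread that holds a session and then waits
for every session can never be satisfied, because it is the
only one able to return its own session. Excluding the caller's
session from the wait removes that dependency.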

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v7.0       7.0.1.12

The latest available maintenance can be obtained from
'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on
its planned availability can be found in 'WebSphere MQ
Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------

Temporary fix

Comments

APAR Information

APAR number
IV47458
Reported component name
WMQ AIX V7
Reported component ID
5724H7221
Reported release
701
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-08-23
Closed date
2013-09-12
Last modified date
2013-09-12

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WMQ AIX V7
Fixed component ID
5724H7221

Applicable component levels

R701 PSY
UP


Document Information

Modified date:
31 March 2023
