IT03021: WMQ JAVA/JMS CLIENT CONNECTION TO QUEUE MANAGER APPEARS TO HANG WHEN QUEUE MANAGER IS NOT ABLE TO RESPOND
A fix is available
Closed as program error.
When a pkill -STOP command is issued against a queue manager and a WMQ V7.x Java/JMS client then attempts to connect, the connection attempt does not return and no error is reported to the client. Trace shows repeated socket wait time-outs:

15:43:42.583.00 0031 @7ced7ced c.i.m.j.remote.impl.RemoteTCPConnection ----+----+----+- X receive(byte [ ],int,int)
15:43:42.583.00 0031 Read timed out [java.net.SocketTimeoutException] at:
15:43:42.583.00 0031 java.net.SocketInputStream.socketRead0(Native Method)

followed by another wait on the socket:

15:43:42.583.02 0031 @7ced7ced c.i.m.j.remote.impl.RemoteTCPConnection ----+----+----+- d receive(byte [ ],int,int) Attempting to read from [java.net.SocketInputStream@72ee72ee]

Connection attempts continue until the client is stopped or the queue manager is resumed with pkill -CONT.
****************************************************************
USERS AFFECTED:
Users of the WebSphere MQ classes for Java/JMS who are making client mode (TCP/IP) connections to a queue manager. This includes users of the WebSphere MQ Resource Adapter.

Platforms affected:
AIX, HP-UX Itanium, IBM iSeries, Linux on Power, Linux on S390, Linux on x86, Linux on x86-64, Linux on zSeries, Solaris x86-64, Solaris SPARC, Windows, z/OS, MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
The default behaviour of the WebSphere MQ classes for Java/JMS v7.5 when establishing a TCP/IP connection with the queue manager is to wait indefinitely for the network layer to return a response, whether that response indicates success or failure in establishing the TCP/IP connection. For network environments where no response is returned, such as where the outbound TCP packet is lost, this behaviour can be tuned using the JVM property:

com.ibm.mq.cfg.TCP.Connect_Timeout

which takes an integer value for the number of seconds to wait for the network layer to respond.

Once the TCP/IP connection is established, the WebSphere MQ classes for Java/JMS negotiate a suitable heart-beat time interval, after which the client and server check that the other end is still available, or mark the connection as disconnected.

It was assumed that if the TCP/IP socket could be established with the queue manager's listener, one of two things would happen when the heart-beat time interval was negotiated:

(1) The listener would respond to the client with the heart-beat negotiation communication, or
(2) The listener would respond to the client indicating that the queue manager was not available.

The client therefore waited indefinitely for this initial logical communication with the queue manager after establishing the TCP/IP socket. However, in unusual circumstances, the queue manager may not be able to respond.
For example, it has been found on Linux that the queue manager process can be paused while Linux still permits TCP/IP sockets to be established with the paused process. This scenario can be reproduced using the command:

pkill -STOP -f <QMgrName>

where <QMgrName> is the name of the queue manager. When this was done, the WebSphere MQ classes for Java/JMS waited indefinitely for a response from the queue manager when establishing a connection. If the queue manager process was then resumed using the command:

pkill -CONT -f <QMgrName>

the connection process completed in the usual manner.
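The hang can be illustrated without a queue manager. The following Java sketch uses only java.net, not the WebSphere MQ classes, and its class and method names (PausedListenerDemo, firstReadTimesOut) are hypothetical. It opens a loopback listening socket that is never accepted or read from, standing in for the paused listener. On Linux and most other platforms the operating system still completes the TCP handshake, so connect() succeeds, but no data ever arrives. A plain blocking read would wait indefinitely; a read timeout set with Socket.setSoTimeout turns that wait into a SocketTimeoutException, which is broadly analogous to what the com.ibm.mq.cfg.MQRCVBLKTO tuning property provides in this fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PausedListenerDemo {

    /**
     * Hypothetical helper: connects to a listener that never accepts or replies
     * (a stand-in for a paused queue manager) and reports whether the client's
     * first read timed out instead of blocking forever.
     */
    public static boolean firstReadTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS choose a free port; the backlog lets the kernel queue
        // completed handshakes even though accept() is never called.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            // Bounds only the TCP handshake (compare com.ibm.mq.cfg.TCP.Connect_Timeout).
            client.connect(new InetSocketAddress(loopback, listener.getLocalPort()), connectTimeoutMs);
            // Bounds the wait for the first reply (compare com.ibm.mq.cfg.MQRCVBLKTO).
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                // Nothing on the server side ever writes, so without setSoTimeout
                // this read would block indefinitely.
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean timedOut = firstReadTimesOut(5_000, 1_000);
        System.out.println(timedOut
                ? "read timed out: the connection would be treated as broken"
                : "unexpected reply from listener");
    }
}
```

Removing the setSoTimeout call reproduces the original symptom: the read never returns while the listener stays open.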
The WebSphere MQ classes for Java/JMS have been updated so that the tuning property com.ibm.mq.cfg.MQRCVBLKTO determines how long the client will wait for a response from the queue manager after the TCP/IP socket has been established. The property takes an integer value: the number of seconds the client will wait before marking the connection as broken and returning reason code 2009 (MQRC_CONNECTION_BROKEN) to the application requesting the connection. For example, to configure the client to wait a maximum of 30 seconds for the queue manager to respond when establishing the connection, you can use the Java argument:

java -Dcom.ibm.mq.cfg.MQRCVBLKTO=30 <ClassName>

where <ClassName> is the name of the class being run.

Note that this tuning property also overrides the heart-beat time interval value determined from inspection of the channel's HBINT property.

If no response is returned after the TCP/IP socket has been created and the tuning property is not set, the WebSphere MQ classes for Java/JMS will, by default, continue to wait indefinitely for the queue manager to return a response to the initial data flow. That is to say, without this property configured, the code change associated with this APAR has no effect on the behaviour of the WebSphere MQ classes for Java/JMS.

In addition to a specific integer value for the wait time, this tuning property accepts the following operators:

x  Sets the heart-beat time as a multiplier of the HBINT value
+  Sets the heart-beat time as a value added to the HBINT value

As both of these forms require an initial communication with the queue manager to establish the HBINT value, the wait time for that initial communication defaults to 120 seconds (2 minutes).
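The value rules above can be summarised in a small Java sketch. RcvBlkToSketch and effectiveWait are hypothetical names, and this is one reading of the rules described in this APAR, not the WebSphere MQ implementation. It assumes the operand after x or + is an integer. An unset property maps to an empty Optional (wait indefinitely, the pre-APAR behaviour). A plain integer is a fixed number of seconds that overrides HBINT. The x and + forms are applied to the negotiated HBINT, falling back to 120 seconds while HBINT is still unknown.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical illustration of the MQRCVBLKTO value rules; not IBM's code. */
public final class RcvBlkToSketch {

    /** Wait used for the initial flow, before HBINT has been negotiated. */
    static final int INITIAL_FLOW_DEFAULT_SECONDS = 120;

    /**
     * @param property     value of com.ibm.mq.cfg.MQRCVBLKTO, or null if unset
     * @param hbintSeconds negotiated HBINT in seconds, or null if not yet known
     * @return the wait time, or empty meaning "wait indefinitely"
     */
    public static Optional<Duration> effectiveWait(String property, Integer hbintSeconds) {
        if (property == null || property.isBlank()) {
            return Optional.empty(); // Unset: behaviour unchanged, wait indefinitely.
        }
        String value = property.trim();
        char op = value.charAt(0);
        if (op == 'x' || op == '+') {
            int operand = Integer.parseInt(value.substring(1));
            if (hbintSeconds == null) {
                // HBINT depends on the first exchange, so that exchange uses the default.
                return Optional.of(Duration.ofSeconds(INITIAL_FLOW_DEFAULT_SECONDS));
            }
            long hb = hbintSeconds.longValue();
            long seconds = (op == 'x') ? hb * operand : hb + operand;
            return Optional.of(Duration.ofSeconds(seconds));
        }
        // Plain integer: a fixed number of seconds, overriding the HBINT-derived value.
        return Optional.of(Duration.ofSeconds(Integer.parseInt(value)));
    }

    public static void main(String[] args) {
        System.out.println(effectiveWait("30", 300));  // Optional[PT30S]
        System.out.println(effectiveWait("x2", 300));  // Optional[PT10M]
        System.out.println(effectiveWait("+5", 300));  // Optional[PT5M5S]
        System.out.println(effectiveWait("x2", null)); // Optional[PT2M]
        System.out.println(effectiveWait(null, 300));  // Optional.empty
    }
}
```

With the default 300-second HBINT, x2 yields 600 seconds and +5 yields 305 seconds, matching the worked values in this APAR.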
For example, if the HBINT value set on the channel is the default 300 seconds, you could use the following syntax to set the heart-beat time to 600 seconds:

-Dcom.ibm.mq.cfg.MQRCVBLKTO=x2

To set a heart-beat time of 305 seconds on a system with the default 300-second HBINT, you would use the syntax:

-Dcom.ibm.mq.cfg.MQRCVBLKTO=+5

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v7.0       22.214.171.124
v7.1       126.96.36.199
v7.5       188.8.131.52
v8.0       184.108.40.206

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Reported component name
WMQ BASE MULTIP
Fixed component name
WMQ BASE MULTIP