IBM Support

KAINT (keepalive) for an orphaned client channel in IBM MQ for z/OS

Troubleshooting


Problem

You observe a SVRCONN client channel with a STATUS of RUNNING, or RUN on the CSQOREXX panels, even though the network connection to the client stopped. If there are uncommitted gets or puts, after 2 log archives you will see the following message:
 CSQJ160I CSQ1 CSQJLRUR LONG-RUNNING UOW FOUND, URID=urid CONNECTION NAME=CSQ1CHIN
or after 3 checkpoints:
 
 CSQR026I LONG-RUNNING UOW SHUNTED TO RBA=rba, URID=urid CONNECTION NAME=CSQ1CHIN
The client application is no longer available, but the client channel is in a TCP/IP RECEIVE state according to the SUBSTATE of a DISPLAY CHSTATUS command.  The substate information is also available in the channel status information in a dump of the MSTR and CHIN.

Symptom

Other symptoms might include one of these error messages in the CHIN log:

CSQX489E Maximum instance limit limit exceeded, channel channel-name connection conn-id 
CSQX490E Maximum client instance limit limit exceeded, channel channel-name connection conn-id
CSQX513E Current channel limit exceeded, channel channel-name connection conn-id 
CSQX573E Channel channel-name exceeded active channel limit

Cause

TCP/IP did not notify the channel initiator of the broken connection.

Resolving The Problem

  • To back out the unit of work:

Stop the SVRCONN channel with MODE(FORCE).

WARNING: Log shunting normally allows the backout to occur with information available in the archive logs, but check whether shunting failed with CSQR027I in the MSTR job log. If log shunting failed, archive logs as far back as the beginning of the UOW might be needed to avoid an abend during backout of persistent messages. If the archive logs are not available, you need to recycle the queue manager, commit the messages when message CSQR021D is issued, and then manually recover from the partially processed UOW.
 

  • To avoid the need for manual intervention:
    • Activate TCP/IP Keepalive and Receive Timeout for the queue manager, for example:

      ALTER QMGR TCPKEEP(YES) RCVTTYPE(ADD) RCVTIME(60)

      TCPKEEP is needed for channels that use SHARECNV(0).

      For SVRCONN channels, RCVTIME applies to channels that use a nonzero value for SHARECNV because these channels use 2-way heartbeating. With RCVTTYPE(ADD), the parameters in the example add 60 seconds to the "negotiated HBINT + 5 seconds" for the SVRCONN channel. For example, with a negotiated HBINT of 60: 60 [HBINT] + 5 + 60 [RCVTIME] = 125 seconds. You can also set the receive timeout as a multiple of HBINT (RCVTTYPE(MULTIPLY)) or as a specific integer (RCVTTYPE(EQUAL)).

      For non-SVRCONN channels, the parameters in the example add 60 seconds to the negotiated HBINT.

    • To use a different Heartbeat Interval for the SVRCONN channel, alter HBINT, which has a default of HBINT(300). For example:

           ALTER CHANNEL(channel_name) CHLTYPE(SVRCONN) HBINT(120)

    • If you are using TCPKEEP(YES), confirm the setting of the Keepalive Interval (KAINT) parameter for the SVRCONN definition:

      - The default value of KAINT(AUTO) is appropriate in most cases. KAINT resolves to the negotiated HBINT+60 seconds if HBINT is nonzero and to zero if HBINT is zero.

      - It is also possible to specify an integer for KAINT. For example, if HBINT is the default of 300, set a higher value for KAINT:

           ALTER CHANNEL(channel_name) CHLTYPE(SVRCONN) KAINT(360)

      - For a DataPower client, specify a Cache Timeout value in the DataPower configuration that is greater than the negotiated heartbeat interval but less than the keep alive interval.
       
  • A nonzero value for DISCINT might be used to time out idle channels that are not in the middle of an MQ API call. Care must be taken not to disconnect clients unless they are capable of handling the disconnect appropriately.

    Use DISCINT(0) for SVRCONN channels used by JMS clients, including WebSphere Application Server (WAS or WSAS), and for channels used by managed .NET (dotnet) clients. This setting avoids unexpected AMQ9208 or MQJE001 errors with reason 2009 MQRC_CONNECTION_BROKEN on the client side. The corresponding error in the CHIN job log is CSQX259E. Note that CSQX259E can have more than one root cause. One cause is a nonzero DISCINT value on the MQI channel (client channel), which is fine in some cases but causes unexpected errors when connection pooling is used.

    A nonzero value for SHARECNV, which is discussed later in this document, can help end channels when there is a break in the socket.

    Be aware of another situation for reason 2009 in the WSAS environment: IBM MQ for z/OS 9.2.0 users should apply the fix for APAR PH41602 if a WSAS application uses an activation specification, with authentication, to monitor an MQ queue in a Queue Sharing Group (QSG), if the QSG name is used for the binding connection name rather than a queue manager name.  Without the fix, RC=2009 MQRC_CONNECTION_BROKEN might be seen on redeployment or activation specification restart.      
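The KAINT rule described above (KAINT(AUTO) resolves to the negotiated HBINT + 60 seconds, or to zero when HBINT is zero) and the DataPower Cache Timeout guidance can be sketched as a small calculation. This is an illustrative sketch only, not IBM code; the function names resolve_kaint and datapower_cache_timeout_ok are hypothetical and exist only for this example.

```python
def resolve_kaint(kaint, negotiated_hbint):
    """Return the effective keepalive interval in seconds.

    kaint is either the string "AUTO" or an integer number of seconds.
    The AUTO rule follows the text: negotiated HBINT + 60 if HBINT is
    nonzero, otherwise zero.
    """
    if kaint == "AUTO":
        return negotiated_hbint + 60 if negotiated_hbint > 0 else 0
    return int(kaint)


def datapower_cache_timeout_ok(cache_timeout, negotiated_hbint, keepalive):
    """Check the DataPower guidance: the Cache Timeout should be greater
    than the negotiated heartbeat interval but less than the keepalive
    interval."""
    return negotiated_hbint < cache_timeout < keepalive


if __name__ == "__main__":
    keepalive = resolve_kaint("AUTO", 300)
    print("Effective KAINT:", keepalive)
    print("Cache Timeout 330 acceptable:",
          datapower_cache_timeout_ok(330, 300, keepalive))
```

With the default HBINT(300), KAINT(AUTO) gives the same result as the explicit KAINT(360) in the example above.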

A keep-alive timeout results in the following messages:

+CSQX208E +CSQ1 CSQXRESP Error receiving data,
channel PC1.TO.CSQ1.SVRCONN,
connection <ipname> (<ip addr>)
(queue manager ????)
TRPTYPE=TCP RC=00000461
+CSQX599E +CSQ1 CSQXRESP Channel PC1.TO.CSQ1.SVRCONN ended abnormally



The channel timeout rolls back the messages. This means that the messages put as part of a unit of work are deleted, and messages retrieved as part of a unit of work are reinstated on the queue.

Additional history of channel timeout functions
  • WebSphere MQ for z/OS V6 and higher:

    Support for HBINT and DISCINT for SVRCONN channels was added.
    HBINT:
    On server-connection and client-connection channels, heartbeats flow only when a server MCA is waiting on an MQGET command with the WAIT option that it issued on behalf of a client application. Keepalive does not time out the channel as long as the TCP/IP connection to the client still exists.

    DISCINT:
    The server-connection inactivity interval only applies between MQ API calls from a client, so no client is disconnected during a long-running MQGET with wait call. Care must be taken not to disconnect clients unless they are capable of handling the disconnect appropriately. Client disconnect might not be useful for everyone, but is available for the client applications that can benefit from it.

    For IBM MQ 8.0.0, PI27504 is needed to prevent a premature timeout with SHARECNV and DISCINT nonzero and RCVTIME=0.

    V6 and higher versions have more detail about what the client is waiting for in the SUBSTATE parameter of DISPLAY CHSTATUS. There is a DISPLAY CONN command that aids in identifying the application or IP address associated with a long-running unit of work.
     
  • At WebSphere MQ 7.0.0 and higher:
    • If the channel is defined with a nonzero value for SHARECNV and the CHSTATUS has a nonzero value for CURSHCNV, client heartbeating is available whether the channel is in an MQGET call or not. Be aware of the performance implications of sharing conversations on client-connection channels.

      Apply PI62878 and PI62084 for V8, and PI68960 and PI69443 for V9.
       
    • RCVTIME and RCVTMIN apply to MQI channels that are sharing conversations. According to PM65278, for MQI channels that use sharing conversations, the heartbeat interval used by RCVTIME/RCVTMIN/RCVTTYPE is 5 seconds greater than the negotiated heartbeat interval.

      SupportPac MD0C: WebSphere MQ - Keeping Channels Up and Running recommends
        /cpf ALTER QMGR RCVTTYPE(ADD) RCVTIME(60)
      where "cpf" is the command prefix for the queue manager.

      As an example, with HBINT=60, RCVTTYPE=ADD, and RCVTIME=60, you would have
         60 [HBINT] + 5 + 60 [RCVTIME] = 125 seconds
      for the timeout value if a heartbeat response is not received.
       
    • WebSphere MQ 7.0.1 and higher:
      An automatic client reconnection feature is offered.
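The receive timeout arithmetic given above for RCVTTYPE(ADD) can be written out as a short calculation. This is a minimal sketch under stated assumptions, not IBM code: the function name receive_timeout is hypothetical, the MULTIPLY and EQUAL branches are an assumed reading of "a multiple of HBINT" and "a specific integer", and RCVTMIN and the HBINT(0) case are not modeled.

```python
def receive_timeout(negotiated_hbint, rcvtime, rcvttype="ADD",
                    sharing_conversations=True):
    """Return the receive timeout in seconds.

    For MQI channels that share conversations, the heartbeat interval
    used in the calculation is 5 seconds greater than the negotiated
    HBINT (per PM65278, as quoted in the text).
    """
    base = negotiated_hbint + 5 if sharing_conversations else negotiated_hbint
    if rcvttype == "ADD":
        return base + rcvtime
    if rcvttype == "MULTIPLY":
        # Assumption: the multiplier applies to the adjusted interval.
        return base * rcvtime
    if rcvttype == "EQUAL":
        # Assumption: RCVTIME is used as the timeout as is.
        return rcvtime
    raise ValueError("unknown RCVTTYPE: " + str(rcvttype))


if __name__ == "__main__":
    # SVRCONN with a nonzero SHARECNV: 60 [HBINT] + 5 + 60 [RCVTIME]
    print(receive_timeout(60, 60, "ADD"))
    # Non-SVRCONN channel: 60 seconds added to the negotiated HBINT
    print(receive_timeout(60, 60, "ADD", sharing_conversations=False))
```

The first call reproduces the 125-second figure from the example; the second reflects the statement that non-SVRCONN channels add RCVTIME to the negotiated HBINT without the extra 5 seconds.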

PM84281 says that if you use CSQUTIL to build a client channel definition table (CCDT), CSQUTIL should be run against a queue manager of the same Version/Release/Modification.


Product Synonym

WMQ MQ

Document Information

Modified date:
14 July 2023

UID

swg21232484