IBM Support

Database Host Server Connections Drop after a Period of Inactivity

Troubleshooting


Problem

Network hardware causes connection drops after a period of inactivity. The disconnects might be prevented with changes to hardware settings or the use of keep-alive timers.

Resolving The Problem

Problem Description

Connections made by database host server clients such as IBM Toolbox for Java JDBC or IBM i Access Client Solutions ODBC are dropped after a long period of inactivity. The error is typically seen when a user returns to an application after an hour or more. In applications that use connection pooling (such as web servers), the problem can appear more intermittent.

Clients report a communications error such as Communication link failure. Communications errors return an SQL State of 08xxx. The most common error is SQL State 08S01. Internal traces and stack dumps usually indicate the underlying socket return code. The most common socket error is connection reset by peer. Applications that do not handle communications errors can go into a loop, continuously reporting errors such as 08003 - Connection does not exist.

The database host server job associated with the connection might not receive any notification the communication error occurred. If the IBM i TCP layer receives no notification in the form a TCP FIN or reset, then the job remains active (but idle) until the dead connection is detected by the TCP keep-alive timer. The default setting allows for the job to stay active for 2 to 4 hours.

Other database server clients that might be affected include IBM i Access Client Solutions OLE DB (IBMDA400), .NET data provider, and data transfer.

Problem Detail

The cause of the problem is an idle timer in the network. This timer is typically a switch or firewall. The common default value is 1 hour. There is no idle timeout in database server or in the iSeries TCP layer. There is no idle timeout in the client (ODBC driver or JDBC driver). Any idle timeout must come from the network.

Verify that the problem is related to idle time and not a specific time of day (for example, the problem always occurs to connections active at 1am). Connection errors that occur at specific times, late night or early morning, are often caused by a user or maintenance program running the ENDJOB or ENDSBS command rather than network idle timeouts. This type of activity is logged in the QZDASOINIT job log (job ended by user xxx messages), the history log, and also in QSYSOPR message queue.

If required, IBM i communications traces, client-side traces, and sniffer traces in each segment can be used to locate the network component ending the connection. Idle timeouts are implemented by sending a TCP reset. Some components send the reset to only one end of a connection. If a reset is sent only to the client, the database server job does not end until the iSeries keep-alive timer (if enabled) detects the broken connection. The default value is 2 hours. Therefore, the database server job can stay active for 2 to 4 hours.

Problem Resolution

To avoid the network idle timeout, consider the following:
  • Adjust the idle timeout for the firewall or router.
  • Adjust any connection pools idle timeout value to a value lower than the idle timeout of the network.
  • Lower the TCP keep-alive. The keep-alive frame might reset the inactivity timer. Some network components appear to ignore keepalives.

    The system TCP keep-alive is set by using the CFGTCP command, Option 3. The value must be less than half the value of the idle timer. For example, if the firewall idle timer is 60 minutes, the keep-alive can be set to 29 minutes or less.

    IBM i Access Client Solutions Windows Application Package includes the cwbcopwr utility that can be used to adjust various socket options on the PC, including keep-alive. Options can be server or API specific. For more information about the cwbcopwr tool, see <IBM i Access Windows installation directory>\cwbcopwr.htm).
As a best practices guide, applications always monitor for communications-related errors by checking the SQL State. SQLStates of 08xxx imply that the connection is no longer active. Applications immediately disconnect, then reconnect rather than continuing to attempt to use the inactive connection.

[{"Product":{"code":"SWG60","label":"IBM i"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Host Servers","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"Version Independent","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Historical Number

323321439

Document Information

Modified date:
01 July 2021

UID

nas8N1016313