Check client connection and connection/session takeover

A Telnet client can detect the loss of connectivity to Telnet without Telnet being aware of the loss. This condition is typically caused by a temporary problem in the network, such as a router experiencing a failure. In many cases, the network can correct itself and client retransmission can recover the data flow. However, the user often terminates the Telnet TCP connection at the emulator without waiting for the network to recover. In other cases, such as when a single router supports the client, the Telnet TCP connection is eventually terminated by the client TCP/IP stack after a certain number of retransmissions fail. When network connectivity is restored, the user initiates a new connection to Telnet. Even though the user is reconnected, in many cases the original SNA session cannot be associated with the new TCP connection because the session continues to be associated with the original TCP connection.

Assume that the host application is TSO and the user is in session with TSO user ID USER1. The route is lost and the user disconnects. The user then establishes a new Telnet connection without specifying an LU name. Telnet assigns a different LU to represent the client. When the logon to user ID USER1 is attempted, TSO fails the logon attempt because USER1 is still in session with the original Telnet LU. If the user does specify the same LU name when reconnecting, the connection is rejected sooner. Telnet fails the request during Early Lookup, indicating that the LU is already in use. The problem with both situations is that Telnet assumes that the original connection is still active and that the SNA session is still associated with that original connection.

The user can not terminate the original TCP connection and SNA session. The user has two choices:

Wait for an inactivity timer in Telnet to clean up the original SNA session and close the original TCP connection
Wait for an inactivity timer in TSO to initiate session termination

In the second case, either Telnet is configured to close the connection at session termination (NOLUSSIONPEND), or when Telnet attempts to refresh the TCP connection, Telnet detects that the TCP connection is gone and closes its representation of the connection. The end result in all cases is that the original SNA session and the original TCP connection are cleaned up after the specified inactive period of time has elapsed.

The user would like to quickly close the original connection to free the LU and the SNA session. If you specify the CheckClientConn parameter, Telnet checks the connectivity of all pre-existing connections associated with the client identifier of the new connection being established. The check is performed by sending a Timemark to each existing connection. The new connection is delayed early in the connection negotiation until all existing connections have responded or the specified wait time has elapsed. When all connections have responded, setup of the new connection continues. If some connections do not respond, then after the specified time elapses, Telnet closes any connections that did not receive a response. As soon as all of the unresponsive connections are closed, setup of the new connection continues. The new connection is held until cleanup is performed to ensure that an LU is available and that the previous SNA session is cleaned up before the new connection continues negotiation. The CheckClientConn parameter is useful when multiple emulators exist on a single client and the user can tolerate the session being disconnected.

Rule: If you are using sysplex distributor to distribute connections across Telnet servers with CheckClientConn, you must use timed client affinity to ensure that the clients reconnect to the same Telnet server.

Tip: Be careful using the CheckClientConn parameter where proxy servers are being used and the Client Identifier is IP address rather than Host name or User ID. Telnet perceives all connections coming from the same IP address as coming from the same client. Depending on the number of connections, Telnet could send a large number of Timemarks every time a new emulator connects. A second parameter is available to limit the number of connections checked for a single client identifier.

Some users require that the same LU be assigned when the new connection takes over the old connection. The takeover function fixes this problem. When takeover is active for a connection, Telnet LU lookup attempts LU takeover upon entry to the lookup function, whenever a single LU name can be associated with a connection. Any number of existing Specific requests from one client identifier can be associated with new Specific requests. A single existing Generic request from one client identifier can be associated with a new Generic request. If there are multiple existing Generic requests, only the first is taken over and the remaining connections continue to hang.

The TKOSPECLU and TKOSPECLURECON statements activate takeover for the connection and require that the user specify the LU name. The statement name is derived from the function of connection takeover (TKO) for a Specific LU (SPECLU) connection request. Because the LU name is specified, LU lookup attempts to take over the original connection to which the LU was assigned. The Specific LU request allows a user to move to any client to take over a connection that is lost and provides some level of security because it requires the user to know the LU name.

The TKOGENLU and TKOGENLURECON statements activate takeover for a connection without the user specifying an LU name. The statement name is derived from the function of connection takeover (TKO) for a Generic LU (GENLU) connection request. During the original connection LU lookup, Telnet saves the LU name by Client Identifier. If another connection request is received from the same Client Identifier, Telnet assumes takeover should be attempted using the original LU name. If multiple connections are made from a single client, all additional connection setups will be delayed by the time it takes to attempt takeover of the first connection. When the takeover attempt fails, Telnet resumes normal Generic LU lookup for that connection. For Generic LU takeover to work, the Client Identifier must remain the same. The Client Identifier can be the client user ID derived from the client security certificate. If no user ID is available, the client hostname is used. If no hostname is available, the client IP address is used.

Rule: If you are using sysplex distributor to distribute connections across Telnet servers with TKOGENLU or TKOGENLURECON, you must use timed client affinity to ensure that the clients reconnect to the same Telnet server.

The TKOSPECLU and TKOGENLU statements cause the following events to occur. When the takeover connection request arrives, Telnet LU lookup discovers the LU name is in use and suspends the new request. Telnet sends a TIMEMARK request to the original client, which acts as an "are you there" message. The client is required to respond to the TIMEMARK. If no response is received by Telnet within the time specified on the takeover statement, Telnet drops the original session and connection. The original LU is reserved during the drop process. After the original session and connection are dropped, Telnet resumes processing the new request. This time the LU is not in use, only reserved for takeover purposes, and is assigned to the new takeover session. The user is essentially starting over. The original session has been dropped, allowing the user to immediately log on to the same TSO user ID again.

When sysplex distributor is used to distribute connections across Telnet servers with the TKOSPECLU or TKOSPECLURECON statements, the series of events can be extended to include the LUNS and possibly the previous owning LUNR. When the takeover connection request arrives at a LUNR and Telnet LU lookup discovers that the LU name is shared and is not already allocated to itself, the new request is suspended, and verification and allocation is requested from the LUNS. If the LUNS determines that the LU name has been allocated to another LUNR, the LUNS sends a verification request to the other LUNR. That LUNR sends a TIMEMARK to its original client. If the client responds, the owning LUNR informs the LUNS that the LU name is still in use. If the client does not respond, the owning LUNR drops the original connection and tells the LUNS to deallocate the LU name. If the LUNS determines that the LU name is not in use, the name is allocated to the requesting LUNR. Depending on the response from the LUNS, the new LUNR then either accepts or rejects the takeover request.

The TKOSPECLURECON or TKOGENLURECON statements can be used to accomplish the same connection drop but avoid the session drop. When the original connection is dropped, the Telnet LU stays in session with the host application. The new connection is established and Telnet sends an LUSTAT to the host application indicating that Presentation Space Integrity was lost (X'082B'). Depending on the application, it will either end the session or resend the previous screen. By resending the previous screen, the user is able save the original session and avoid the SNA session tear-down and restart process. At worst, if the application drops the session upon receipt of the LUSTAT, the user is able to immediately log on again as if TKOSPECLU or TKOGENLU were coded.

In some cases, the original client TCP/IP stack may respond to the Timemark with a RESET. Telnet interprets this RESET as a client disconnect and assumes the user disconnected the session. Telnet then drops the session. To keep the session in this case, add the KeepOnTmReset option to the TKOSPECLURECON or TKOGENLURECON statement. A security risk exists when using this parameter. A user may actually disconnect just before the Timemark arrives due to an unauthorized takeover attempt. Telnet will interpret the disconnect as a response to the Timemark and allow the takeover without loss of the VTAM® session.

Part of session initiation includes Telnet issuing a SETLOGON VTAM macro that contains control vectors, including control vector 64 described in Connection information passed on the CINIT control vector 64.

The control vectors include client information, such as IP address and host name if available. If the connection is capable of being taken over, a flag is set in the control vector. This information is passed to the application's logon exit for use by the application. If the connection is taken over, a second SETLOGON macro is issued. However, the control vector information does not go beyond the VTAM that processed the SETLOGON. The application's logon exit does not run again and is not aware of connection changes.

Some applications require the IP address to remain constant. Because the application cannot be notified of changes, Telnet has the ability to allow takeover by only clients from the same IP address. Specify SAMEIPADDR on the TKOGENLURECON or TKOSPECLURECON statements to ensure takeover is done by a client from the same IP address.

TKOSPECLU, TKOSPECLURECON, TKOGENLU, and TKOGENLURECON can be coded at all three parameter block levels for different levels of granularity. Code NOTKO to turn off all takeover function at any of the three levels.

The new connection must have an equal or higher security level to take over the original connection. The order for connection security is:

Basic
Secure
Secure with CLIENTAUTH SSLCERT
Secure with CLIENTAUTH SAFCERT

If the original connection used SSL with CLIENTAUTH SAFCERT, either takeover method will verify that the new connection is using a client certificate that maps to the same user ID. Telnet verifies this by translating the new certificate to a user ID and comparing the new user ID to the user ID on the original connection.

Tip: If you are using TKOSPECLURECON or TKOGENLURECON, you can specify SAMECONNTYPE to force the same connection type between the taker and the target connections. CV64 information is not updated when a session takeover occurs, and if a secure connection takes over a basic connection, the application is not updated with the new connection level because no new CV64 is sent to the application. You can also consider using takeover without reconnect (TKOSPECLU or TKOGENLU).

Guideline: If you map multiple certificates to the same user ID, a client presenting any of those certificates will be able to take over the connection. If there is a chance the connection can be taken over by an unauthorized user, TKOSPECLURECON and TKOGENLURECON should not be used. Neither statement requires the user to reverify user authenticity to the host application.

Tip: A time value of zero is permitted. In this case the server will always perform the takeover whether or not the original connection is still active. The zero value is intended for testing purposes rather than production use.

Sometimes a takeover attempt will not complete as expected. This might be due to one of the following factors:

The profile of the original connection defines how the original connection can be taken over. Be sure that the original connection supports the wanted takeover method.
The new connection is not of the same connection type as the original session, and SAMECONNTYPE was specified on the TKOSPECLURECON or TKOGENLURECON statement.
The new connection request must specify the LU name of the connection being taken over if TKOSPECLU or TKOSPECLURECON is specified.
TKOSPECLURECON and TKOGENLURECON do not preserve a session if the takeover is done from a different port. The ACB of the LU is associated with the original port and must be closed before it can be associated with the new port. The takeover will function as a TKOSPECLU and TKOGENLU takeover.
TKOSPECLURECON and TKOGENLURECON do not preserve a session if the takeover is done for a shared Telnet LU name managed by a LUNS.
TKOSPECLURECON and TKOGENLURECON might not preserve a session if the original client connection is ended before the TKOSPECLURECON or TKOGENLURECON timer expires. If the close reason is TIMEMARK or INACTIVE, the session will be preserved under the assumption that the inactivity is due to a lost connection. Any other close reason will cause the takeover to function as a TKOSPECLU or TKOGENLU takeover. This is done to protect users who disconnect their client as a means of logging off their session. These sessions will not be taken over. Instead, the user will have to issue a new logon.

Takeover is also affected by where the new client is and how the old client responds. There are several event scenarios and results will vary.

Event 1
New client connects from a different IP address or Port.

Original client responds to TIMEMARK.

Result - Takeover will not occur in this case because the original client is still responding. With a Specific LU request, the new client will receive an error indicating the LU is already in use. With a Generic LU request, Telnet assigns the next available LU to the connection.
Event 2
New client connects from a different IP address or Port.

Original client does not respond to TIMEMARK.

Result - Connection takeover will occur. If TKOSPECLURECON or TKOGENLURECON is mapped to the client, session takeover will occur. A likely scenario in this case is Telnet has lost connectivity to the old connection due to a failed router and the new connection is using a different route, the original machine lost power and has not reestablished connectivity, or the original machine lost power but reestablished connectivity with a different IP address.
Event 3
New client connects from a different IP address or Port.

Original client stack responds with RESET.

Result - Connection takeover will occur. However, even if TKOSPECLURECON or TKOGENLURECON is coded, session takeover will not occur because Telnet handles the RESET as a client disconnect. A likely scenario in this case is a PC lost power and then regained power. The takeover request is accepted from either a different PC or the same PC using a different port. After the new connection request is accepted, Telnet sends a TIMEMARK to the original client PC stack. The PC stack does not recognize the IP-port and responds with a RESET. If KeepOnTmReset is specified and the RESET is received by Telnet after the Timemark has been sent, Telnet will keep the session.
Event 4
New client connects from the same IP address and Port.

Telnet stack rejects the request.

Original client times out and sends a RESET.

Result - The original session and connection are dropped. Takeover does not occur. The new client is able to connect on retry because the original connection and session were cleaned up. A likely scenario in this case is a PC has lost power and then regained power. The same PC is used to attempt takeover. The client is assigned the same port as before the power loss and has the same IP address.