IBM Support

FTP timeouts during data transfer for the z/OS FTP server

Product Documentation


Abstract

This describes how to diagnose FTP timeouts when transferring data to and from the z/OS FTP server.

Content

FTP timeouts can occur during data transfer for the z/OS FTP server. In most cases, this problem will be reported through messages in the FTP client. Although we cannot anticipate what those messages might be, they will most likely indicate a connection has been reset, a connection was interrupted or timed out, or that the server is not responding.


Before you begin:
1. Be sure the following z/OS FTP server timer default values are set to these defaults or higher in the FTP.DATA configuration file. If you need assistance identifying which FTP.DATA the FTP server is using, the hierarchical search order can be found in the topic called TCP/IP configuration data sets in the IP Configuration Reference manual.

      DATACTTIME        300
      DCONNTIME         120
      INACTTIME         300
2. Make sure your FTP KEEPALIVE option has a value of 60 seconds or greater.

3. If you have a checkpoint firewall, make sure you have turned off the FTP enforce new line setting.
    3.1 Edit the $FWDIR/lib/base.def
    3.2 Change (comment out) the following line:
      #define FTP_ENFORCE_NL
      to:
      //#define FTP_ENFORCE_NL
    3.3 Re-install the security policy

Steps for resolving FTP timeouts during data transfer for the server :
1. Before retrying the failing data transfer, start the packet trace facility. Detailed instructions for doing this can be found in the Technote How to Collect PKTTRACE and CTRACE on z/OS. If this is a high-volume system, be sure to reference the section on "Packet Trace Parameters" on how to filter tracing based on IP addresses and ports.

2. Obtain an FTP server trace with options BAS, TIM, and FLO. You can do this by issuing the following MVS MODIFY console command before starting the failing data transfer (note that FTP_servername is simply the jobname under which the FTP server is running, such as FTPD1 or FTPSERV):
      F FTP_servername,DEBUG=(FLO,BAS)

3. Check for the output from the trace in SYSLOGD (see the Technote MustGather: Collect Troubleshooting Data, Read First: Syslogd for z/OS Communications Server for a SYSLOGD How-To Guide).

4. Re-create the problem and deactivate the traces. To deactivate the trace, issue MVS console command:
      F FTP_servername,DEBUG=(none)
    5. You now need to perform trace analysis. Begin this by browsing the client trace and locate the time that the timeout occurred. Obtain the timestamp from the message. Determine the port on which the data connection is being opened by finding the PORT command, if this is an active data connection, or by using the PASV response, if this is a passive connection. The text for both of these messages will be a string in the format of u,v,w,x,y,z. This can be interpreted as follows:
      5.1 The IP address to which the data connection will be opened is u.v.w.x.
      5.2 The port to which it will be opened can be calculated using the formula: (y x 256) + z.
      5.3 At this point, you have identified the timestamp, the IP address, and the port to which the data connection will be opened.
    6. Format the packet trace in IPCS by issuing the command
        CTRACE COMP(SYSTCPDA) LOCAL FULL OPTIONS((SESSION))
    This command formats the trace into a summary for each IP connection.

    7. Find the summary that represents the failing connection by looking for an FTP connection which encompasses the time stamp from the client. Use the combination of the port, the IP address, and the timestamp to ensure you have the correct summary. Here is a sample of what you will see:

    Local Ip Address:                             x.x.x.x
    Remote Ip Address:                                   y.y.y.y
    
    Host:                                 Local,          Remote
    Client or Server:                   Unknown,         Unknown
     Port:                                   21,            1095
     Application:                        ftp,,
     Link speed (parm):                      10,              10 Megabits/s
    
    Connection:
     First timestamp:                 2007/08/24 10:21:54.688167
     Last timestamp:                  2007/08/24 11:46:16.591480


    8. Scroll down to the bottom of the summary portion and review the count of different types of packets received during this connection. Pay particular attention to duplicate ACKs, Zero Window sizes, Retransmissions, Out of order Packets, and Fragments.

    Packets:                               97,             109
      (x) Untraced Packets:                 3,               0
      (.) In-order data:                   52,              33 (53.60%), (30.27%)
      (a) Acknowledgments:                 20,              40 (20.61%), (36.69%)
      (+) Data and ACK:                    23,              30 (23.71%), (27.52%)
      (u) Duplicate ACKs:                   0,               0 ( 0.00%), ( 0.00%)
      (w) Window size updates:              1,               0 ( 1.03%), ( 0.00%)
      (z) Zero window sizes:                0,               0 ( 0.00%), ( 0.00%)
      (p) Window probes:                    0,               0 ( 0.00%), ( 0.00%)
      (k) Keepalive segments:               0,               0 ( 0.00%), ( 0.00%)
      (r) Retransmissions:                  0,               0 ( 0.00%), ( 0.00%)
      (o) Out-of-order:                     1,               2 ( 1.03%), ( 1.83%)
      (d) Delayed ACKs:                     0,               4 ( 0.00%), ( 3.66%)
      (f) Fragments:                        0,               0 ( 0.00%), ( 0.00%)
      (!) Dropped:                          0,               0 ( 0.00%), ( 0.00%)


    9. If any of those have significantly higher percentages for inbound data, then the problem resides on the network, and further diagnostics will be required outside of the mainframe. Even if you see evidence of problems outside of the mainframe, continue with step 6 to investigate possible scenarios.

    10. Scroll down past the summary to the individual descriptions and examine the approximate timestamp obtained from the trace.

    11. Some of the things you can look for include:
    • A large value in the delta time column.

     TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp 
    1      S   I   2499151066          0  64512     0   0.000000 14:23:50.221695 
    2   A  S   O   1632413670 2499151067  65535     0   0.000027 14:23:50.221722 
    3   A  S   O   1632413670 2499151067  65535     0   2.943323 14:23:53.165045 
    4      S   I   2499151066          0  64512     0   0.002726 14:23:53.167771 
    5      S   I   2499151066          0  64512     0   5.919115 14:23:59.086886 

    This shows a timeout attempting to establish a connection. However, similar sequences may also occur during mid-transfer. In packet 1, we see the remote host attempting to open a connection by sending the local host a SYN packet. The local side correctly sends the ACK SYN response in packet 2. The next packet should be an ACK sent from the remote host. Because this never arrives, the local host retransmits the ACK SYN in packet 3 (2.94 seconds later). Packets four and five are retransmissions of the initial SYN the remote host sent in packet 1. This indicates that the local host's ACK SYNs never arrived at the remote host, and that there may be routing issues between the two hosts, or a firewall/router in between the two hosts that is dropping the responses.

    • A capital R indicating reset in the TcpHdr column.

    Example 1
     TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
    1      S   I    965574368          0 368640     0   0.000000 10:44:43.727489
    2   A  S   O   2501664216  965574369 368640     0   0.000125 10:44:43.727614
    3   A      I u  965574369 2501664217 368640     0   0.000819 10:44:43.728433
    4   A      O . 2501664217  965574369 368640  8940   0.268255 10:44:43.996688
    5   AP     O . 2501673157  965574369 368640  3695   0.000140 10:44:43.996828
    6   AP  F  O ? 2501676852  965574369 368640     0   0.002937 10:44:43.999765
    7   AP     O r 2501664217  965574369 368640  8940   2.191677 10:44:46.191442
    8   AP     O r 2501664217  965574369 368640  8940   4.438225 10:44:50.629667
    9   AP     O r 2501664217  965574369 368640  8940   8.958191 10:44:59.587858
    10  AP     O r 2501664217  965574369 368640  8940  17.822049 10:45:17.409907
    11  AP     O r 2501664217  965574369 368640  8940  36.668251 10:45:54.078158
    12     R   O   2501664217  965574369      0     0  72.829141 10:45:54.078158

    Example 1 shows a remote host failing to acknowledge data sent by the local host. The connection is typical up until packet 7. Packet 7 should show an ACK for the 12635 bytes of data, plus the FIN sent on packet 6. Instead, nothing arrives from the remote host, and the data is retransmitted on packets 7 through 11. Eventually, the local host assumes an error has occurred and resets the connection.

    Example 2
     TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
    1   A      I . 1428014775   99166319  64856   524   0.015828 10:07:53.533832
    2   A      I . 1428015299   99166319  64856    24   4.941343 10:07:58.475175
    3   A      O d   99166319 1428015323 262144     0   0.261318 10:07:58.736493
    4   A      I . 1428015323   99166319  64856   524   0.055756 10:07:58.792249
    5   A      I . 1428015847   99166319  64856   524   0.000005 10:07:58.792254
    6   A      I . 1428016371   99166319  64856   292   4.960406 10:08:03.752660
    7   A      O d   99166319 1428016663 262144     0   0.261318 10:07:58.736493
    8     R    O d   99166319 1428016663 262144     0  73.878330 10:09:17.630990

    In Example 2, the connection proceeds normally until packet 8. Note that in packet six, the amount of data sent by the remote host (292 bytes) is lower than the 524 bytes sent on most of the other packets. This usually indicates that there is no more data left to be sent. On packet 7, the local host ACKs the last data sent. However, it can do nothing more until either the remote host sends more data or it sends a FIN packet to indicate that all of the data has been transferred. Because the remote host does neither, it resets the connection in packet 8.

    • A lower-case Z in the F (flag) column indicating that the window size has gone to zero.

     TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
    1   A      I   3809785819 2800163550  27768     0   0.000021 10:36:57.019183
    2   A      O   2800164998 3809785819  65535  1448   0.000048 10:36:57.019231
    3   A      O   2800173686 3809785819  65535  1448   0.000001 10:36:57.019232
    4   A      O   2800175134 3809785819  65535  1448   0.000007 10:36:57.019239
    5   A      I a 3809785819 2800191062    256     0   0.000035 10:36:57.019274
    6   A      O   2800163550 3809785819  65535  1448   0.000551 10:36:57.019825
    7   A      O   2800164998 3809785819  65535    12   0.000005 10:36:57.019830
    8   A      I   3809785819 2800191062    256     0   0.000029 10:36:57.019859
    9   A      O o 2800191062 3809785819  65535  1448   0.000032 10:36:57.019891
    10  A      O . 2800192510 3809785819  65535  1448   0.000203 10:36:57.020094
    11  A      I a 3809785819 2800191318      0     0   0.000029 10:36:57.020123
    12  A      O . 2800193958 3809785819  65535  1448   0.000032 10:36:57.020155
    13  A      I z 3809785819 2800191318      0     0   0.000016 10:36:57.020171
    14  A      O . 2800195406 3809785819  65535  1448   0.000128 10:36:57.020299
    15  A      I z 3809785819 2800191318      0     0   0.000021 10:36:57.020320
    16  A      O r 2800191318 3809785819  65535     1   0.260479 10:36:57.280799
    .......lines 17 - 29 removed for brevity
    30    R    I z 3809785819 2800191318      0     0  25.544452 10:38:00.887886

    This example shows a situation in which the remote side has set its window size to zero. The window size tells the local host how much unacknowledged data it can send at once. In packets five and eight, we see that the remote host has a window size of 256 bytes. However, in packet eleven, this drops to zero. Starting in packet 16, the local host begins sending 1 byte probes to see if the window size has increased. This will continue indefinitely until the remote host increases its window size or the connection is otherwise terminated. In this example, the remote side ultimately RESETs the connection in packet 30.

    • A lower-case O in the F (flag) column indicating that the remote host sent out of order data.

     TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
    1   AP     O a  663518973 1680312427  65535     0   0.000094 21:53:58.470801
    2   A      I . 1680312427  663518973  64094  1460   0.000020 21:53:58.470821
    3   A      I . 1680313887  663518973  64094  1460   0.000093 21:53:58.470914
    4   AP     O a  663518973 1680315347  65535     0   0.000065 21:53:58.470979
    5   A      I . 1680315347  663518973  64094  1460   0.000254 21:53:58.471233
    6   A      I . 1680316807  663518973  64094  1460   0.000006 21:53:58.471239
    7   AP     O a  663518973 1680318267  65535     0   0.000145 21:53:58.471384
    8   A      I . 1680318267  663518973  64094  1460   0.000018 21:53:58.471402
    9   A      I o 1690049219  663518973  64094  1460   1.064534 21:53:59.535936
    10  A      I . 1690050679  663518973  64094  1460   0.000047 21:53:59.535983

    In this example, packets are processed correctly on lines 1 through 8. Note that the remote host is sending 1460 byte packets to the local host, who is ACKing them correctly. However, on line nine the remote host sends an out-of-order packet. This can be identified by two methods. First, the F column shows an 'o' flag, meaning out-of-order. This can also be verified by reviewing the sequence numbers (Seq column) of the inbound packets. They progress as follows:

    1680318267
    1690049219
    1690050679

    Because the sequence numbers increment by the number of bytes sent on each, they should have progressed as:

    1680318267
    1680319727
    1680321187
    <...roughly 6661 more missing packets>
    1690046299
    1690047759
    1690049219
    1690050679

    • The remote host stops responding to keep-alive packets.

      TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       
    1   AP     O + 3101947168 1513769269  65535    54   0.000318  
    2   A      I a 1513769269 3101947222  65535     0   0.166097 
    3   AP     O k 3101947221 1513769269  65535     0 185.834001 
    4   A      I u 1513769269 3101947222  65535     0   0.001453 
    5   AP     O u 3101947221 1513769269  65535     0 190.768437 
    6   A      I u 1513769269 3101947222  65535     0   0.001386 
    7   AP     O u 3101947221 1513769269  65535     0 190.735614 
    8   A      I u 1513769269 3101947222  65535     0   0.001089 
    9   AP     O u 3101947221 1513769269  65535     0 190.714878 
    10  A      I u 1513769269 3101947222  65535     0   0.002175 
    11  AP     O u 3101947221 1513769269  65535     0 190.763630 
    12  A      I u 1513769269 3101947222  65535     0   0.001706 
    13  AP     O u 3101947221 1513769269  65535     0 190.757661 
    14  AP     O u 3101947221 1513769269  65535     0  76.292870 
    15  AP     O u 3101947221 1513769269  65535     0  76.299600 
    16  AP     O u 3101947221 1513769269  65535     0  76.299781 
    17  AP     O u 3101947221 1513769269  65535     0  76.307884 
    18  AP     O u 3101947221 1513769269  65535     0  76.296677 
    19  AP     O u 3101947221 1513769269  65535     0  75.286330 
    20  AP     O u 3101947221 1513769269  65535     0  75.281099 
    21  AP     O u 3101947221 1513769269  65535     0  76.294458

    This shows the events occurring on a control connection during a long data transfer. The data transfer itself completes normally. In order to keep the control connection from being timed out, the local TCP/IP stack sends keepalive packets. These are seen on lines 3, 5, 7, 9, 11, and 13. The remote stack responds correctly to these on lines 4, 6, 8, 10, and 12. However, it never responds to the keepalive sent on line 13. Instead, this packet is retransmitted with no response on lines 14 through 21. As a result, the local stack assumes that the remote host has dropped the connection, and closes its own connection. Note that no RESET is sent. In this situation, a RESET would be sent only if the remote host attempted to send another packet to the local stack.

    12. Once you identify a problematic area in the trace, pay particular attention to both the IO and the Delta Time columns. IO indicates if the packet was inbound or outbound, and Delta Time indicates how much time has passed since the last packet was sent or received. The combination of these two fields can help determine whether or not the problem resides on the local server or on the remote client. For example, an inbound RESET might suggest a problem on the remote client. Conversely, an inbound RESET with a large delta time suggests that the remote client timed out waiting for a response from the server, and further investigation should be performed locally.
      12.1 If you determine that the problem is occurring outside of the mainframe, additional investigation must be done on the remote client itself, or possibly on the network segments between the client and server.
      12.2 If you determine that the problem is occurring on the mainframe, additional diagnostics must be done on the local TCPIP stack and the FTP server. Collect a new set of documentation including:
        A new client trace
        A new SYSTCPDA
        A dump of TCPIP and, if possible, the FTP client, with SDATA options
          SDATA=(ALLNUC,CSA,LSQA,SQA,PSA,RGN,SWA,TRT,LPA)
        SYSTCPIP trace with options
          OPTIONS=(TCP,INTERNET,PFS,SOCKET)

    Again, you can find instructions for running SYSTCPIP and SYSTCPDA traces in the Technote How to Collect PKTTRACE and CTRACE on z/OS. With this set of documentation, contact the IBM support center.

    Original Publication Date

    22 January 2008

    [{"Product":{"code":"SSSN3L","label":"z\/OS Communications Server"},"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Component":"All","Platform":[{"code":"PF035","label":"z\/OS"}],"Version":"1.8;1.9;1.10;1.11;1.12;1.13;2.1;2.2","Edition":"All Editions","Line of Business":{"code":"LOB35","label":"Mainframe SW"}}]

    Document Information

    Modified date:
    17 June 2018

    UID

    swg27011272