Transferring data between sockets: Streams and messages

This topic describes how to design an application protocol so that the partner program can divide the receive stream into individual messages.

Some socket applications are simple, and the receiver can continue to receive data until the sender closes the socket, for example, a simple file transfer application. Most applications are not that simple and usually require that the stream can be divided into a number of distinct messages.

A message exchanged between two socket programs must embed information so that the receiver can decide how many bytes to expect from the sender and (optionally) what to do with the received message.

A few common techniques are used to embed information about the length of a message into the stream, as follows:

The message type identifier technique

If your messages are fixed length, you can assign a message ID to each message type that your application uses. Each message type has a predefined length that is known to both your client and server programs. If you place the message ID at the start of each message, the receiving program can determine the length of the message as soon as it has read the first few bytes. This is illustrated in Figure 1:

Figure 1. Layout of a message between a TPI client and a TPI server

*---------------------------------------------------------------*   
* Layout of a message between TPI client and TPI server         * 
*---------------------------------------------------------------*   
 01  tpi-message.
     05  tpi-message-id             pic x.
         88  tpi-request-add        value '1'.
         88  tpi-request-update     value '2'.
         88  tpi-request-query      value '3'.
         88  tpi-request-delete     value '4'.
         88  tpi-query-reply        value 'A'.
         88  tpi-response           value 'B'.
     05  tpi-constant               pic x(4).
         88  tpi-identifier         value 'TPI '.

Each message ID is associated with a fixed length known to your application.
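
The following Python sketch shows one way a receiver might use the message ID to divide a buffer into fixed-length messages. The message IDs are taken from Figure 1; the lengths in the table and the function name split_fixed_messages are assumptions made for illustration, and the sketch uses ASCII bytes rather than EBCDIC.

```python
# Hypothetical table: message ID -> total message length in bytes,
# including the ID byte itself. The lengths are illustrative only.
MESSAGE_LENGTHS = {
    b"1": 5,  # add request
    b"2": 5,  # update request
    b"3": 5,  # query request
    b"4": 5,  # delete request
    b"A": 8,  # query reply
    b"B": 6,  # response
}


def split_fixed_messages(buffer):
    """Split buffer into complete messages; return (messages, leftover)."""
    messages = []
    pos = 0
    while pos < len(buffer):
        msg_id = buffer[pos:pos + 1]
        length = MESSAGE_LENGTHS.get(msg_id)
        if length is None:
            raise ValueError(f"unknown message ID {msg_id!r}")
        if pos + length > len(buffer):
            break  # incomplete message: keep it until more data arrives
        messages.append(buffer[pos:pos + length])
        pos += length
    return messages, buffer[pos:]
```

The leftover bytes are returned so that the caller can prepend them to the data from the next receive call.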

The record descriptor word (RDW) technique

If your messages are variable length, you can implement a length field at the beginning of each message. Normally, you would implement the length as a halfword binary field with the value encoded in network byte order, as shown in Figure 2, but you can also implement it as a text field.

Figure 2. Transaction request message segment

*---------------------------------------------------------------* 
* Transaction Request Message segment                           * 
*---------------------------------------------------------------* 
01  TRM-message.   
    05  TRM-message-length         pic 9(4) Binary Value 20.
    05  filler                     pic x(2) Value low-value. 
    05  TRM-identifier             pic x(8) Value '*TRNREQ*'.
    05  TRM-trancode               pic x(8) Value '?????'.
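
As a rough illustration of the same idea outside COBOL, the following Python sketch frames each message with a 4-byte header like the one in Figure 2: a halfword length in network byte order followed by a 2-byte filler. It assumes that the length counts the whole message including the header (as the value 20 in Figure 2 suggests). The function names recv_exact, send_message, and recv_message are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct("!HH")  # halfword length + halfword filler, big-endian


def recv_exact(sock, n):
    """Receive exactly n bytes, repeating recv() as often as needed."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("partner closed the socket mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send payload preceded by the length header."""
    # Assumption: the length covers the header plus the payload.
    sock.sendall(HEADER.pack(HEADER.size + len(payload), 0) + payload)


def recv_message(sock):
    """Read one header, then exactly the number of bytes it announces."""
    total_length, _filler = HEADER.unpack(recv_exact(sock, HEADER.size))
    if total_length < HEADER.size:
        raise ValueError("length field is smaller than the header")
    return recv_exact(sock, total_length - HEADER.size)
```

Because the receiver always asks for an exact byte count, it does not matter how TCP splits or combines the data on the way.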

The end-of-message marker technique

A third technique, most often seen in C programs, is to send a null-terminated string. A null-terminated string is a string of bytes terminated by a byte of binary 0. The receiving program reads whatever data is on the stream and then loops through the received buffer, separating each record at the point where a null byte is found. When the received records have been processed, the program issues a new read for the next block of data on the stream.

If your messages contain only character data, you can designate any non-display byte value as your end-of-message marker. Although this technique is most often seen in C programs, it can be used with any programming language.
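
A minimal Python sketch of the receiving loop's separation step might look like this. The function name split_records and the idea of carrying a pending partial record between reads are assumptions added for illustration; the marker defaults to the null byte described above.

```python
def split_records(pending, received, marker=b"\x00"):
    """Return (complete_records, leftover) for the data received so far.

    pending is the incomplete tail left over from the previous read,
    received is the block of data returned by the latest read.
    """
    data = pending + received
    # Everything after the last marker is an incomplete record (possibly empty).
    *records, leftover = data.split(marker)
    return records, leftover
```

The caller passes the returned leftover back in as pending on the next read, so a record that spans two reads is reassembled.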

The TCP/IP buffer flushing technique

This technique is based on the observed behavior of the TCP protocol, where a send() call followed by a recv() call forces the sending TCP protocol layer to flush its buffers and forward whatever data might exist on the stream to the receiving TCP protocol layer. You can use this method to implement a half-duplex, flip-flop application protocol, where your two partner programs acknowledge the receipt of each message with, for example, a 1-byte application acknowledgment message.

Figure 3 shows the TCP buffer flush technique.

Figure 3. The TCP buffer flush technique

Example that illustrates the TCP buffer flush technique, in which the server buffer is flushed after it receives the message that the client sends.

In Figure 3, the client sends an 80-byte message. The server has issued a recv() call for 1000 bytes, but receives only the 80 bytes (RETCODE=80). The problem with this technique is that there is no guarantee that the server receives the full 80-byte message on a single recv() call. It might receive only 30 bytes, and with this technique it has no way of knowing that another 50 bytes are missing. The smaller the messages are, the less likely it is that the server receives only part of a message.

Note: This technique is widely used, but you should use it only in controlled environments, or in programs where you use non-blocking socket calls to implement your own timeout logic.
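
The following Python sketch outlines the flip-flop exchange with a 1-byte acknowledgment. It is only an illustration: the ACK value and the function names are hypothetical, and a socket timeout is used here in place of hand-written non-blocking timeout logic. As described above, the single recv() call might return only part of the message.

```python
import socket

ACK = b"\x06"  # hypothetical 1-byte application acknowledgment


def receive_and_ack(sock, max_bytes=1000):
    """Issue one recv() and acknowledge whatever arrived."""
    data = sock.recv(max_bytes)  # might be only part of the message
    if data:
        sock.sendall(ACK)
    return data


def wait_for_ack(sock, timeout=5.0):
    """Return True if the partner acknowledged within the timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recv(1) == ACK
    except socket.timeout:
        return False
    finally:
        sock.settimeout(None)  # restore blocking mode
```

The sender calls sendall() for its message and then wait_for_ack(); it sends the next message only after the acknowledgment arrives.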

The message type identifier and the record descriptor word techniques require that the receiving program be able to learn the content of the first bytes in the message before it reads the entire message.

If this is a problem for your application, use the peek flag on a recv() socket call.

A recv() call with the peek flag on does not remove the data from the TCP buffers, but copies the number of bytes you requested into the application buffer you specified on the recv() call.

For example, if your message length field or message ID field is located within the first 5 bytes of each message, issue the following recv() call:

*---------------------------------------------------------------*
* Peek buffer and length fields for RECV call                   * 
*---------------------------------------------------------------* 
01  soket-recv                     pic x(16) value 'RECV'.
01  recv-flag-peek                 pic 9(8) binary value 2.
01  recv-peek-len                  pic 9(8) binary value 5.
01  recv-peek-buffer.  
    05  message-id                 pic x value space.
        88  tpi-query-reply        value 'A'.
        88  tpi-response           value 'B'.
    05  message-constant           pic x(4).
        88  tpi-identifier         value 'TPI'.
01  socket-descriptor              pic 9(4) binary value  0.
01  errno                          pic 9(8) binary value  0.
01  retcode                        pic s9(8) binary value 0.
*---------------------------------------------------------------*   
* Peek at first 5 bytes of client data                          *   
*---------------------------------------------------------------* 
   call 'EZASOKET' using soket-recv   
       socket-descriptor
       recv-flag-peek
       recv-peek-len
       recv-peek-buffer
       errno
       retcode.
   if retcode < 0 then
      - process error -
   end-if.
   if retcode = 0 then
      - process client closed socket -
   end-if.
   if not tpi-identifier then
      - translate recv-peek-buffer from ASCII to EBCDIC -
   end-if.

The recv() call blocks until some bytes have been received or the sender closes its socket. The above example is not complete because you cannot be sure that you actually received the 5 bytes that you requested. Your call might come back with only 1 byte received. To handle this situation, repeat the recv() call until retcode shows that all 5 bytes have been received.

If the other half of the connection closes the socket, the recv() call returns 0 in the retcode field.

A recv() call with the peek flag only copies the data into your application program buffer. The data remains in the TCP buffers and is still available to a subsequent recv() call, on which you can specify the full length of the message that you now know to expect.
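
The peek-then-read sequence can be sketched in Python with the standard MSG_PEEK flag. The function name peek_header and the attempts and delay parameters are assumptions for illustration; they bound the repeated calls so that the loop cannot spin forever if the rest of the header never arrives.

```python
import socket
import time


def peek_header(sock, n=5, attempts=50, delay=0.01):
    """Peek at the first n bytes without removing them from the stream.

    Returns the n bytes, or b"" if the partner closed the socket.
    """
    for _ in range(attempts):
        data = sock.recv(n, socket.MSG_PEEK)
        if not data:
            return b""  # partner closed the socket
        if len(data) == n:
            return data
        time.sleep(delay)  # partial header: give the rest time to arrive
    raise TimeoutError("header did not arrive in time")
```

After peek_header() returns, an ordinary recv() call still delivers the same bytes, so the program can read the whole message, header included, in the usual way.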