z/OS UNIX System Services File System Interface Reference


Asynchronous I/O flow details

SA23-2285-00

This flow is discussed as an addition to an existing PFS design that already handles synchronous blocking and non-blocking socket operations.
  1. BPX1AIO/BPX4AIO (asyncio) is called with an Aiocb structure, which contains all the information that is needed to perform the specific function.
  2. The LFS builds an Async I/O Request Block (RqBlk). The PFS has already signified its support for asynchronous I/O via the Pfsi_Asyio bit in its PFSinit output. The regular vnode operation for the function is invoked in the PFS with:
    • The osi_asy1 bit turned on to indicate Async I/O Part 1.
    • The osi_asytok field holding the LFS_AsyTok token.
  3. Part 1 in the PFS:
    • The PFS builds its own Request Block. The LFS_AsyTok is saved for later use with osi_sched(). The PFS's PFS_AsyTok is passed back to the LFS via osi_upda(). This identifies the request to the PFS in Part 2 and to vn_cancel. Basic preliminary parameter and state checking can be done here.
    • The user's read buffers are not referenced during Part 1 unless osi_ok2compimd=ON (see the Variations in this topic). This allows the user to defer read buffer allocation to just before Part 2. The requested length for reads is available, even if the buffers are not.
    • The PFS queues the request to await the desired event. This is essentially the same thing that is normally done for blocking requests. Instead of calling osi_wait(), as it would at this point for a blocking request, the PFS returns to the LFS with the Return_value, Return_code, and Reason_code (RRR) from queueing the asynchronous I/O. For a successfully queued request, the Return_value is 0, and any output from the operation is deferred until Part 2. Important PFS structures are preserved as necessary over this return and the subsequent reentry to the PFS for Part 2.
    The variations are as follows:
    • If the operation fails during Part 1, the normal path is taken and, instead of the request being queued, the failure is returned. This includes both queueing failures and failures of the function that is being requested.
    • If the operation can be completed immediately and osi_ok2compimd=ON, the PFS can proceed as it would normally and complete the operation synchronously. osi_compimd is turned ON to tell the LFS that this has happened.
    • If osi_ok2compimd=OFF, the PFS must make the call to osi_sched from within this vnode operation, and proceed from Part 2 as if the data were not immediately available. This bit is only OFF for read/write type operations. If the PFS does not need to be recalled for Part 2 (for instance, with a short write), it can skip the call to osi_upda. It is all right to transfer the responsibility for calling osi_sched to some other thread, making the call asynchronously and returning to the LFS, as long as you do not wait for network input.
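The Part 1 handling described in step 3 can be sketched in C. Everything below (the structure layouts, the osi_upda signature, and the one-deep queue) is a hypothetical simplification for illustration, not the real z/OS interfaces:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the OSI fields described in
 * the text; the real z/OS structure layouts differ. */
struct osi {
    int   osi_asy1;        /* ON: this call is Async I/O Part 1       */
    int   osi_ok2compimd;  /* ON: PFS may complete immediately        */
    int   osi_compimd;     /* PFS output: completed immediately       */
    void *osi_asytok;      /* input: LFS_AsyTok                       */
};

/* The PFS's own request block, preserved until Part 2. */
struct pfs_rqblk {
    void  *lfs_asytok;     /* saved for the later osi_sched() call    */
    size_t req_length;     /* requested read length; the user buffers
                              are not referenced during Part 1        */
    int    queued;
};

static struct pfs_rqblk pending;     /* one-deep queue for the sketch */

/* Simulated osi_upda(): passes the PFS's token back to the LFS so
 * that Part 2 and vn_cancel can identify the request. */
static void osi_upda(void *lfs_asytok, void *pfs_asytok)
{
    (void)lfs_asytok;
    (void)pfs_asytok;
}

/* Part 1 of an async read: queue the request and return 0; all
 * output from the operation is deferred until Part 2. */
static int vn_read_part1(struct osi *osip, size_t len)
{
    if (!osip->osi_asy1)
        return -1;                          /* not an async Part 1 call */
    pending.lfs_asytok = osip->osi_asytok;  /* saved for osi_sched()    */
    pending.req_length = len;
    pending.queued     = 1;
    osi_upda(osip->osi_asytok, &pending);   /* PFS_AsyTok back to LFS   */
    return 0;                               /* success: now in progress */
}
```

The key point the sketch shows is the return path: where a blocking request would call osi_wait() here, the async request simply returns after queueing.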
  4. The LFS returns to the caller with AioRC=EINPROGRESS; or, if it has failed or completed immediately, cleans up and returns the operation's results.
  5. The original caller continues. All structures and data buffers must persist throughout the operation.
  6. Event occurrence in the PFS:
    • At some point data arrives for the socket, or buffers become available, and the request can be completed.
    • The PFS notices, or responds to, this condition as it normally does. Instead of calling osi_post(), as it would at this point for a blocked request, it calls osi_sched() with the saved LFS_AsyTok to drive Part 2.
    • For read type operations, the passed Return_Value contains the length of the data that is available to be read in Part 2. This is an optional performance enhancement that some applications may take advantage of. If the length is not easily known, 0 should be passed.
    • The rest of the action happens on the SRB, because user data generally cannot be moved from the thread that calls osi_post/osi_sched.
    The variations are as follows:
    • If the request fails asynchronously, the PFS can report this on the call to osi_sched() by passing the failing three R's. There will be no Part 2 if the passed Return_value is -1, so the PFS has to clean everything up from here.
    • Alternatively, the PFS can save the results, pass success to osi_sched(), and report the failure from Part 2. This is sometimes more convenient when the event handler is in a separate address space and the PFS has resources to clean up in the kernel address space. The only time osi_sched() fails is if the passed LFS_AsyTok is no longer valid, which may represent a logic error in the PFS. osi_sched() succeeds even after the user has terminated, but the PFS sees vn_cancel instead of Part 2.
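Step 6 above, calling osi_sched() where a blocking request would call osi_post(), can be sketched similarly. The osi_sched signature and token handling below are assumed stand-ins, with the three R's (Return_value, Return_code, Reason_code) passed through:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for osi_sched(); the real service
 * takes the LFS_AsyTok plus the three R's.  Here it just records them. */
struct rrr { long ret_value; int ret_code; int rsn_code; };

static struct rrr last_sched;               /* what the LFS received   */
static void *valid_tok = (void *)0x1234;    /* the saved LFS_AsyTok    */

static int osi_sched(void *lfs_asytok, long rv, int rc, int rsn)
{
    if (lfs_asytok != valid_tok)
        return -1;              /* only failure: token no longer valid */
    last_sched.ret_value = rv;  /* for reads: bytes available, or 0    */
    last_sched.ret_code  = rc;
    last_sched.rsn_code  = rsn;
    return 0;                   /* Part 2 will be driven on an SRB     */
}

/* Called when data arrives for the socket: drive Part 2 instead of
 * posting a blocked waiter. */
static int data_arrived(void *saved_lfs_asytok, long bytes_ready)
{
    return osi_sched(saved_lfs_asytok, bytes_ready, 0, 0);
}
```

Passing the available length as the Return_value is the optional performance enhancement mentioned above; passing 0 is always acceptable.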
  7. The LFS schedules an SRB into the user's address space and returns to the PFS. The SRB runs asynchronously to the caller of osi_sched().
  8. The SRB runs in the user's address space, so that the user's data buffers can be referenced from "home" while in cross-memory mode. This also gets the user's address space swapped in if necessary. The LFS is recalled to get into the kernel address space.
  9. The LFS reconstructs the original vnode request structures. The same vnode operation is invoked in the PFS as for Part 1, with:
    • The osi_asy2 bit turned on to indicate Async I/O Part 2.
    • The osi_asytok field holding the PFS_AsyTok value from osi_upda().

    The variations are as follows:
    • If osi_upda was not called during Part 1, the PFS is not called for Part 2.

  10. Part 2 in the PFS:
    • This is running on an SRB instead of the more usual TCB, and the PFS has to be able to handle this mode.
    • From the PFS_AsyTok, the PFS picks up where it left off at the end of Part 1 (step 3), when it returned to the LFS instead of waiting. Information that is related to the completing operation is obtained much as it would be after coming back from osi_wait().
    • Data is moved between the user's and the PFS's buffers for read/write types of operations; or the operation is completed as appropriate.
    • The normal cross-memory environment has been recreated, with the user's buffers in home and the PFS's buffers in primary; or it is otherwise addressable as arranged by the PFS.
    • The normal move-with-key instructions are used to protect against unauthorized access to storage. The osi copy services are available.
    • For unauthorized callers in a TSO address space, the LFS has stopped the user from running authorized TSO commands while async I/O is outstanding. This avoids an obscure integrity problem, with user key storage being modified from a system SRB.
    • The PFS returns to the LFS with the results of the operation and the normal output for this particular vnode operation, such as the vnode_token from vn_accept. The operation is over at this point, as far as the PFS is concerned.
    The variations are as follows:
    • If the operation fails during Part 2, this is reported back. An earlier failure may have been deferred to Part 2 by the PFS.
    • For very large writes, the PFS may not want to commit all of its buffers to one caller. It may instead loop, sending smaller segments and waiting in between for more buffers. In this case the PFS remains in control and does not return from Part 2 until the whole operation is complete; that is, the remainder of the operation is synchronous, and the PFS blocks as necessary, as it normally does in this loop. osi_wait is convenient here, because it accommodates SRB callers. Essentially, osi_sched() is called when the first set of buffers becomes available, and the effect is to offload the work from the user's task or SRB to a system SRB. The operation is still asynchronous to the user. This ties up the SRB, but the situation is expected to be relatively infrequent.
    • Because SRBs are not interrupted with signals, an osi_wait during Part 2 does not normally return early, as it does in the EINTR cases. If the user's process terminates, signal-enabled osi_waits return as if they had been signaled.
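The Part 2 completion in step 10 can be sketched as follows. memcpy stands in for the move-with-key instructions and osi copy services used on z/OS, and all structure layouts are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: on z/OS the copy runs in cross-memory mode
 * with the user's buffers in home, using move-with-key or the osi
 * copy services; memcpy here is a same-address-space simulation only. */
struct osi2 {
    int   osi_asy2;      /* ON: this call is Async I/O Part 2 */
    void *osi_asytok;    /* input: PFS_AsyTok from osi_upda() */
};

struct pfs_rqblk2 {
    char   data[64];     /* PFS-side buffer filled by the event */
    size_t avail;        /* bytes available to return           */
};

/* Part 2 of an async read: locate the saved request from the
 * PFS_AsyTok and move the data into the user's buffer.  For the PFS,
 * the operation is over when this returns. */
static long vn_read_part2(struct osi2 *osip, char *ubuf, size_t ulen)
{
    struct pfs_rqblk2 *rq;
    size_t n;

    if (!osip->osi_asy2)
        return -1;                     /* not an async Part 2 call   */
    rq = osip->osi_asytok;             /* pick up where Part 1 left  */
    n  = rq->avail < ulen ? rq->avail : ulen;
    memcpy(ubuf, rq->data, n);         /* move-with-key in reality   */
    return (long)n;                    /* Return_value: bytes read   */
}
```

Note that this runs on an SRB; a real PFS must be able to operate in that mode, as described above.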
  11. On return to the LFS, signals are sent and unauthorized exits are queued to the user's TCB (not shown).
  12. The LFS returns to the SRB.
  13. On return to the SRB, authorized exits are called and ECBs are posted. When the user program is notified that the I/O has completed, either on the SRB or user's TCB, it can free the Aiocb and buffers. The operation is over, as far as the LFS is concerned, either at the end of the SRB or after an unauthorized exit has run on the user's TCB.
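Taken together, the flow above (queue in Part 1, osi_sched() on the event, complete in Part 2 on the SRB) can be simulated in miniature. The names and the EINPROGRESS stand-in value below are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EINPROGRESS_SIM 120    /* stand-in value for the sketch        */

static int    aio_rc;          /* AioRC seen by the original caller    */
static char   pfs_buf[16];     /* PFS-side data, arrives later         */
static size_t pfs_avail;
static char   user_buf[16];    /* caller's buffer, persists throughout */
static long   user_rv;         /* final Return_value for the caller    */

/* Part 1: the request is queued and the caller continues. */
static void part1_queue(void)
{
    aio_rc = EINPROGRESS_SIM;
}

/* Event occurrence: data arrives; this is the osi_sched() point. */
static void event_arrives(const char *d, size_t n)
{
    memcpy(pfs_buf, d, n);
    pfs_avail = n;
}

/* Part 2, on the SRB: move the data and complete the operation. */
static void part2_complete(void)
{
    memcpy(user_buf, pfs_buf, pfs_avail);  /* into the user's buffer */
    user_rv = (long)pfs_avail;
    aio_rc  = 0;                           /* I/O complete           */
}
```

The caller sees only the first and last states: EINPROGRESS at step 4, then the completed results when it is notified at step 13.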





Copyright IBM Corporation 1990, 2014