z/OS Communications Server: SNA Diagnosis Vol 1, Techniques and Procedures


Loop

GC27-3667-00

If the problem is a loop, use the procedure in Figure 1 and Figure 2 to collect the following documentation.

Note: If you are using TSO/VTAM, use this procedure. You do not need to go to Collecting documentation for TSO/VTAM problems.
  • System console log
  • Messages associated with the loop (if any)
  • Failing module ID
  • Dump of the VTAM® address space that is looping
  • Error file output (LOGREC)
  • For a problem associated with a specific device:
Figure 1. Overview of the loop procedure (part 1 of 2)
Diagram that shows the overview of the loop procedure.
Figure 2. Overview of the loop procedure (part 2 of 2)
Diagram that shows the overview of the loop procedure.

The following procedure describes each step shown in Figure 1 and Figure 2.

  1. Trace the loop.

    Loop problems might involve many modules or a single module. If possible, trace the looping instructions. Using the operator's reference for your host processor, instruction-step through the looping addresses. Save these addresses for use in diagnosing the problem.

    Take a dump and determine which module is looping by checking the PSW addresses in the CLKC entries for a repeating pattern.

    If the VIT was running when the loop started, look for any exception conditions that might have led to the loop. If the internal trace was not running, you might have to re-create the problem to get the trace at the time of the loop. Set the internal trace to MODE=EXT to record the trace entries in an external file.
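
    The check for a repeating pattern of PSW addresses can be sketched as follows. This is a minimal illustration, not a supported tool: it assumes the PSW addresses have already been copied from the trace entries into a list of integers, and the function name summarize_loop, the min_repeats threshold, and the sample addresses are all hypothetical.

```python
from collections import Counter


def summarize_loop(psw_addresses, min_repeats=3):
    """Return (low, high, repeated) for addresses seen at least min_repeats times.

    psw_addresses is an iterable of integer instruction addresses, assumed
    to have been transcribed from the trace entries beforehand.
    Returns None when no address repeats often enough.
    """
    counts = Counter(psw_addresses)
    repeated = sorted(addr for addr, n in counts.items() if n >= min_repeats)
    if not repeated:
        return None
    # The lowest and highest repeating addresses bound the suspected loop.
    return repeated[0], repeated[-1], repeated


# Hypothetical addresses, for illustration only.
sample = [0x00F4A210, 0x00F4A23C, 0x00F4A210, 0x00F4A23C,
          0x00F4A210, 0x00F4A23C, 0x01200000]
result = summarize_loop(sample)
if result is not None:
    low, high, _ = result
    print(f"Repeating PSW addresses between {low:08X} and {high:08X}")
```

    Save the low and high addresses with the looping addresses you collected. A single module is likely when the range is small; widely separated repeating addresses suggest that several modules are involved.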

  2. Get dump output.

    To get a dump of VTAM, issue the DUMP command, or press the Program Restart key.

    If the loop is disabled, the system console is not available for input, so take a stand-alone dump. (See Stand-alone dump.)

  3. Get the system console log and LOGREC output.

    The system console log might contain information, such as error messages, that can help you diagnose the problem. Also, print the LOGREC file.

    Use the LOGDATA option to print the in-core LOGREC buffers. See Table 1 to determine which document has information on LOGDATA.

  4. Is a message involved?

    Determine whether there are any messages associated with the loop, such as a particular message always preceding the problem, or the same message being issued repeatedly. If so, add the message numbers to your problem documentation and go to the message procedure, step 4.

  5. Is it a device error?

    For any device error, first check the NetView report (if you have the NetView program) and then the LOGREC output.

    Does the LOGREC output show repetitive entries for the same error on a particular device? If so, VTAM is receiving many errors from that device.

    1. If the LOGREC error records are for a link or link station attached to a communication controller, get VIT PIU records and an I/O trace of the NCP. If you have the NetView program, get session trace data or session awareness data for the NCP. If the error records are for a link or device attached to a communication adapter, get VIT PIU records or a dynamic trace of the communication adapter.

      If the trace shows continual arrival of RECMS PIUs, then the repetitive entries in LOGREC are caused by a device error.

      Note: For information on counting PIUs see Counting request/response units (RUs).
    2. For channel-attached devices, use one or more of the following traces for the device to determine whether VTAM is receiving many errors:
      • VTAM internal trace with CIO option
      • Session trace data (if using the NetView program)
      • Session awareness data (if using the NetView program)
      • CCWTRACE (if available)
  6. Many errors received?

    If VTAM is receiving many errors, the problem is probably in the device. Run a CIO VIT trace to trace execution of the VTAM ERP routines. Then continue with step 7.

  7. Is the loop traced?

    If you were able to instruction-step through the loop, go to step 15; otherwise, continue with step 8.

  8. Find the failing module.
    Use the PSW to find the failing module.
    • The PSW is found in LOGREC output, the SDWA, or the RTM2WA.

      When you use PSW RESTART to terminate a looping task, a LOGREC entry is created with a completion code of X'071' for the task. An RTM2WA is also created for the task. Use the LOGREC record and the RTM work area to locate the failing module. See the diagnostic books listed in "Bibliography" for your operating system for help in locating the PSW in dump output.

      Depending on the PSW bit 32, the last 3 bytes (24-bit mode) or 4 bytes (31-bit mode) of the PSW contain the address being executed at the time of the dump. Scan the dump output to find the address given in the PSW. See Table 1 to determine which document contains more information on PSWs.
      Note: Addresses might not always be in numeric order because the dump does not always generate output in sequential order.

      If you cannot find the address, the dump might not contain the relevant portion of main storage. For example, the address might be in LPA storage. Have this portion of storage dumped, or use output from LPAMAP to identify the module, and proceed as above.

    Note: The VTAMMAP VTFNDMOD formatted dump tool can be used to gather the module information described in steps 9, 10 and 11.
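
    The address extraction described in step 8 can be sketched as follows. The sketch assumes an 8-byte PSW written as 16 hexadecimal digits, as described above, where bit 32 is the high-order bit of the second word. The function name psw_instruction_address and the sample PSW values are hypothetical.

```python
def psw_instruction_address(psw_hex):
    """Return (addressing_mode, address) from an 8-byte PSW given as hex text.

    Assumes the 8-byte PSW layout described in the text: PSW bit 32 is the
    high-order bit of the second word and selects 24-bit or 31-bit mode.
    """
    digits = psw_hex.replace(" ", "")
    if len(digits) != 16:
        raise ValueError("expected an 8-byte PSW (16 hexadecimal digits)")
    second_word = int(digits[8:], 16)
    if second_word & 0x80000000:
        # Bit 32 is on: 31-bit mode, address is the remaining 31 bits.
        return 31, second_word & 0x7FFFFFFF
    # Bit 32 is off: 24-bit mode, address is the last 3 bytes.
    return 24, second_word & 0x00FFFFFF


# Hypothetical PSW values, for illustration only.
mode, address = psw_instruction_address("070C1000 80F4A23C")
print(f"{mode}-bit mode, address {address:08X}")
```

    Scan the dump for the resulting address, not for the raw second word of the PSW, because the addressing-mode bit is not part of the address.
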
  9. Find the module name that contains the failing address.

    VTAM identifies modules with an EBCDIC module name and the Julian date (and, if appropriate, the latest PTF applied) at or near the beginning of most modules. This module identifier is usually in the form:

    ISTxxxxx yy.ddd [nnnnnnn]

    where xxxxx is the last five characters of the module name, yy.ddd is the Julian date the module was assembled, and nnnnnnn is the latest PTF (if any) that has been applied to this module.

    To find the module ID, start at the failing address and scan upward (in descending address order) along the right side of the dump listing. The module ID is printed in EBCDIC. Add the module name to your documentation list.
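
    If the dumped storage is available as raw bytes, the upward scan can be approximated by a short script. This is a sketch under several assumptions: the storage is decoded with the cp037 EBCDIC code page, the brackets in the format above mean that the PTF field is optional, and single blanks separate the fields. The function name find_module_id and the module names, dates, and PTF number in the sample are made up for illustration.

```python
import re

# ISTxxxxx yy.ddd [nnnnnnn], with the PTF field treated as optional.
# The character sets and single-blank separators are assumptions.
MODULE_ID = re.compile(
    r"(IST[A-Z0-9@#$]{5}) ([0-9]{2}\.[0-9]{3})(?: ([A-Z0-9]{7}))?"
)


def find_module_id(storage, failing_offset):
    """Return the last module identifier that starts before failing_offset.

    storage is a bytes object holding dumped storage; failing_offset is the
    offset of the failing address within it. cp037 is a single-byte code
    page, so character positions equal byte offsets.
    """
    text = storage[:failing_offset].decode("cp037")
    last = None
    for match in MODULE_ID.finditer(text):
        last = match  # keep the identifier closest to the failing address
    if last is None:
        return None
    name, assembled, ptf = last.groups()
    return {"offset": last.start(), "name": name,
            "assembled": assembled, "ptf": ptf}


# Made-up storage containing two module identifiers.
storage = (b"\x00" * 16 + "ISTAAAAA 13.045 UA12345".encode("cp037")
           + b"\x00" * 64 + "ISTBBBBB 12.300".encode("cp037")
           + b"\x00" * 40)
print(find_module_id(storage, 130))
```

    A match only shows that an identifier precedes the address. Confirm in the dump listing that the failing address actually falls within that module, because not every module begins with an identifier.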

  10. Find the module pointed to by register 12.

    General register 12 (X'0C') is normally the base register for VTAM modules. In a VTAM loop, register 12 should point to the same module found in step 9. If not, add this module name to your documentation list.

  11. Find the module pointed to by register 14.

    General register 14 (X'0E') might point to a module that called the routine that is looping. Add this module name to your documentation list.

    Add the module names from steps 9, 10, and 11 to your documentation list. You can report the problem next, but you might need to continue with step 12.

  12. Get the system trace output.

    The system trace might show many external and I/O interrupts. The PSW addresses in system trace entries will be part of the loop.

  13. Get the VIT output.

    The VIT is useful in determining the reason for a loop, such as a process being continually redispatched for the same request. Get the VIT output. If you require VIT options in addition to the default options (API, CIO, MSG, NRM, PIU, PSS, SMS, and SSCP), start a VIT in addition to the default and specify MODE=EXT. If VTAM does not accept the command, it might be necessary to re-create the problem. For more information about using the VIT, see z/OS Communications Server: SNA Diagnosis Vol 2, FFST Dumps and the VIT.

  14. Examine the trace entries.

    By examining all of the trace entries, you might be able to determine whether there is a loop. The most obvious loops would be a module or modules getting continual control of the VTAM system, or a control block chaining to itself. Check the output of the PSS option to see which VTAM routines are getting control. If you see a pattern of repetition in the trace entries, it does not necessarily mean that VTAM is looping. Some VTAM processes are timer-driven and repeat periodically.

    Note:
    1. Get the trace information and examine the clock comparator (CLKC) entries for repeating PSW addresses. For short loops, the repeating PSWs show the extent of the loop.
    2. The absence of any apparent loop does not necessarily mean that VTAM is not looping. The loop might not contain a VTAM trace point.

    If a module or modules are looping, get their addresses from the trace entries. Step 15 explains how to find the module name.

    If you find a control block chained to itself, or if a queue of control blocks is in a cycle, try to identify the control block. Most control blocks have a 1-byte ID at offset X'00'. See the control block ID codes in Storage and control block ID codes to identify the control block name.
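
    Checking a chain of control blocks for a cycle can be sketched with Floyd's tortoise-and-hare method. The sketch assumes that the chain pointers have already been read from the dump into a dictionary that maps each control block address to the address in its chain field, and that a pointer of zero ends the chain. The function name find_chain_cycle and the sample addresses are hypothetical.

```python
def find_chain_cycle(next_pointer, head):
    """Return the address where a chain cycle begins, or None if it ends.

    next_pointer maps a control block address to the address in its chain
    field. A zero pointer, or an address missing from the mapping, is
    treated as the end of the chain.
    """
    def step(addr):
        return next_pointer.get(addr, 0)

    slow = fast = head
    while True:
        fast = step(fast)          # fast pointer advances two blocks
        if fast == 0:
            return None
        fast = step(fast)
        if fast == 0:
            return None
        slow = step(slow)          # slow pointer advances one block
        if slow == fast:
            break                  # the pointers met inside a cycle

    # Restart one pointer at the head; they meet at the first block
    # of the cycle when both advance one block at a time.
    slow = head
    while slow != fast:
        slow = step(slow)
        fast = step(fast)
    return slow


# Made-up chain: 1000 -> 2000 -> 3000 -> 4000 -> 2000 (cycle).
chain = {0x1000: 0x2000, 0x2000: 0x3000, 0x3000: 0x4000, 0x4000: 0x2000}
start = find_chain_cycle(chain, 0x1000)
print("no cycle" if start is None else f"cycle begins at {start:08X}")
```

    The address returned is the control block to examine first. Check its ID byte at offset X'00' to identify the control block name.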

  15. Find the module names.
    Note: You can also use the VTAMMAP VTFNDMOD formatted dump tool to find the module ID. See VTFNDMOD.

    Use the addresses found in step 14 to find the module names involved in the loop.

    To find the module ID, start at the failing address and scan upward (in descending address order) along the right side of the dump listing. The module ID is printed in EBCDIC. Add this module ID to your documentation list. Continue with step 16.

  16. Report or go to the failing module procedure.

    If you determined the module names, go to Failing module. Otherwise, you are ready to contact IBM®. Go to Reporting the problem to IBM.





Copyright IBM Corporation 1990, 2014