HANG/WAIT in RLS/TVS

Symptoms

If transactions are not processing, quiesce requests are not completing, or data sets are failing to open or close, you may have a hang within the SMSVSAM address space.

How to investigate

To troubleshoot most RLS hang, wait, slowdown, or loop situations, issue the following diagnostic commands:
- D GRS,C                    - ENQ contention (system level)
- D SMS,SMSVSAM,DIAG(C)      - RLS latch contention (system level)
- D SMS,SMSVSAM,QUIESCE      - Quiesce event status (system level)
- IDCAMS SHCDS LISTALL       - lists registered subsystems & lock info
- D SMS,CFLS(lock_structure) - displays lock structure information
- D XCF,STR,STRNM=[IGWLOCK00 | secondary_lock_structure]
                              - another display of the lock str
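
Unlike the operator commands in this list, SHCDS LISTALL is an IDCAMS command and is normally run as a batch job. A minimal sketch of such a job follows; the job name, accounting field, and class values are placeholders, not values from this document, and should be replaced with your installation's own:

```
//SHCDSLST JOB (ACCT),'SHCDS LISTALL',CLASS=A,MSGCLASS=X
//* Job card values above are hypothetical; adjust for your site
//LISTALL  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  SHCDS LISTALL
/*
```

The SHCDS output is written to SYSPRINT, so keep the job log with the rest of the collected documentation.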

All "system level" commands must be issued on every system in the plex. The simplest way to do this is with the ROUTE command, for example: RO *ALL,D GRS,C
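
Applied to the three system-level displays in the list above, the routed forms would look like this. The sketch assumes every system in the plex should respond; a system name or list of system names can be used in place of *ALL if only some systems are of interest:

```
RO *ALL,D GRS,C
RO *ALL,D SMS,SMSVSAM,DIAG(C)
RO *ALL,D SMS,SMSVSAM,QUIESCE
```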

Once the commands have been issued, dump SMSVSAM on every system in the plex. Be sure to include the RLS data spaces (DSPNAME) and, at a minimum, the SDATA parameters GRSQ and XESDATA. If any CICS regions are affected, add them to the dump specification as well.

Here is an example command to dump RLS, XCF and a CICS region on one system:
DUMP COMM=(some meaningful dump title)
 R xx,JOBNAME=(SMSVSAM,XCFAS,CICS1),CONT
 R yy,DSPNAME=('SMSVSAM'.*,'XCFAS'.*),CONT
 R nn,SD=(COUPLE,PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,TRT,CSA,XESDATA),END
Adding the REMOTE keyword will issue the same dump command on each member in the plex:
DUMP COMM=(some meaningful dump title)
 R xx,JOBNAME=(SMSVSAM,XCFAS,CICS1),CONT
 R yy,DSPNAME=('SMSVSAM'.*,'XCFAS'.*),CONT
 R nn,SD=(COUPLE,PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,TRT,CSA,XESDATA),CONT
 R zz,REMOTE=(SYSLIST=(*('SMSVSAM')),DSPNAME,SDATA),END
The dumping process can be simplified by including an entry similar to the following example in the IEADMCxx PARMLIB member:
JOBNAME=(*MASTER*,SMSVSAM,CICS1),DSPNAME=('SMSVSAM'.*),
SDATA=(COUPLE,PSA,NUC,SQA,LSQA,SUM,RGN,GRSQ,LPA,TRT,CSA,XESDATA),
REMOTE=(SYSLIST=(*('SMSVSAM')),DSPNAME,SDATA)

Once the member is created, issuing DUMP COMM=(title),PARMLIB=xx (where xx is the suffix of the IEADMCxx member) dumps RLS on every system in the plex.

Recovery actions

As with all RLS problems, please be sure to collect appropriate documentation before attempting to clear the issue. Without documentation, support will be unable to verify the cause of the problem.

FTP the following documentation to the IBM support center for further diagnostics:
  • DUMPs
  • OPERLOG (or SYSLOG from all plex systems)
  • LOGREC (from all systems)
  • JOBLOGS (any which may be pertinent)

To clear the issue, start by investigating the task, lock, or latch that the diagnostic commands indicate is holding up the system. Then attempt to clear that specific resource by cancelling the affected job, transaction, region, or system. Info APAR II14597 provides detailed step-by-step instructions for this and other common scenarios.

Actions to avoid recurrence