Monitoring your remote sharing environment

An RRSF network is a complex system. It is composed of many elements:

The RACF® subsystem with all its restartable functions
RRSF definitions and sequences
VSAM workspace files
RRSFLIST command output files
The APPC and TCP/IP connections between nodes

You must plan carefully to correctly implement an RRSF network. Two IBM® Redbooks® can help with this planning:

RACF Version 2 Release 2 Installation and Implementation Guide
RACF Version 2 Release 2 Technical Presentation Guide

Note: These IBM Redbooks contain information only about the APPC protocol; they do not discuss the TCP/IP protocol.

Monitoring the RRSF environment is a recommended practice for maintaining a healthy network. An RRSF environment is composed of many components and can have many physical nodes. At any time, there might be nodes that are not operational because of scheduled maintenance or a known problem that is being addressed. Only the professionals who are charged with maintaining the RRSF network can determine whether messages or command results are as expected or whether they indicate a problem that must be investigated and resolved.

Some monitoring approaches to consider are:

Periodically issue the TARGET LIST command to confirm that the nodes you expect to be operative are in fact operative. Additionally, look for unexpected messages sent to the operator's console that indicate that a connection's state has changed to an error state. For example, IRRC022I and IRRC033I indicate that the state changed to Operative Error, and IRRC032I indicates that it changed to Dormant Error.
Periodically issue the TARGET LIST NODE(node name) command for each node to check the status of the workspace data sets. For example, look at the number of records in the data sets. If the number is excessive, the data sets can fill up. If they fill up, requests might be rejected and database inconsistencies might occur. Further, look for messages indicating a problem with the workspace data sets. For example, IRRC029I and IRRC030I indicate problems in trying to write to workspace data sets, and IRRC031I indicates that a workspace data set is full. If a workspace data set fills up, see Recovery procedures for more information.
If you use automatic direction, enter the SET command with the OUTPUT option to put the output (at least FAIL output) for automatically directed commands, automatically directed passwords, synchronized passwords, or automatically directed application updates into the RRSFLIST data set of the user responsible for maintaining RRSF. You should check the RRSFLIST data set periodically for unexpected results. Also, users must maintain their own RRSFLIST data set. To prevent it from filling up, move any results you need to another file and delete the contents. If the RRSFLIST data set fills up, output is sent by way of TSO TRANSMIT to that user.
Guideline: Use the SET command with the NOTIFY option to specify at least one backup user to receive notification of whether the RRSF request is successful, in the event that the primary administrator is unavailable. If the users who should receive the RRSF command output or who receive notification are not logged on, significant storage could be consumed over time, because the output or results are queued for delivery or receipt when the user logs on. This storage consumption could result in additional system problems.
If explicit command direction (AT, ONLYAT) is commonly used, check the RRSFLIST data set of the command issuer for unexpected results.
If you use TCP/IP, periodically check the AT-TLS encryption level to ensure that it has not been inadvertently weakened by a change to the AT-TLS policy. The negotiated cipher is displayed in message IRRI027I when a successful TCP/IP connection is established between two nodes, and is also displayed in the detailed TARGET LIST output for a node connected by way of TCP/IP.
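
If console messages are captured to a text file, the message checks described in the list above could be partly automated. The following Python sketch is one possible approach and is not part of RACF. It assumes that each captured line contains the message identifier as a separate word. The function name scan_log and the sample lines are hypothetical; only the message identifiers and their meanings come from the list above.

```python
import re

# Message identifiers and meanings as described in the monitoring list above.
RRSF_MESSAGES = {
    "IRRC022I": "connection state changed to Operative Error",
    "IRRC033I": "connection state changed to Operative Error",
    "IRRC032I": "connection state changed to Dormant Error",
    "IRRC029I": "problem writing to a workspace data set",
    "IRRC030I": "problem writing to a workspace data set",
    "IRRC031I": "workspace data set is full",
    "IRRI027I": "TCP/IP connection established; review the negotiated cipher",
}

# Matches identifiers such as IRRC031I when they appear as a whole word.
MESSAGE_ID = re.compile(r"\b(IRR[A-Z]\d{3}I)\b")


def scan_log(lines):
    """Return (line_number, message_id, meaning) for each message of interest."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for message_id in MESSAGE_ID.findall(line):
            meaning = RRSF_MESSAGES.get(message_id)
            if meaning is not None:
                findings.append((number, message_id, meaning))
    return findings


if __name__ == "__main__":
    # Hypothetical sample input; real console message text is not reproduced here.
    sample = [
        "IRRC031I (message text)",
        "an unrelated console line",
        "IRRI027I (message text)",
    ]
    for number, message_id, meaning in scan_log(sample):
        print(f"line {number}: {message_id}: {meaning}")
```

A report like this only flags lines for review. As noted earlier, the people who maintain the RRSF network must still decide whether a flagged message is expected, for example because a node is down for scheduled maintenance.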

Steps such as these allow for timely identification of problems you can correct before they become critical. See z/OS Security Server RACF Diagnosis Guide for detailed information about the setup necessary for RRSF and the errors possible with workspace data sets, APPC communications, and RRSF definitions and sequences. See Failures in the RACF subsystem address space for error recovery information. Understanding the kinds of problems that can occur is a first step in deciding on the procedures necessary to detect and handle them.