Recommendations and recovery considerations for remote checks
Recovery needed for your check routine is basically the same as for any other program; for the most part, the following recommendations are not unique to writing a check routine.
Make your check clean up after itself, because the system won't do it for you: IBM Health Checker for z/OS does not perform any end-of-task cleanup for your check. Check routines should track resources, such as storage obtained, ENQs, locks, and latches, in the PQE_ChkWork field.
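The tracking pattern above can be sketched in Python as a language-neutral illustration; it is not z/OS code. The hypothetical `CheckWorkArea` class stands in for the PQE_ChkWork field. The check records each resource as soon as it obtains it and removes the entry when it releases the resource normally. A cleanup routine then releases whatever is still recorded, most recent first. The resource names and release callbacks are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CheckWorkArea:
    """Hypothetical model of PQE_ChkWork: records resources the check holds."""
    held: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)

    def track(self, name: str, release: Callable[[], None]) -> None:
        # Record the resource immediately after obtaining it.
        self.held.append((name, release))

    def release_one(self, name: str) -> None:
        # Normal path: release one resource and stop tracking it.
        for i, (held_name, release) in enumerate(self.held):
            if held_name == name:
                release()
                del self.held[i]
                return

    def cleanup(self) -> List[str]:
        # Error or termination path: release everything still held, newest first.
        released = []
        while self.held:
            name, release = self.held.pop()
            release()
            released.append(name)
        return released


log: List[str] = []
work = CheckWorkArea()
work.track("storage", lambda: log.append("freed storage"))
work.track("ENQ", lambda: log.append("released ENQ"))
work.track("latch", lambda: log.append("released latch"))
work.release_one("latch")   # the check finished with the latch normally
released = work.cleanup()   # e.g. the check ended early and must clean up
print(released)  # ['ENQ', 'storage']
print(log)       # ['released latch', 'released ENQ', 'freed storage']
```

Releasing in reverse order of acquisition mirrors the usual rule of giving up serialization before freeing the storage it protects.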
If your check determines that the current environment is not appropriate for it, have the check do the following:
- Issue an informational message to describe why the check is not running. For example, you might issue the following message to let check users know that the environment is not appropriate for the check, and when the check will run again:
The server is down. When the server is available, the check will run again.
- Issue the HZSFMSG service to stop itself:
HZSFMSG REQUEST=STOP,REASON=ENVNA
- Make sure that your product or check includes code that can detect
a change in the environment and start running the check again when
appropriate. To start running the check, issue the following HZSCHECK
service:
HZSCHECK REQUEST=RUN,CHECKOWNER=checkowner,CHECKNAME=checkname
If the environment is still not appropriate when your code runs the check, it can always stop itself again.
In general, you can handle a check that does not apply in the current environment in either of two ways:
- Delete checks that do not apply in the current environment.
- Run the check so that it can examine the environment and disable itself if it is not appropriate there. Consider supporting a check parameter (PARM) so that the installation can indicate that the condition should be treated as successful rather than as an error.
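The environment handling described in this section can be modeled with a short Python sketch. It is an illustration only, not z/OS code: `CheckState`, `run_check`, `on_environment_change`, the `server_up` flag, and the `ACCEPT_SERVER_DOWN` parameter value are all hypothetical. The `stopped_reason` field records which HZSFMSG REQUEST=STOP reason a real check would use: ENVNA when the environment is not appropriate, or BADPARM (covered next) when its parameter is not valid. `on_environment_change` plays the role of product code that issues HZSCHECK REQUEST=RUN.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical PARM values this example check accepts.
VALID_PARMS = {"", "ACCEPT_SERVER_DOWN"}


@dataclass
class CheckState:
    parm: str = ""
    stopped_reason: Optional[str] = None  # "ENVNA" or "BADPARM" once stopped
    messages: List[str] = field(default_factory=list)


def run_check(state: CheckState, server_up: bool) -> str:
    """One invocation of the hypothetical check."""
    if state.stopped_reason is not None:
        return "not run"  # in this model, a stopped check is not called
    if state.parm not in VALID_PARMS:
        # Real check: HZSFMSG REQUEST=STOP,REASON=BADPARM
        state.stopped_reason = "BADPARM"
        return "stopped"
    if not server_up:
        if state.parm == "ACCEPT_SERVER_DOWN":
            # The installation said this condition is acceptable, not an error.
            return "success"
        # Real check: informational message, then HZSFMSG REQUEST=STOP,REASON=ENVNA
        state.messages.append("The server is down. When the server is "
                              "available, the check will run again.")
        state.stopped_reason = "ENVNA"
        return "stopped"
    return "success"  # the actual health check logic would run here


def on_environment_change(state: CheckState, server_up: bool) -> str:
    """Product code that detects an environment change (models HZSCHECK REQUEST=RUN)."""
    if state.stopped_reason != "ENVNA":
        return "not run"
    state.stopped_reason = None          # let the check run again
    return run_check(state, server_up)   # it stops itself again if still not appropriate


state = CheckState()
print(run_check(state, server_up=False))             # stopped (reason ENVNA)
print(on_environment_change(state, server_up=True))  # success
bad = CheckState(parm="BOGUS")
print(run_check(bad, server_up=True))                # stopped (reason BADPARM)
print(on_environment_change(bad, server_up=True))    # not run
```

In this sketch, product code can restart a check stopped for ENVNA, while a check stopped for BADPARM stays stopped (in the real system, until it is refreshed or its parameters are changed).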
If a check parameter specified by the installation is not valid, have the check stop itself:
HZSFMSG REQUEST=STOP,REASON=BADPARM
This request also issues the predefined error message HZS1001E to indicate what the problem is. The check routine will not be called again until the check is refreshed or its parameters are changed. REQUEST=STOP prevents the check from running again and sets the results in the PQE_Result field of HZSPQE; the system sets the result field based on the severity value for the check. See Issuing messages in your local check routine with the HZSFMSG macro for examples and complete information.
Take advantage of verbose and debug modes in your check:
- Debug mode, which tells the system to output extra messages designed
to help you debug your check. IBM Health Checker for z/OS outputs
some extra messages in debug mode, and some checks do also. When a
check runs in debug mode, each message line is prefaced by a message
ID, which can be helpful in pinpointing the problem. For example,
report messages are not prefaced by message IDs unless a check is
running in debug mode. There are two ways to issue extra messages in debug mode:
- Use conditional logic such that when in debug mode (when field PQE_DEBUG in mapping macro HZSPQE has the value PQE_DEBUG_ON), your check issues additional messages.
- Code debug-type messages; see Planning your debug messages.
Users can turn on debug mode using the DEBUG=ON parameter in the MODIFY hzsproc command, in HZSPRMxx, or by overtyping the DEBUG field in SDSF to ON.
- Verbose mode, which tells the check routine to output messages
with additional detail about non-exception information found by the
check. (RACF checks, for example, issue additional detail in verbose
mode.) To issue extra messages in verbose mode, use conditional
logic such that when in verbose mode (when field PQE_VERBOSE
in mapping macro HZSPQE has the value PQE_VERBOSE_YES), your check
issues additional messages.
Users can turn on verbose mode using the VERBOSE=YES parameter in the MODIFY hzsproc command or in HZSPRMxx.
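A Python sketch of the conditional logic for both modes follows. It is a model, not z/OS code: the hypothetical `Pqe` class stands in for HZSPQE, with its `debug` and `verbose` booleans representing PQE_DEBUG set to PQE_DEBUG_ON and PQE_VERBOSE set to PQE_VERBOSE_YES. The check and its message texts are invented, and the message IDs the system adds in debug mode are not modeled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Pqe:
    debug: bool = False    # models PQE_DEBUG == PQE_DEBUG_ON
    verbose: bool = False  # models PQE_VERBOSE == PQE_VERBOSE_YES


def check_resources(pqe: Pqe, resources: List[str], failing: Set[str]) -> List[str]:
    """Hypothetical check: exceptions always, extra lines only on request."""
    out = []
    if pqe.debug:
        # Extra debug-only message to help the check writer.
        out.append(f"DEBUG: examining {len(resources)} resources")
    for name in resources:
        if name in failing:
            out.append(f"EXCEPTION: {name} failed the check")
        elif pqe.verbose:
            # Non-exception detail appears only in verbose mode.
            out.append(f"INFO: {name} passed the check")
    return out


items = ["A", "B", "C"]
print(check_resources(Pqe(), items, {"B"}))
# ['EXCEPTION: B failed the check']
print(check_resources(Pqe(debug=True, verbose=True), items, {"B"}))
# ['DEBUG: examining 3 resources', 'INFO: A passed the check',
#  'EXCEPTION: B failed the check', 'INFO: C passed the check']
```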
Plan recovery for your check: Design your check routine to handle abends. If the task that issued the HZSADDCK macro defining the check defaults terminates for any reason, including an abend that is not retried, the system treats the check as if it were deleted. When your check abends:
- Retry the check a pre-determined number of times.
- If the check fails again, the check should stop running, but not stop itself.
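One possible reading of these two recommendations is sketched below in Python, with exceptions standing in for abends. It assumes that the failure count survives from one attempt to the next (a real check might keep it in its work area). It also reads "not stop itself" as "do not issue HZSFMSG REQUEST=STOP", the meaning "stop itself" has earlier in this section. `RetryingCheck`, `MAX_RETRIES`, and the sample check bodies are hypothetical.

```python
from typing import Callable

MAX_RETRIES = 3  # hypothetical "predetermined number of times"


class RetryingCheck:
    def __init__(self, check_body: Callable[[], str]):
        self.check_body = check_body
        self.attempts = 0  # failures so far; a real check would keep this across retries

    def run(self) -> str:
        while True:
            try:
                return self.check_body()
            except Exception:  # stands in for an abend caught by recovery
                self.attempts += 1
                if self.attempts > MAX_RETRIES:
                    # Stop running, but do not stop itself: no HZSFMSG REQUEST=STOP here.
                    return "ended after repeated failures"
                # Otherwise loop around and retry the check.


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RuntimeError("simulated abend")
    return "success"


def always_fails() -> str:
    raise RuntimeError("simulated abend")


print(RetryingCheck(flaky).run())  # success
failing = RetryingCheck(always_fails)
print(failing.run(), failing.attempts)  # ended after repeated failures 4
```

With MAX_RETRIES set to 3, the check makes one initial attempt plus three retries before giving up.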
Look for logrec error records when you test your check: If the system encounters an error while a message is being issued, it issues abend X'290' and writes a logrec error record, with a description of the problem in the variable recording area (VRA).
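The following MODIFY commands can also help while you test and debug a check. In order, they turn on debug mode for the check, update the check's parameters (with a reason and date for the change), delete the check with FORCE=YES, and display detailed information about the check:
F hzsproc,UPDATE,CHECK(check_owner,check_name),DEBUG=ON
F hzsproc,UPDATE,CHECK(check_owner,check_name),PARM=parameter,REASON=reason,DATE=date
F hzsproc,DELETE,CHECK(check_owner,check_name),FORCE=YES
F hzsproc,DISPLAY,CHECK(check_owner,check_name),DETAIL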
Avoid modifying system control blocks in your check routine: The IBM Health Checker for z/OS philosophy is to keep check routines very simple. IBM® recommends that checks read but not update system data and try to avoid disruptive behavior such as modifying system control blocks.
See also Debugging checks.