The IBM® Health Checker
for z/OS® is a component of MVS™ that provides
the framework for checking z/OS system
and sysplex configuration parameters and the system environment to
help determine places where an installation is deviating from suggested
settings or where there might be configuration problems. IBM provides a set of check routines
in IBM Health Checker for z/OS,
but vendors, consultants, and system programmers can add other check
routines.
The objective of a check is to identify potential problems before
they impact your availability or, in worst cases, cause outages. The
output of a check is messages and reports that help an installation
analyze the health of a system.
You can use checks to look for things like:
- Changes in configuration values that occur dynamically over the
life of an IPL. Checks that look for changes in these values should
run periodically to keep the installation aware of changes accruing
since the last IPL, to help ensure a cleaner IPL the next time.
- Threshold levels approaching the upper limits, especially those
that might occur gradually or insidiously.
- Single points of failure in a configuration.
- Unhealthy combinations of configurations or values that an installation
might not think to check.
- Monitoring checks that create reports of collected data.
A check routine does the following:
- Defines the severity of exceptions it finds and suggests a fix
for the exception.
- Defines a timer interval for the check.
- May have default values overridden by installation updates.
- Communicates check results by issuing messages to a buffer associated
with the check.
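The responsibilities listed above can be pictured with a small conceptual model. This is only a sketch in Python, not the IBM Health Checker for z/OS interface: the class `ThresholdCheck`, its fields, the 80% default limit, and the message wording are all hypothetical and chosen purely for illustration. It shows a single-purpose check that carries a severity, a timer interval, a default value that an installation can override, and a message buffer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


@dataclass
class ThresholdCheck:
    """Hypothetical model of a single-purpose check (not the HZS API)."""
    name: str
    severity: Severity = Severity.MEDIUM          # severity of any exception found
    interval_minutes: int = 60                    # timer interval between runs
    limit_percent: float = 80.0                   # default an installation may override
    messages: list = field(default_factory=list)  # buffer associated with this check

    def override(self, **updates):
        """Apply installation updates to the check's default values."""
        for key, value in updates.items():
            if key not in ("severity", "interval_minutes", "limit_percent"):
                raise KeyError(f"unknown check parameter: {key}")
            setattr(self, key, value)

    def run(self, used: int, capacity: int) -> bool:
        """Check one thing: is usage at or above the limit? True means exception."""
        percent = 100.0 * used / capacity
        if percent >= self.limit_percent:
            self.messages.append(
                f"{self.name}: EXCEPTION ({self.severity.value}) usage {percent:.1f}% "
                f"is at or above the {self.limit_percent:.1f}% limit. "
                "Suggested fix: increase capacity or reduce usage."
            )
            return True
        self.messages.append(f"{self.name}: usage {percent:.1f}% is within limits.")
        return False
```

In this model, an installation that wants earlier warning might call `check.override(limit_percent=60.0)` before the next run; the check logic itself stays unchanged, which is the point of keeping defaults overridable.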
The following are examples of problems that customers uncovered
at various times by running IBM Health Checker for z/OS:
- Configuration abnormalities in what was believed to be a stable
system.
- Unexpected values on a system. Investigation revealed changes
had been correctly made to that system, but not replicated on other
systems.
- Default configurations that were never optimized for performance.
- Outdated settings that didn't support all current applications.
- Mismatched naming conventions that could have led to an outage.
- Dynamic changes accruing over the life of the IPL that can cause
problems.
Hints for planning your checks:
- Keep in mind that each check should check for only one thing.
This makes it much easier for the installation to resolve exceptions
that the check finds and to override defaults.
- If you are writing a check that will flag a default or common
valid configuration setting as an exception, you should:
  - Make sure that the HZSADDCHECK exit routine for your check specifies
  the INACTIVE parameter on the HZSADDCK macro. INACTIVE specifies that
  the check should not run until the installation changes the state
  to active. See Writing an HZSADDCHECK exit routine and HZSADDCK macro — HZS add a check.
  - Include information in your check output messages about why the
  check user is getting an exception message for a default or common
  valid setting.
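The effect of adding such a check in the inactive state can be sketched conceptually as follows. This Python model is an assumption-laden illustration, not the HZSADDCK macro or any real Health Checker interface: `Check`, `CheckRegistry`, `set_active`, and the sample names and values are hypothetical. It shows a check that flags a default value, stays silent until the installation activates it, and explains in its message why the default is being flagged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    """Hypothetical check definition (not the HZS API)."""
    name: str
    is_exception: Callable[[str], bool]  # tests exactly one setting
    rationale: str                        # why a default or common value is flagged
    active: bool = True


@dataclass
class CheckRegistry:
    """Hypothetical stand-in for the framework that runs added checks."""
    checks: Dict[str, Check] = field(default_factory=dict)

    def add(self, check: Check) -> None:
        self.checks[check.name] = check

    def set_active(self, name: str, active: bool) -> None:
        # Models the installation changing the state of a check.
        self.checks[name].active = active

    def run_all(self, setting: str) -> List[str]:
        messages = []
        for check in self.checks.values():
            if not check.active:
                continue  # an inactive check does not run
            if check.is_exception(setting):
                messages.append(
                    f"{check.name}: EXCEPTION for value {setting!r}. {check.rationale}"
                )
        return messages


registry = CheckRegistry()
registry.add(Check(
    name="EXAMPLE_DEFAULT_SETTING",
    is_exception=lambda value: value == "DEFAULT",
    rationale="The default is valid but is flagged because it is not tuned "
              "for this workload.",
    active=False,  # added inactive, so it cannot surprise an installation
))
```

Until `registry.set_active("EXAMPLE_DEFAULT_SETTING", True)` is called, `registry.run_all("DEFAULT")` produces no messages, so an installation running with the default sees no exception unless it opts in.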
Look for great experience-based information on writing checks in the
Redpaper™ Exploiting the Health Checker
for z/OS infrastructure (REDP-4590-00).