z/OS MVS Setting Up a Sysplex


Planning sysplex availability and recovery

SA23-1399-00

When work on one system in a sysplex cannot complete because the system fails, other systems in the sysplex remain available to recover the work and continue processing the workload. The goals of failure management in a sysplex are to minimize the impact that a failing system has on the sysplex workload so that work can continue, and to do so with little or no operator intervention. In some cases, sysplex delays can occur while other systems attempt to recover work from a failing system. See Lock structure considerations for examples of such delays.

The actions MVS™ is to take in failure situations are determined by the information specified through the COUPLExx parmlib member, the SFM policy, the XCFPOLxx parmlib member, the automatic restart management policy, the use of the system status detection (SSD) partitioning protocol, and system defaults.
  • COUPLExx Parmlib Member

    From COUPLExx, MVS obtains basic failure-related information, such as when to consider a system to have failed and when to notify the operator of the failure. (COUPLExx parmlib specifications might be different for each system, depending on its workload, processor capacity, or other factors.)

  • SFM Policy

    If all systems in a sysplex are running OS/390® or MVS/ESA SP Version 5, you can use the sysplex failure management (SFM) policy to define how MVS is to handle system failures, signaling connectivity failures, or PR/SM™ reconfiguration actions. Although you can use SFM in a sysplex without a coupling facility, to take advantage of the full range of failure management capabilities that SFM offers, a coupling facility must be configured in the sysplex.

    SFM makes use of some information specified in COUPLExx and includes all the function available through XCFPOLxx.

    The SFM policy also can be used in conjunction with the REBUILDPERCENT specification in the CFRM policy to determine whether MVS should initiate a structure rebuild when loss of connectivity to a coupling facility occurs.

  • XCFPOLxx Parmlib Member

    In a multisystem sysplex on a processor with the PR/SM feature, XCFPOLxx functions can provide some of the same capabilities as those provided by the SFM policy. XCFPOLxx functions are also referred to as the XCF PR/SM policy.

  • Automatic Restart Management Policy

    Use the automatic restart management policy to specify how batch jobs and started tasks that are registered as elements of automatic restart management should be restarted. The policy can specify different actions to be taken when a system fails, and when an element fails. Automatic restart management uses the IXC_WORK_RESTART exit, the IXC_ELEM_RESTART exit, the event exit, and the IXCARM macro parameters, in conjunction with the automatic restart management policy (the specified values and the defaults) when determining how to restart elements.

  • System Status Detection (SSD) Partitioning Protocol Using BCPii
    XCF uses the SSD partitioning protocol and BCPii services to enhance and expedite sysplex partitioning processing of systems in the sysplex. With BCPii services, XCF can automatically detect when a system in the sysplex has become demised. XCF can then initiate partitioning of the demised system immediately, bypassing the failure detection interval and the cleanup interval and avoiding the need for system fencing and manual operator intervention. A system image is considered demised when XCF determines that the system is removable from the sysplex without further delay. This is the case when the system encounters one of the following conditions:
    • The system enters a non-restartable disabled wait state.
    • The system experiences a LOAD operation.
    • The system experiences a RESET or other equivalent action (such as a system reset, checkstop, or power-down).
  • System Default Status Update Missing (SUM) Action

    If no active SFM policy or PR/SM policy is defined, the default SUM action is used in response to a status update missing condition. Before z/OS® V1R11, the default SUM action was to prompt the operator when a system was detected to be status update missing. As of z/OS V1R11, if a system is in the status update missing condition and is not sending any XCF signals, the system is isolated immediately using the fencing services through the coupling facility.
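
The demised conditions and the default SUM action described in the list above can be sketched as a small decision model. This is an illustration only, not how XCF is implemented: the enum members, function names, and returned strings are hypothetical labels chosen for this example, and the case in which a status-update-missing system is still sending XCF signals is not described in this topic, so the sketch reports it as such rather than guessing.

```python
from enum import Enum


class SystemCondition(Enum):
    """Hypothetical labels for system conditions; not IBM identifiers."""
    RUNNING = "running"
    NON_RESTARTABLE_WAIT = "non-restartable disabled wait state"
    LOAD = "LOAD operation"
    RESET = "RESET or equivalent (system reset, checkstop, power-down)"


# Conditions under which the text says a system image is considered demised.
DEMISED_CONDITIONS = frozenset({
    SystemCondition.NON_RESTARTABLE_WAIT,
    SystemCondition.LOAD,
    SystemCondition.RESET,
})


def is_demised(condition):
    """Return True if the condition is one of the listed demised conditions."""
    return condition in DEMISED_CONDITIONS


def default_sum_action(at_least_v1r11, policy_active, sending_xcf_signals):
    """Model the default status update missing (SUM) action.

    at_least_v1r11      -- True for z/OS V1R11 or later
    policy_active       -- True if an SFM policy or PR/SM policy is active
    sending_xcf_signals -- True if the system is still sending XCF signals
    """
    if policy_active:
        # The default applies only when no SFM or PR/SM policy is active.
        return "handled by active policy"
    if not at_least_v1r11:
        return "prompt operator"
    if not sending_xcf_signals:
        return "isolate using coupling facility fencing"
    # Assumption: this topic does not describe this case.
    return "not described in this topic"


if __name__ == "__main__":
    print(is_demised(SystemCondition.LOAD))
    print(default_sum_action(at_least_v1r11=True,
                             policy_active=False,
                             sending_xcf_signals=False))
```

Keeping the policy check first mirrors the wording of the text: the default SUM action comes into play only when neither an SFM policy nor a PR/SM policy is active.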

Other sysplex and couple data set failures, such as those caused by a power failure, might require operator intervention. See Handling concurrent system and couple data set failures.





Copyright IBM Corporation 1990, 2014