z/OS concepts

Benefits of Parallel Sysplex: Ease of use

The Parallel Sysplex® solution satisfies a major customer requirement for continuous availability, 24 hours a day and 7 days a week, while providing techniques for simplifying systems management consistent with this requirement. Some of the features of the Parallel Sysplex solution that contribute to increased availability also help to eliminate some systems management tasks.

Workload management (WLM) component
The workload management (WLM) component of z/OS® provides sysplex-wide workload management capabilities based on installation-specified performance goals and the business importance of the workloads. WLM tries to attain the performance goals through dynamic resource distribution. WLM provides the Parallel Sysplex cluster with the intelligence to determine where work needs to be processed and in what priority. The priority is based on the customer's business goals and is managed by sysplex technology.
Sysplex Failure Manager (SFM)
The Sysplex Failure Management policy allows the installation to specify failure detection intervals and recovery actions to be initiated in the event of the failure of a system in the sysplex.

Without SFM, when one of the systems in the Parallel Sysplex fails, the operator is notified and prompted to take some recovery action. The operator may choose to partition the non-responding system from the Parallel Sysplex, or to take some action to try to recover the system. This period of operator intervention might tie up critical system resources required by the remaining active systems. Sysplex Failure Manager allows the installation to code a policy that defines the recovery actions to be initiated automatically when specific types of problems are detected. Examples of such actions include fencing off the failed image so that it cannot access shared resources, deactivating a logical partition, and acquiring central storage and expanded storage.
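
The idea of a policy that pairs a failure detection interval with a recovery action can be sketched as a simple lookup. This is a toy model in Python, not SFM policy syntax: the names Rule, POLICY, and recovery_action, the action labels, and the interval values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    detection_interval_s: int  # how long a system may stay silent
    action: str                # hypothetical action label


# Hypothetical policy: "*" is the default rule, and SYSC overrides it.
# Interval values are invented for this sketch.
POLICY = {
    "*": Rule(detection_interval_s=85, action="isolate"),
    "SYSC": Rule(detection_interval_s=120, action="deactivate"),
}


def recovery_action(system: str, silent_for_s: int, policy=POLICY) -> Optional[str]:
    """Return the action to start for a silent system, or None if it is still within its interval."""
    rule = policy.get(system, policy["*"])
    if silent_for_s > rule.detection_interval_s:
        return rule.action
    return None
```

With a policy like this in place, the decision is made as soon as the interval expires, rather than waiting for an operator to respond to a prompt.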

Automatic Restart Manager (ARM)
Automatic Restart Manager enables fast recovery of subsystems that might hold critical resources at the time of failure. If other instances of the subsystem in the Parallel Sysplex need any of these critical resources, fast recovery makes these resources available more quickly. Although automation packages are already used to restart subsystems and resolve such deadlocks, ARM can be activated closer to the time of failure.

ARM reduces operator intervention in the following areas:

  • Detection of the failure of a critical job or started task
  • Automatic restart after a started task or job failure

    After an abend of a job or started task, the job or started task can be restarted with specific conditions, such as overriding the original JCL or specifying job dependencies, without relying on the operator.

  • Automatic redistribution of work to an appropriate system following a system failure

    This removes the time-consuming step of human evaluation of the most appropriate target system for restarting work.
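
The last item above, choosing a target system without human evaluation, can be illustrated with a small Python sketch. It assumes one plausible selection rule (the active, eligible system with the most spare capacity) and is not a description of ARM's actual algorithm. The System class, its spare_capacity field, and choose_restart_target are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    active: bool
    spare_capacity: int  # hypothetical unit of free capacity


def choose_restart_target(systems, eligible_names):
    """Return the name of the active, eligible system with the most spare capacity, or None."""
    candidates = [s for s in systems if s.active and s.name in eligible_names]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.spare_capacity).name


# Example: SYSA has failed, so work is restarted on one of the survivors.
systems = [
    System("SYSA", active=False, spare_capacity=500),
    System("SYSB", active=True, spare_capacity=200),
    System("SYSC", active=True, spare_capacity=350),
]
target = choose_restart_target(systems, {"SYSA", "SYSB", "SYSC"})
```

Restricting the candidates to an eligible set reflects the fact that not every system is necessarily suitable for every piece of work.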

Cloning and symbolics
Cloning refers to replicating the hardware and software configurations across the different physical servers in the Parallel Sysplex. That is, an application that is going to take advantage of parallel processing might have identical instances running on all images in the Parallel Sysplex. The hardware and software supporting these applications could also be configured identically on all systems in the Parallel Sysplex to reduce the amount of work required to define and support the environment.

The concept of symmetry allows new systems to be introduced and enables automatic workload distribution in the event of failure or when an individual system is scheduled for maintenance. It also reduces the amount of work required by the system programmer in setting up the environment. Note that symmetry does not preclude the need for systems to have unique configuration requirements, such as the asymmetric attachment of printers and communications controllers, or asymmetric workloads that do not lend themselves to the parallel environment.

System symbolics are used to help manage cloning. z/OS provides support for substitution values in startup parameters, JCL, system commands, and started tasks. These values can be used in parameter and procedure specifications so that each system substitutes its own unique value when a resource name is formed dynamically.
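
To show how one shared definition can yield a different resource name on each system, here is a minimal Python sketch of symbol substitution. It assumes the common convention that a symbol is written as an ampersand, a name, and an optional terminating period. The symbol names mirror familiar z/OS system symbols, but the values, the table SYMBOLS_BY_SYSTEM, and the function substitute are invented for illustration and do not reproduce the system's own substitution rules in full.

```python
import re

# Hypothetical per-system symbol tables; the values are made up.
SYMBOLS_BY_SYSTEM = {
    "SYSA": {"SYSNAME": "SYSA", "SYSCLONE": "SA", "SYSPLEX": "PLEX1"},
    "SYSB": {"SYSNAME": "SYSB", "SYSCLONE": "SB", "SYSPLEX": "PLEX1"},
}

# A symbol is '&' plus a name, optionally terminated by a single period.
_SYMBOL = re.compile(r"&([A-Z0-9$#@_]+)\.?")


def substitute(template, symbols):
    """Replace each &NAME. with its value; unknown symbols are left unchanged."""
    def repl(match):
        return symbols.get(match.group(1), match.group(0))
    return _SYMBOL.sub(repl, template)


# The same template resolves to a different name on each system.
names = {
    system: substitute("SYS1.&SYSNAME..PARMLIB", symbols)
    for system, symbols in SYMBOLS_BY_SYSTEM.items()
}
```

In the template, the first period after SYSNAME ends the symbol and the second is kept as a literal qualifier separator, which is why two periods appear in a row.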

zSeries® resource sharing
A number of base z/OS components use IBM® coupling facility shared storage as a medium for sharing component information for the purpose of multisystem resource management. This exploitation, called IBM zSeries Resource Sharing, enables sharing of physical resources such as files, tape drives, consoles, and catalogs, with improvements in cost and performance and with simplified systems management. This is not to be confused with Parallel Sysplex data sharing by the database subsystems. zSeries Resource Sharing delivers immediate value even for customers who do not use data sharing, because the exploitation is delivered natively with the base z/OS software stack.

One of the goals of the Parallel Sysplex solution is to provide simplified systems management by reducing complexity in managing, operating, and servicing a Parallel Sysplex, without requiring an increase in the number of support staff and without reducing availability.





Copyright IBM Corporation 1990, 2010