What is workload management?

For z/OS®, the management of system resources is the responsibility of the workload management (WLM) component. WLM manages the processing of workloads in the system according to the company's business goals, such as response time. WLM also manages the use of system resources, such as processors and storage, to accomplish these goals.

In simple terms, WLM has three objectives:

To achieve the business goals that are defined by the installation, by automatically assigning sysplex resources to workloads based on their importance and goals. This objective is known as goal achievement.
To achieve optimal use of the system resources from the system point of view. This objective is known as throughput.
To achieve optimal use of system resources from the point of view of the individual address space. This objective is known as response and turnaround time.

Goal achievement is the first and most important task of WLM. Optimizing throughput and minimizing turnaround times of address spaces come after that. Often, these latter two objectives are contradictory. Optimizing throughput means keeping resources busy. Optimizing response and turnaround time, however, requires resources to be available when they are needed. Achieving the goal of an important address space might result in worsening the turnaround time of a less important address space. Thus, WLM must make decisions that represent trade-offs between conflicting objectives.

To balance throughput with response and turnaround time, WLM does the following:

Monitors the use of resources by the various address spaces.
Monitors the system-wide use of resources to determine whether they are fully utilized.
Determines which address spaces to swap out (and when).
Inhibits the creation of new address spaces or steals pages when certain shortages of central storage exist.
Changes the dispatching priority of address spaces, which controls the rate at which the address spaces are allowed to consume system resources.
Selects the devices to be allocated, if a choice of devices exists, to balance the use of I/O devices.

Other z/OS components, transaction managers, and database managers can notify WLM of a change in status for a particular address space (or for the system as a whole), or invoke WLM's decision-making power.

For example, WLM is notified when:

Central storage is configured into or out of the system.
An address space is to be created.
An address space is deleted.
A swap-out starts or completes.
Allocation routines can choose the devices to be allocated to a request.

Up to this point, we have discussed WLM only in the context of a single z/OS system. In practice, customer installations often use clusters of multiple z/OS systems working in concert to process complex workloads. Parallel Sysplex® is the term used to refer to such clustered z/OS systems.

WLM is particularly well-suited to a sysplex environment. It keeps track of system utilization and workload goal achievement across all the systems in the Parallel Sysplex and data sharing environments. For example, WLM can decide the z/OS system on which a batch job should run, based on the availability of resources to process the job quickly.
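The kind of placement decision described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the `SystemStatus` fields, the `choose_system` function, and the rule "pick an image that has an idle batch initiator, preferring the least busy processor" are assumptions chosen for illustration, not WLM's actual routing algorithm, which weighs many more factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemStatus:
    # Hypothetical snapshot of one z/OS image in the sysplex.
    name: str
    cpu_busy_pct: float          # current processor utilization, 0-100
    batch_initiators_free: int   # hypothetical count of idle batch initiators


def choose_system(systems: list[SystemStatus]) -> Optional[SystemStatus]:
    """Pick an image with a free initiator and the lowest CPU use, or None."""
    # Assumption: a batch job can only start where an initiator is idle.
    candidates = [s for s in systems if s.batch_initiators_free > 0]
    if not candidates:
        return None
    # Assumption: spare processor capacity is the only tie-breaker.
    return min(candidates, key=lambda s: s.cpu_busy_pct)


plex = [
    SystemStatus("SYSA", 85.0, 2),
    SystemStatus("SYSB", 40.0, 0),  # least busy, but no idle initiator
    SystemStatus("SYSC", 55.0, 1),
]
print(choose_system(plex).name)
```

With these sample values the sketch selects SYSC: SYSB is less busy but has no initiator available, so resource availability as a whole, not a single metric, drives the choice.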

A mainframe installation can influence almost all decisions made by WLM by establishing a set of policies that allow an installation to closely link system performance to its business needs. Workloads are assigned goals (for example, a target average response time) and an importance (that is, how important it is to the business that a workload meet its goals).
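How goals and importance interact can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it loosely follows the WLM idea of a performance index (actual response time divided by the goal, so a value above 1.0 means the goal is being missed), treats importance 1 as the highest, and picks the most important workload that is missing its goal. The `ServiceClass` class and `pick_receiver` function are hypothetical names, and real WLM policy evaluation is considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceClass:
    # Hypothetical model of a workload's goal and importance (1 = most important).
    name: str
    importance: int
    goal_response_s: float    # target average response time, in seconds
    actual_response_s: float  # measured average response time, in seconds

    @property
    def performance_index(self) -> float:
        # Actual / goal: above 1.0 the goal is missed, at or below 1.0 it is met.
        return self.actual_response_s / self.goal_response_s


def pick_receiver(classes: list[ServiceClass]) -> Optional[ServiceClass]:
    """Return the workload to help first, or None if every goal is met."""
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None
    # Most important first (lowest number); among equals, the worst index.
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


policy = [
    ServiceClass("ONLINE", 1, 0.5, 0.8),    # index 1.6, missing its goal
    ServiceClass("BATCH", 3, 60.0, 120.0),  # index 2.0, missing its goal
    ServiceClass("TSO", 2, 1.0, 0.9),       # index 0.9, meeting its goal
]
print(pick_receiver(policy).name)
```

Here ONLINE is chosen even though BATCH is further from its goal, because importance ranks ahead of how badly a goal is missed; helping ONLINE may in turn slow BATCH, which is the kind of trade-off between conflicting objectives that WLM must make.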

Before the introduction of WLM, the only way to inform z/OS about the company's business goals was for the system programmer to translate from high-level objectives into the extremely technical terms that the system can understand. This translation required highly skilled staff, and could be protracted, error-prone, and eventually in conflict with the original business goals.

Further, it was often difficult to predict the effects of changing a system setting, which might be required, for example, following a system capacity increase. This difficulty could result in unbalanced resource allocation, in which work is deprived of a critical system resource. This way of operating, called compatibility mode, was becoming unmanageable as new workloads were introduced, and as multiple systems were being managed together.

In goal mode, WLM provides fewer, simpler, and more consistent system externals that express goals for work in the terms commonly used in business objectives. WLM and the System Resource Manager (SRM) then match resources to those goals by continuously monitoring and adapting the system. Workload Manager thus provides a solution for managing workload distribution, workload balancing, and the distribution of resources to competing workloads.

WLM policies are often based on a service level agreement (SLA), which is a written agreement describing the information systems (I/S) service to be provided to the users of a computing installation. WLM tries to meet the needs of workloads (response time), as described in an SLA, by distributing resources appropriately without over-committing them. Equally important, WLM maximizes system use (throughput) to deliver the greatest benefit from the installed hardware and software platform.