Hard storage errors

This topic deals with these types of hard storage errors:

When a hard storage error occurs, the operating system attempts recovery. For a storage key problem in a frame containing a virtual page, the operating system tries to reset the key. If the reset fails and the page is not fixed, the operating system moves the page to a new frame, setting the key in the new frame as required.

If recovery cannot repair the error, the operating system either takes the storage frame offline or marks it pending offline. Pending offline means that the operating system will take the frame offline when the frame becomes free.
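
The recovery decisions described above can be summarized in a short Python sketch. It is an illustration only, not operating system code: the names Action, recover_key_error, and retire_frame are hypothetical, and the model reduces each condition to a boolean.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    KEY_RESET = "key reset in place"
    PAGE_MOVED = "page moved to a new frame"
    OFFLINE = "frame taken offline"
    PENDING_OFFLINE = "frame marked pending offline"


def recover_key_error(reset_succeeds: bool, page_fixed: bool) -> Optional[Action]:
    """Model the repair attempt for a storage key problem.

    Returns the repair that was applied, or None when recovery
    cannot repair the error (hypothetical simplification).
    """
    if reset_succeeds:
        return Action.KEY_RESET
    if not page_fixed:
        # The page can be relocated; the key is set in the new frame.
        return Action.PAGE_MOVED
    # Reset failed and the page is fixed: no repair is possible here.
    return None


def retire_frame(frame_is_free: bool) -> Action:
    """Model what happens to a frame that recovery could not repair."""
    # A frame still in use is only marked; it goes offline once it is freed.
    return Action.OFFLINE if frame_is_free else Action.PENDING_OFFLINE
```

In this model, a failed key reset on a fixed page returns None from recover_key_error, and retire_frame(False) then yields the pending offline state until the frame becomes free.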

A storage error uncorrected condition represents the potential loss of critical data. When this condition occurs with a PD machine check, the system in most cases ends the affected unit of work. If the recovery routines complete successfully and free the affected storage frame, the frame is marked offline and system processing continues. Recovery processing, however, might itself refer to the storage that caused the original machine check, causing further errors. Those errors could cause the PD threshold for machine checks to be reached, which takes a CPU offline.

The default threshold for PD machine checks is 16 in 5 minutes. The operator can change this threshold by means of the MODE operator command.
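
To illustrate what a threshold of 16 in 5 minutes means, the following Python sketch counts events in a sliding time window. It is a minimal model under stated assumptions: the class name MachineCheckThreshold is hypothetical, and the sliding-window accounting is an assumption, since the way the system actually measures the interval is not described here.

```python
from collections import deque


class MachineCheckThreshold:
    """Hypothetical sliding-window counter for PD machine checks."""

    def __init__(self, limit: int = 16, window_seconds: float = 300.0):
        self.limit = limit              # default from the text: 16 checks
        self.window = window_seconds    # default from the text: 5 minutes
        self._times = deque()           # timestamps of recent checks

    def record(self, now: float) -> bool:
        """Record one machine check at time `now` (in seconds).

        Returns True when the number of checks inside the window
        has reached the limit.
        """
        self._times.append(now)
        # Drop checks that are a full window or more older than `now`.
        while now - self._times[0] >= self.window:
            self._times.popleft()
        return len(self._times) >= self.limit
```

In this model, 16 checks recorded one second apart reach the threshold on the sixteenth call, while 16 checks spaced 20 seconds apart do not, because the oldest check has left the window by the time the last one arrives.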