Restarting DFSMShsm after an abnormal end

DFSMShsm automatically recovers after most abnormal terminations. However, if DFSMShsm cannot recover from the abnormal end and the RESTART keyword was specified in the PROC statement of the startup procedure, DFSMShsm restarts itself. The RESTART keyword lets you specify a startup procedure, which can be the same as or different from the original startup procedure, and pass additional parameters to DFSMShsm. RESTART is valid only in an ESA environment.

Should an abnormal end occur that interrupts MVS™ processing, DFSMShsm can process any waiting requests as long as the extended common service area (ECSA) is not disturbed. When DFSMShsm is restarted, it will process any requests that are waiting in the ECSA.

Recovery from an abnormal end can take long enough that DFSMShsm does not check whether an automatic function needs to be restarted until after the automatic processing window for migration, backup, or dump has passed. If this happens and you want the automatic function to restart, issue one of the following commands, as appropriate for the function that you want to start:
SETSYS PRIMARYSPMGMTSTART (starttime,endingtime)
 
SETSYS SECONDARYSPMGMTSTART (starttime,endingtime)
 
SETSYS AUTOBACKUPSTART (starttime,latestarttime,quiescetime)
 
SETSYS AUTODUMPSTART (starttime,latestarttime,quiescetime)

Leave starttime as it is for normal operations. Change endingtime, latestarttime, and quiescetime to be later than the time at which DFSMShsm is restarted. The starttime value specifies the planned start time. The endingtime or quiescetime value determines when DFSMShsm stops processing new volumes for the automatic function being processed. Because you do not change the starttime specification, DFSMShsm recognizes that you are not defining a new start window for the function. After the function has completed, you can issue a new SETSYS command to reset endingtime, latestarttime, and quiescetime to their former values.
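
As an illustration only, the following sketch applies this procedure to automatic backup. All times are hypothetical and assume the hhmm format: a normal window of 0200 (starttime), 0400 (latestarttime), and 0600 (quiescetime), with DFSMShsm restarted at 0430. To let automatic backup restart, keep starttime unchanged and move the other two values past the restart time:
SETSYS AUTOBACKUPSTART (0200,0500,0700)

After automatic backup has completed, the assumed normal values can be restored:
SETSYS AUTOBACKUPSTART (0200,0400,0600)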

If a DFSMShsm host fails in a sysplex environment where secondary host promotion is enabled, a promoted host can take over the unique functions of a primary host or the secondary space management processing of any host, provided that certain conditions are met.

If you are using the common recall queue (CRQ) when a host fails, all of that host's recall requests on the CRQ remain intact. The coupling facility notifies the remaining connected hosts of the failure, and the requests are eligible for processing by other active hosts. Requests that were in process also remain on the CRQ and are available for restart by other hosts.

Related reading

For more information about secondary host promotion, see z/OS DFSMShsm Implementation and Customization Guide.