Configuring the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable

The Reliable Scalable Cluster Technology (RSCT) subsystem consists of multiple daemons that provide various functions to PowerHA® SystemMirror®. A few RSCT daemons are marked as critical because PowerHA SystemMirror depends on them to provide high availability. When critical RSCT daemons are not available, PowerHA SystemMirror nodes are halted to avoid data corruption.

In case of system failure, the RSCT critical daemons such as the resource monitoring and control (RMC) subsystem and IBM.ConfigRM restart automatically.

To avoid the downtime that is caused by halting and restarting the node, you can use the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable. RSCT then waits for the specified grace period so that the RMC subsystem and the IBM.ConfigRM daemon can restart without the node being halted.

You can set the value of the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable at both the cluster level and the node level of PowerHA SystemMirror. You can override the PowerHA SystemMirror CRIT_DAEMON_RESTART_GRACE_PERIOD tunable by using the RSCT configuration file (/etc/ctfile.cfg).

To set the value of the CritDaemonRestartGracePeriod tunable by using the /etc/ctfile.cfg RSCT configuration file, add the following line to the file:
CritDaemonRestartGracePeriod=<value in secs>
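To check which value, if any, is currently set in the file, a small shell function along the following lines might help. This is only a sketch: the function name get_ctfile_grace_period is hypothetical, and it assumes that the setting appears as a single KEY=VALUE line as shown above. If the key appears more than once, the sketch reports the last occurrence, which might not match how RSCT itself reads the file.

```shell
# Hypothetical helper (not part of RSCT or PowerHA SystemMirror).
# Prints the CritDaemonRestartGracePeriod value found in an RSCT
# configuration file, or prints nothing if the key is absent.
# Assumption: the setting is written as one KEY=VALUE line.
get_ctfile_grace_period() {
    cfg="${1:-/etc/ctfile.cfg}"
    # Fail if the file is missing or unreadable.
    [ -r "$cfg" ] || return 1
    # Print the numeric value of every matching line and keep the last one.
    sed -n 's/^[[:space:]]*CritDaemonRestartGracePeriod[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$cfg" | tail -n 1
}
```

Calling get_ctfile_grace_period with no argument reads /etc/ctfile.cfg. Passing a path reads that file instead, which is convenient for trying the function on a copy.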
To set the value of the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable at the node level or the cluster level by using the clmgr command, enter one of the following commands:

clmgr add cluster <clustername> CRIT_DAEMON_RESTART_GRACE_PERIOD=<value in secs>
clmgr modify cluster CRIT_DAEMON_RESTART_GRACE_PERIOD=<value in secs> 
clmgr add node <nodename> CRIT_DAEMON_RESTART_GRACE_PERIOD=<value in secs>
clmgr modify node <nodename> CRIT_DAEMON_RESTART_GRACE_PERIOD=<value in secs>
Note: For PowerHA SystemMirror, the supported range of the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable is 0 - 240 seconds.
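Because values outside the supported range are not valid, it can be useful to check a value before passing it to clmgr. The following sketch validates the 0 - 240 range and then only prints the node-level command instead of running it. The function names and the node name node1 are hypothetical, and the printed command uses the clmgr modify node form shown above.

```shell
# Hypothetical helpers; they only print a command and never run clmgr.

# Succeeds if $1 is a whole number in the supported range 0 - 240.
is_valid_grace_period() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -le 240 ]
}

# Prints the clmgr command for a node-level setting, or fails on bad input.
print_node_grace_period_command() {
    node="$1"
    value="$2"
    if ! is_valid_grace_period "$value"; then
        echo "grace period must be a whole number from 0 to 240: $value" >&2
        return 1
    fi
    echo "clmgr modify node $node CRIT_DAEMON_RESTART_GRACE_PERIOD=$value"
}
```

For example, print_node_grace_period_command node1 60 prints the command for a 60-second grace period on node1, while a value of 300 is rejected with an error message.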
The CRIT_DAEMON_RESTART_GRACE_PERIOD tunable supports the following values:
0
The node is halted when the RMC subsystem or the IBM.ConfigRM daemon fails. This is the default value.
>0
RSCT waits for the specified grace period. If the RMC subsystem or the IBM.ConfigRM daemon is not restarted within that period, the node is halted.

In PowerHA SystemMirror, the value of the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable that is set in the /etc/ctfile.cfg RSCT configuration file has the highest priority, followed by the node level setting and then the cluster level setting. If all three are set, the value in the /etc/ctfile.cfg RSCT configuration file is used as the grace period.
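The priority order can be illustrated with a short sketch. The function below is hypothetical and is not how PowerHA SystemMirror or RSCT implement the lookup. It takes the three values as arguments, treats an empty string as "not set" (an assumption made for this example), and falls back to the default of 0 when nothing is set.

```shell
# Hypothetical illustration of the documented priority order.
# Arguments: $1 = value from /etc/ctfile.cfg, $2 = node level value,
#            $3 = cluster level value. An empty string means "not set".
effective_grace_period() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            # The first value that is set wins.
            echo "$candidate"
            return 0
        fi
    done
    # Nothing is set: use the default value of 0.
    echo 0
}
```

With this sketch, effective_grace_period 30 60 90 prints 30 because the configuration file value takes priority, and effective_grace_period "" 60 90 prints 60 because the node level value overrides the cluster level value.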

After you configure the cluster level or node level tunable values, you must run the verify and synchronization operation of the cluster to update the RSCT CritDaemonRestartGracePeriod tunable.

Depending on the configuration settings that you specify, warning messages might be displayed during configuration or during the verify and synchronization operation of the cluster.
Scenario 1
When you configure both the cluster level and node level tunable for a specific node, the following warning message is displayed during the verify and synchronization operation of the cluster:
Warning: Node level critical daemon restart grace period is configured for the following nodes: node1 node2. Node level configuration has highest priority then cluster level so the cluster level attribute is ignored for the specified nodes.
Scenario 2
When you configure the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable at the cluster level and the node level, and also set the value in the /etc/ctfile.cfg RSCT configuration file, the /etc/ctfile.cfg RSCT configuration file is given the highest priority and the following warning message is displayed:
Warning: Critical daemon restart grace period is configured in the /etc/ctfile.cfg RSCT configuration file as x seconds on node1 and this value is considered as the highest priority. Hence the cluster level and node level values of the RSCT critical daemon restart grace period are ignored.
Scenario 3
During the verify and synchronization operation of the cluster, the preferred value of the CRIT_DAEMON_RESTART_GRACE_PERIOD tunable is displayed. If you set the CRIT_DAEMON_RESTART_GRACE_PERIOD value at the node level or cluster level to more than 120 seconds, the following warning message is displayed during the verify and synchronization operation of the cluster:
Warning: Critical daemon restart grace period at node level is configured for the following node: node1 with 130 seconds. The preferred value is less than 120 seconds.
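To anticipate this warning before you run the verify and synchronization operation, you could test a planned value against the 120-second threshold. The helper below is a hypothetical sketch. It follows the "more than 120" wording of the scenario description, so a value of exactly 120 is not flagged, even though the message text states the preferred value as less than 120 seconds.

```shell
# Hypothetical helper: succeeds if the value is above the 120-second
# threshold described for the Scenario 3 warning.
# Assumption: $1 is already known to be a whole number.
exceeds_preferred_grace_period() {
    [ "$1" -gt 120 ]
}
```

For example, exceeds_preferred_grace_period 130 succeeds, which matches the sample warning for node1, while exceeds_preferred_grace_period 60 fails.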