Monitoring and tuning health management

Health management offers default settings that suit most environments. However, if you discover that your health controller is not working as expected, tune the default parameters.

Before you begin

Verify that you have proper security authorization in the console to modify these settings. Privileges for health policies differ, depending on the administrative role of the user. Roles include monitor, operator, configurator, and administrator. If you are a user with either a monitor or an operator role, you can only view health policy information. If you are a user with either a configurator or an administrator role, you have all configuration privileges for health policies.

About this task

Use the following steps to modify the health controller parameters. Tune these parameters when the health management infrastructure is not working the way that you want.

Procedure

In the administrative console, click Operational policies > Autonomic managers > Health controller.
Determine whether you want your changes to be persistent or applied to the current runtime for testing purposes.
On the Configuration tab, you can view the fields that were previously configured and, in some cases, edit them. On the Runtime tab, you can view the values that the health controller is currently using and, in some cases, change them. Values that you change on the Runtime tab are sent directly to the health controller, and the controller parameters are modified. Because these changes are not stored in the repository by default, you can use the Runtime tab to make temporary parameter changes.
Tip: Enter your changes on the Runtime tab and test them in the runtime before you commit them. When you want to commit your changes, click Save to configuration on the Runtime tab.

Modify and test your settings.

Setting	Description
Control cycle length	Specifies the time between consecutive health checks. The value is specified in minutes and ranges from 1 to 60 minutes. Longer control cycles reduce the health monitoring load, but a health condition that arises during a cycle is not detected until the next health check. For example, suppose that a health policy with a workload condition of 10,000 requests is associated with an application server, and the control cycle length is 60 minutes. The health controller checks every 60 minutes to determine whether the application server has served 10,000 requests. If a health check finds 9,999 requests, the next check does not occur for another 60 minutes, so the server serves more than 10,000 requests before it is restarted.
Maximum consecutive restarts	Specifies the number of attempts to revive an application server after a restart decision is made. If this number is exceeded, the assumption is that the operation failed and restarts are disabled for the server. The value must be a whole number between 1 and 5, inclusive.
Minimum restart interval	Controls the minimum amount of time that must elapse between consecutive restarts of an application server instance. If a health condition for an application server is breached during that time, the restart is set to a pending state. When the minimum restart interval passes, the restart occurs. The value can range from 15 minutes to 365 days, inclusive. A value of 0 disables the minimum restart interval.
Restart timeout	A restart consists of a stop server action followed by a start server action. The restart timeout value specifies how long to wait, from the time that the health policy is triggered, before explicitly checking whether the server has stopped. When the health policy is triggered, the restart timeout goes into effect and a stop command is issued. The state of the stop is not checked until the restart timeout is reached. At the end of the restart timeout, if the server has not stopped, an aggressive stop is issued so that the server stops quickly, without draining sessions. A start command can then be run to restart the server. If your application server takes an unusually long time to stop and start, set this value so that the restart action does not time out. Always specify the value in minutes. The value is a whole number from 1 to 60 minutes. The default value is 5 minutes.
Enable health monitoring	Enables or disables the operation of the health controller. When enabled, the health controller continuously monitors the health policies in the system. You can disable the health controller without removing the health policies from the system.
Prohibited restart times	Specifies the times and days of the week when a restart of an application server instance is prohibited. Specify the start and end times by selecting the hour and minute on a 24-hour clock, and select the days of the week. You can specify multiple time blocks, if needed. If you specify a start time and an end time, you must also specify at least one day of the week on which the restart is prohibited. The block between the start time and the end time cannot cross the midnight boundary. For example, to prohibit restarts from 10:00 PM to 1:00 AM, specify two time blocks: one from 22:00 to 23:59 and one from 00:00 to 01:00. Click Add to add more time constraints. To remove an existing constraint, select the check box next to the constraint and click Remove. If a health condition is breached during a prohibited restart time, the restart is delayed until the prohibited time interval passes.
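
The value ranges in the preceding table, and the rule that a prohibited restart block cannot cross midnight, can be checked before you enter values in the console. The following Python sketch is illustrative only and is not part of the product: the function names `validate_settings` and `split_at_midnight` are hypothetical, all durations are assumed to be whole numbers of minutes, and days of the week are not handled.

```python
from datetime import time

# Illustrative helper, not a product API. Ranges come from the settings table.
MINUTES_PER_DAY = 24 * 60


def validate_settings(control_cycle_min, max_consecutive_restarts,
                      min_restart_interval_min, restart_timeout_min):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not 1 <= control_cycle_min <= 60:
        problems.append("control cycle length must be 1 to 60 minutes")
    if not 1 <= max_consecutive_restarts <= 5:
        problems.append("maximum consecutive restarts must be 1 to 5")
    # A value of 0 disables the interval; otherwise 15 minutes to 365 days.
    interval_ok = (min_restart_interval_min == 0
                   or 15 <= min_restart_interval_min <= 365 * MINUTES_PER_DAY)
    if not interval_ok:
        problems.append("minimum restart interval must be 0, or 15 minutes to 365 days")
    if not 1 <= restart_timeout_min <= 60:
        problems.append("restart timeout must be 1 to 60 minutes")
    return problems


def split_at_midnight(start, end):
    """Split a (start, end) block so that no resulting block crosses midnight."""
    if start <= end:
        return [(start, end)]
    # The block wraps past midnight: end the first block at 23:59
    # and start the second block at 00:00.
    return [(start, time(23, 59)), (time(0, 0), end)]


if __name__ == "__main__":
    print(validate_settings(60, 3, 0, 5))
    for block_start, block_end in split_at_midnight(time(22, 0), time(1, 0)):
        print(block_start.strftime("%H:%M"), "to", block_end.strftime("%H:%M"))
```

In this sketch, a 10:00 PM to 1:00 AM block yields the two blocks from the table example, 22:00 to 23:59 and 00:00 to 01:00. You still must select the days of the week for each block in the console.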

Results

You have modified the health management configuration settings to tune your system.

What to do next

For more information about modifying the health management settings when they are not working as expected, read the troubleshooting information.