Health management

With the health management feature in Liberty, you can take a policy-driven approach to monitoring the application server environment and responding when unhealthy conditions are detected.

You define health policies. Each policy specifies the health condition to monitor in your environment and the health actions to take when that condition is met.

Health conditions

Health conditions define the variables that you want to monitor in your environment. The condition element defines what behavior can trigger this health policy. Only one condition element can be defined per health policy. You can choose from the following predefined health conditions:

Excessive request timeout condition
Specifies the percentage of HTTP requests that can time out. When the percentage of timed-out requests exceeds the defined value, the health actions run. The timeout value depends on your environment configuration.
<excessiveRequestTimeout timeoutPercentage="5"/>
Excessive response time condition
Tracks the average amount of time that requests take to complete. If the average time exceeds the defined response time threshold, the health actions run.
<excessiveResponseTime responseTime="10s"/>
Note: Requests that exceed the timeout value that is configured for the excessive request timeout condition are not counted toward this health condition. For example, if the default timeout value is 60 seconds, then any request that exceeds 60 seconds times out and is not included in the average response time calculation. This restriction applies even if you do not define an excessive request timeout condition.
Memory condition: excessive memory usage
Tracks the memory usage for a member. When the memory usage exceeds a percentage of the heap size for a specified time, health actions run.
<excessiveMemoryUsage heapSizePercentage="85" timePeriod="5m"/>
Memory condition: memory leak
When a downward trend in free memory is detected, health actions run.
<memoryLeak/>
Important:
  • Dynamic Routing must be enabled to use either the excessive request timeout or excessive response time conditions.
  • The healthAnalyzer-1.0 feature must be enabled in your server.xml file to use either the excessive memory usage or memory leak conditions. This feature can be enabled only for collective members.
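The feature requirement for the memory conditions can be sketched as the following server.xml fragment for a collective member. This is a minimal illustration that uses the standard Liberty featureManager syntax. Any other features that the member needs are omitted here.

```xml
<!-- server.xml of a collective member (sketch; other required features omitted) -->
<featureManager>
    <!-- Needed for the excessive memory usage and memory leak conditions -->
    <feature>healthAnalyzer-1.0</feature>
</featureManager>
```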

Health actions

Health actions define the activities to perform when a health condition is met. Action elements define what action is taken in response to a detected condition. All actions share the element type of <action>, and the action attribute determines which action is taken. Multiple actions can be defined for each health policy. Actions are run in the order in which they are specified in the policy. The following table lists the health actions that are supported in Liberty server environments:

Table 1. Predefined health action support for Liberty servers
Health action                                  Liberty servers that run in the same collective controller
Restart server                                 Supported
Take thread dumps                              Supported
Take Java™ virtual machine (JVM) heap dumps    Supported for servers that are running on the IBM® JRE or Java Developer Kit
Enter server into maintenance mode             Supported
Exit server out of maintenance mode            Supported

These actions are configured with the following action elements:
<action action="generateThreadDump"/>
<action action="generateHeapDump"/>
<action action="restartServer"/>
<action action="enterMaintenanceMode"/>
<action action="exitMaintenanceMode"/>
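Because actions run in the order in which they are specified, a policy can collect diagnostic data before it restarts a server. The following fragment is a sketch of one such ordering, using only the action values listed above. The particular sequence is an illustrative choice, not a recommended configuration.

```xml
<!-- Actions run top to bottom: capture diagnostics first, then restart -->
<action action="generateThreadDump"/>
<action action="generateHeapDump"/>
<action action="restartServer"/>
```

Placing restartServer last keeps the dumps tied to the unhealthy server state.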

Health targets

Target elements define the scope of the topology that is being monitored for the condition. Three target types are available:
  • A host
    <host hostName="someHost"/>
  • Each of the servers in a cluster
    <cluster clusterName="someCluster"/>
  • A single server
    <server hostName="Host" wlpUsrDirectory="/opt/ibm/liberty/wlp" serverName="Server"/>

Each target type has a unique element that is used to define it within the healthPolicy element. More than one target can be specified per health policy.
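Putting the pieces together, a complete health policy might look like the following sketch. It combines one condition, two ordered actions, and two targets inside a healthPolicy element. The id value memoryPolicy is hypothetical, and the exact attributes that the healthPolicy element accepts are an assumption here, so check them against the configuration reference for your Liberty version.

```xml
<!-- Hypothetical policy: the id value and the child layout are assumptions -->
<healthPolicy id="memoryPolicy">
    <!-- Condition: only one condition element per policy -->
    <excessiveMemoryUsage heapSizePercentage="85" timePeriod="5m"/>

    <!-- Actions: run in the order in which they are specified -->
    <action action="generateHeapDump"/>
    <action action="restartServer"/>

    <!-- Targets: more than one target can be specified -->
    <cluster clusterName="someCluster"/>
    <host hostName="someHost"/>
</healthPolicy>
```

With this policy, a heap dump is taken and then the server is restarted when a monitored server exceeds 85 percent of its heap size for five minutes.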