Health management
With the health management feature in Liberty, you can take a policy-driven approach to monitoring the application server environment and respond when unhealthy criteria are discovered.
You can define health policies, which specify the health conditions to monitor in your environment and the health actions to take if these conditions are met.
Health conditions
Health conditions define the variables that you want to monitor in your environment. The condition element defines the behavior that triggers a health policy. Only one condition element can be defined per health policy. You can choose from the following predefined health conditions:
- Excessive request timeout condition
  Specifies a percentage of HTTP requests that can time out. When the percentage of timed-out requests exceeds the defined value, the health actions run. The timeout value depends on your environment configuration.
  <excessiveRequestTimeout timeoutPercentage="5"/>
- Excessive response time condition
  Tracks the average amount of time that requests take to complete. If the average time exceeds the defined response time threshold, the health actions run.
  <excessiveResponseTime responseTime="10s"/>
  Note: Requests that exceed the timeout value that is configured for the excessive request timeout condition are not counted toward this health condition. For example, if the default timeout value is 60 seconds, then any request that exceeds 60 seconds times out and is not included in the average response time calculation. This restriction applies even if you do not define an excessive request timeout condition.
- Memory condition: excessive memory usage
  Tracks the memory usage for a member. When the memory usage exceeds a percentage of the heap size for a specified time, the health actions run.
  <excessiveMemoryUsage heapSizePercentage="85" timePeriod="5m"/>
- Memory condition: memory leak
  When a downward trend in free memory is detected, the health actions run.
  <memoryLeak/>
Note:
- Dynamic Routing must be enabled to use either the excessive request timeout or the excessive response time condition.
- The healthAnalyzer-1.0 feature must be enabled in your server.xml file to use either the excessive memory usage or the memory leak condition. This feature can be enabled only for collective members.
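As a minimal sketch, the feature requirement for the memory conditions might look like the following fragment in the server.xml file of a collective member. Only the healthAnalyzer-1.0 feature name comes from this topic; any other features that your server needs are omitted here.

```xml
<server>
    <featureManager>
        <!-- Required for the excessive memory usage and memory leak conditions -->
        <feature>healthAnalyzer-1.0</feature>
    </featureManager>
</server>
```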
Health actions
Health actions define the activities to perform when a health condition is met. Action elements define what action is taken in response to a detected condition. All actions share the element type of <action>. The action attribute determines which action is taken, and multiple actions can be defined for each health policy. Actions run in the order in which they are specified in the policy. The following table lists the health actions that are supported in Liberty server environments:
Health action | Liberty servers that run in the same collective controller |
---|---|
Restart server. | Supported |
Take thread dumps. | Supported |
Take Java™ virtual machine (JVM) heap dumps. | Supported for servers that are running on the IBM® JRE or Java Development Kit |
Enter server into maintenance mode. | Supported |
Exit server out of maintenance mode. | Supported |
The following action elements correspond to these health actions:
<action action="generateThreadDump"/>
<action action="generateHeapDump"/>
<action action="restartServer"/>
<action action="enterMaintenanceMode"/>
<action action="exitMaintenanceMode"/>
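Because actions run in the order in which they are specified, the order can be used to capture diagnostic data before a server is restarted. The following fragment is only a sketch: it assumes that the condition and action elements are nested directly inside the healthPolicy element, and the id value is a hypothetical name chosen for illustration.

```xml
<!-- Sketch only: the id value is hypothetical, and target elements are omitted -->
<healthPolicy id="memoryLeakPolicy">
    <!-- Only one condition element is allowed per policy -->
    <memoryLeak/>
    <!-- Actions run from top to bottom: collect dumps first, then restart -->
    <action action="generateThreadDump"/>
    <action action="generateHeapDump"/>
    <action action="restartServer"/>
</healthPolicy>
```

If the restart were listed first, the dumps would presumably be taken from the restarted server rather than from the one that showed the problem, which is why the diagnostic actions come first in this sketch.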
Health targets
A health policy can be applied to the following target types:
- A host
  <host hostName="someHost"/>
- Each of the servers in a cluster
  <cluster clusterName="someCluster"/>
- A single server
  <server hostName="Host" wlpUsrDirectory="/opt/ibm/liberty/wlp" serverName="Server"/>
Each target type has a unique element that is used to define it within the healthPolicy element. More than one target can be specified per health policy.
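Putting the pieces together, a policy with one condition, two actions, and two targets might be sketched as follows. This is an illustration under assumptions rather than a definitive configuration: the id value is hypothetical, the elements are assumed to be direct children of healthPolicy, and the target values are the placeholder names that are used earlier in this topic.

```xml
<!-- Sketch only: the id value is hypothetical -->
<healthPolicy id="slowResponsePolicy">
    <!-- Condition: average response time exceeds 10 seconds -->
    <excessiveResponseTime responseTime="10s"/>
    <!-- Actions, run in the order listed -->
    <action action="generateThreadDump"/>
    <action action="enterMaintenanceMode"/>
    <!-- Targets: every server in a cluster, plus one individual server -->
    <cluster clusterName="someCluster"/>
    <server hostName="Host" wlpUsrDirectory="/opt/ibm/liberty/wlp" serverName="Server"/>
</healthPolicy>
```

Because this sketch uses the excessive response time condition, Dynamic Routing would need to be enabled for the policy to take effect.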