IBM Tivoli Monitoring uses
a heartbeat mechanism to monitor the status of remote monitoring servers
and monitoring agents.
The different monitoring components in the monitoring architecture
form a hierarchy (shown in Figure 1)
across which the heartbeat information is propagated.
The hub monitoring server maintains
status for all monitoring agents.
Remote monitoring servers
offload processing from the hub monitoring server by
receiving and processing heartbeat requests from monitoring agents,
and communicating only status changes to the hub monitoring server.
Figure 1. Hierarchy for the heartbeat interval
At the highest level, the hub monitoring server receives
heartbeat requests from remote monitoring servers
and from any monitoring agents
that are configured to access the hub monitoring server directly
(rather than through a remote monitoring server).
The default heartbeat interval used by remote monitoring servers
to communicate their status to the hub monitoring server is
3 minutes. This default
is suitable for most environments and should not need to be changed.
If you decide to modify this value, carefully monitor the system behavior
before and after making the change.
At the next level, remote monitoring servers
receive heartbeat requests from monitoring agents
that are configured to access them. The default heartbeat interval
used by monitoring agents
to communicate their status to the monitoring server is
10 minutes.
You can specify the heartbeat interval for a node (either a
remote monitoring server or a remote monitoring agent)
by setting the CTIRA_HEARTBEAT environment variable.
For example, specifying CTIRA_HEARTBEAT=5 sets
the heartbeat interval to 5 minutes.
The minimum heartbeat interval that can be configured is 1 minute.
- For monitoring servers
on Windows computers, you
can set this variable by adding the entry to the KBBENV file. You
can access this file from the Manage Tivoli Enterprise Monitoring Services utility
by right-clicking Tivoli Enterprise Monitoring Server
and clicking Advanced → Edit ENV File.
Note that you must stop and restart the monitoring server for
the changes to the KBBENV file to take effect.
- For monitoring servers
on Linux and UNIX computers, you can set the CTIRA_HEARTBEAT variable
by adding the entry to the monitoring server configuration
file. The name of the monitoring server configuration
file is of the form hostname_ms_temsname.config.
For example, a remote monitoring server named REMOTE_PPERF06 running
on host pperf06 has a configuration filename of pperf06_ms_REMOTE_PPERF06.config.
Note that you must stop and restart the monitoring server for
the configuration changes to take effect.
- For remote monitoring servers,
you can set this variable by adding an entry to the KBBENV file. You
can access this file from Manage Tivoli Enterprise Monitoring Services by
right-clicking Tivoli Enterprise Monitoring Server and
clicking Advanced → Edit ENV File. You must stop and restart
the monitoring server for
changes to the KBBENV file to take effect.
- For Windows OS agents,
you can set this variable by adding the entry to the KNTENV file.
You can access this file from Manage Tivoli Enterprise Monitoring Services by
right-clicking Windows OS
Monitoring Agent and clicking Advanced → Edit ENV File.
You must stop and restart the monitoring agent for
the changes to the KNTENV file to take effect.
- For agents on Linux and UNIX computers, you can set the CTIRA_HEARTBEAT variable
by adding an entry to the agent .ini file (for example, lz.ini, ux.ini,
ua.ini). When the agent is stopped and restarted, the agent configuration
file is recreated using settings in the .ini file.
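The edit described in the list above amounts to adding or updating a single CTIRA_HEARTBEAT entry. The following is a minimal sketch of that edit in Python, assuming the file content is a series of plain NAME=value lines. The function name set_heartbeat and the OTHER_SETTING entry are hypothetical and used only for illustration. The sketch works on text in memory and does not locate, read, or write any product file.

```python
def set_heartbeat(env_text: str, minutes: int) -> str:
    """Return env_text with a CTIRA_HEARTBEAT entry set to `minutes`.

    Assumption: the file consists of simple NAME=value lines.
    """
    if minutes < 1:
        # The text states that 1 minute is the minimum configurable interval.
        raise ValueError("heartbeat interval must be at least 1 minute")

    entry = f"CTIRA_HEARTBEAT={minutes}"
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Update an existing entry in place instead of adding a duplicate.
        if line.strip().startswith("CTIRA_HEARTBEAT="):
            lines[i] = entry
            replaced = True
    if not replaced:
        lines.append(entry)
    return "\n".join(lines) + "\n"


# OTHER_SETTING is a hypothetical placeholder, not a real product variable.
sample = "OTHER_SETTING=value\nCTIRA_HEARTBEAT=10\n"
print(set_heartbeat(sample, 5))
```

Whichever file is edited, the monitoring server or agent must still be stopped and restarted before the new interval takes effect.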
When a monitoring agent becomes
active and sends an initial heartbeat request to the monitoring server,
it communicates the desired heartbeat interval for the agent in the
request. The monitoring server stores
the time the heartbeat request was received and sets the expected
time for the next heartbeat request based on the agent heartbeat interval.
If no heartbeat interval was set at the agent, the default value is
used.
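The bookkeeping described above can be illustrated with a short Python sketch. It is a simplified model and not the product's implementation. The function name next_expected and the constant name are hypothetical, and the 10-minute default is the agent default stated earlier in this topic.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default agent heartbeat interval, in minutes, as stated in this topic.
DEFAULT_AGENT_HEARTBEAT_MINUTES = 10


def next_expected(received_at: datetime,
                  requested_minutes: Optional[int] = None) -> datetime:
    """Return the time at which the next heartbeat request is expected.

    If the agent did not communicate an interval, fall back to the default.
    """
    if requested_minutes is None:
        requested_minutes = DEFAULT_AGENT_HEARTBEAT_MINUTES
    return received_at + timedelta(minutes=requested_minutes)


received = datetime(2024, 1, 1, 12, 0)
print(next_expected(received, 5))  # agent requested a 5-minute interval
print(next_expected(received))     # no interval set at the agent
```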
A change to offline status typically requires two missed heartbeat
requests. Offline status is indicated by
the node being disabled in the portal client's
Navigator View. If the heartbeat interval is set to 10 minutes, an
offline status change would be expected to take between 10 and 20
minutes before it is reflected on the portal client's
Navigator View.
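The 10 to 20 minute range follows from the two-missed-heartbeats rule. A node can fail at any point between its last successful heartbeat and the next expected one, and the second missed heartbeat falls two intervals after the last successful one. The Python sketch below works through that arithmetic. The function name offline_detection_window is hypothetical, and the model ignores network and portal refresh delays.

```python
from typing import Tuple


def offline_detection_window(interval_minutes: int,
                             missed_required: int = 2) -> Tuple[int, int]:
    """Return the (shortest, longest) delay in minutes before a failed node
    is marked offline, assuming `missed_required` missed heartbeats.

    Shortest case: the node fails just before its next heartbeat is due.
    Longest case: the node fails just after sending a heartbeat.
    """
    shortest = (missed_required - 1) * interval_minutes
    longest = missed_required * interval_minutes
    return shortest, longest


print(offline_detection_window(10))  # (10, 20)
print(offline_detection_window(3))   # (3, 6)
```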
Attention: Lower heartbeat intervals increase CPU utilization
on the monitoring servers
processing the heartbeat requests. CPU utilization is also affected
by the number of agents being monitored. A low heartbeat interval
and a high number of monitored agents could cause the CPU utilization
on the monitoring server to
increase to the point that performance-related problems occur. If
you reduce the heartbeat interval, you must monitor the resource usage
on your servers. A heartbeat interval lower than 3 minutes is not
supported.
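As a rough illustration of why both factors matter, the average number of heartbeat requests that a monitoring server receives per minute is the number of agents divided by the heartbeat interval. The Python sketch below computes only that average rate. It does not model CPU cost, the function name heartbeats_per_minute is hypothetical, and the agent count of 1500 is an arbitrary example value.

```python
def heartbeats_per_minute(agent_count: int, interval_minutes: int) -> float:
    """Average heartbeat requests per minute arriving at a monitoring server,
    assuming every agent uses the same interval."""
    if interval_minutes < 1:
        raise ValueError("heartbeat interval must be at least 1 minute")
    return agent_count / interval_minutes


# Example: 1500 agents (an arbitrary figure) at two different intervals.
print(heartbeats_per_minute(1500, 10))  # 150.0
print(heartbeats_per_minute(1500, 3))   # 500.0
```

In this example, reducing the interval from 10 minutes to 3 minutes more than triples the average request rate for the same number of agents.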