Use the following list to perform daily health checks.
Tasks
Make sure that all of the managed systems that were online the day before are still online. A quick
look at the Managed System Status workspace shows the status of each managed
system. If you find managed systems that are offline, investigate them
individually.
There are several reasons why a managed system can go offline.
The agent itself might have stopped or failed, there might be communication
problems between the agent and the monitoring server that it is connected
to, or the agent might have been decommissioned. In any case, find and address
the cause of the problem. Run a script every morning that reports
which systems are ONLINE and which are OFFLINE; Taudit.js can be
used for this purpose.
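If you want to build your own comparison instead of, or alongside, Taudit.js, the following is a minimal sketch of the morning report. It is not Taudit.js and does not call any Tivoli Monitoring interface. It assumes that you save a plain-text listing of managed systems each day, with the managed system name in the first column and a status flag (Y for online, N for offline) in the last column. The function names `parse_snapshot` and `report` and the sample host names are hypothetical.

```python
"""Hypothetical daily ONLINE/OFFLINE report (illustrative, not Taudit.js).

Assumption: each daily snapshot is plain text with one managed system per
line, the managed system name in the first column, and a status flag
(Y = online, N = offline) in the last column. Any line that does not end
in Y or N (headers, separators, blank lines) is ignored.
"""

from typing import Dict, List


def parse_snapshot(text: str) -> Dict[str, bool]:
    """Map each managed system name to True (online) or False (offline)."""
    status: Dict[str, bool] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] not in ("Y", "N"):
            continue  # skip headers, separators, and blank lines
        status[fields[0]] = fields[-1] == "Y"
    return status


def report(yesterday: str, today: str) -> Dict[str, List[str]]:
    """Classify today's systems and list those that need investigation."""
    before = parse_snapshot(yesterday)
    now = parse_snapshot(today)
    online = sorted(name for name, up in now.items() if up)
    offline = sorted(name for name, up in now.items() if not up)
    # Systems that were online yesterday but are offline, or missing
    # entirely, today should be investigated individually.
    investigate = sorted(
        name for name, up in before.items() if up and not now.get(name, False)
    )
    return {"ONLINE": online, "OFFLINE": offline, "INVESTIGATE": investigate}


if __name__ == "__main__":
    # Hypothetical sample snapshots for illustration only.
    yesterday = "Name Status\nPrimary:hostA:NT Y\nPrimary:hostB:NT Y\n"
    today = "Name Status\nPrimary:hostA:NT Y\nPrimary:hostB:NT N\n"
    for key, names in report(yesterday, today).items():
        print(key, names)
```

Comparing against the previous day's snapshot, rather than only listing today's offline systems, also catches agents that have disappeared from the listing altogether, for example because they were decommissioned without anyone being told.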
You might find situations in open status that have not been addressed
(acknowledged). Determine whether the problem reported by each such situation is valid:
is there really a problem, or does the situation have incorrect thresholds
so that it produces false positives that the monitoring staff ignores? Make sure your situations reflect
real events; this trains the monitoring staff to react to each situation
that becomes true in the Tivoli Monitoring environment.
If you collect historical data and use the Tivoli Data Warehouse,
make sure the Warehouse Proxy agent and the Summarization and Pruning agent are running. Check the
logs for both agents to confirm that data is being collected and summarized
at the intervals you have set. To find a solution for monitoring
warehouse activity and confirming that the warehouse is functioning properly, search for "Data
Warehouse DB activity" or navigation code "1TW10TM1X" in the Tivoli® Open Process Automation Library (OPAL).
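A complementary check is to confirm that the newest data in each warehouse table is no older than the interval you configured. The following is a minimal sketch under stated assumptions: you have already obtained, by your own query against the warehouse database, the timestamp of the most recent row in each table of interest, and you know the collection or summarization interval you set for it. The function `stale_tables`, the `grace` factor, and the table names in the example are hypothetical; this is not the OPAL solution mentioned above.

```python
"""Hypothetical freshness check for Tivoli Data Warehouse tables.

Assumptions: `latest` holds the newest row timestamp per table (obtained
by your own database query, not shown), and `expected_interval` holds the
interval you configured for each table. Table names are placeholders.
"""

from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def stale_tables(
    latest: Dict[str, datetime],
    expected_interval: Dict[str, timedelta],
    now: datetime,
    grace: float = 1.5,
) -> List[Tuple[str, Optional[timedelta]]]:
    """Return (table, age) for each table whose newest row is older than
    grace * expected interval; age is None if the table has no rows."""
    stale: List[Tuple[str, Optional[timedelta]]] = []
    for table, interval in sorted(expected_interval.items()):
        last = latest.get(table)
        if last is None:
            stale.append((table, None))  # no data at all for this table
            continue
        age = now - last
        # Allow some slack so a slightly late upload is not flagged.
        if age > interval * grace:
            stale.append((table, age))
    return stale


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 8, 0)
    latest = {"TABLE_A": datetime(2024, 1, 2, 7, 30)}
    intervals = {"TABLE_A": timedelta(hours=1), "TABLE_B": timedelta(days=1)}
    for table, age in stale_tables(latest, intervals, now):
        print(table, "no rows" if age is None else age)
```

The grace factor is a design choice to avoid flagging uploads that are only slightly late; tune it to how much delay your environment normally shows.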
Spot-check the workspaces for several different monitoring agent types
to make sure report data is being returned.