Maintaining an efficient monitoring environment

Monitoring, Version 6.2

This section covers the daily, weekly, monthly, and quarterly routine health checks on the Tivoli Monitoring V6.2 enterprise environment.

By performing these routine procedures in addition to your daily, weekly, monthly, and quarterly health checks, you help ensure that your Tivoli Monitoring environment continues to run smoothly.

Run the taudit.js tool every day. It provides an overall status of the environment. You can find the tool in the Tivoli® Open Process Automation Library (OPAL) by searching for "Web SOAP scheduled reporting tools" or navigation code "1TW10TM0U".
Take a monitoring server backup every 24 hours in the early stages, and then move to weekly backups. If you have effective snapshot software, you can take backups with the monitoring server or portal server, or both, online. Otherwise, shut down the monitoring server and portal server before taking a backup. Test these backups after you first develop the process and at least twice a year thereafter by restoring to a monitoring server in your test environment. This confirms that the backups capture the data you need to restore your production monitoring server after an outage or to roll back to a previous state.
Make sure the portal server database backup is in the plan. Take it daily while the environment is being rolled out, and then weekly as the environment matures and changes become less frequent. Test these backups after you first develop the process and at least twice a year thereafter by restoring to a portal server in your test environment. This confirms that the backups capture the data you need to restore your production portal server after an outage or to roll back to a previous state.
Make sure the DB2® warehouse backup is in the plan and is being made weekly. The backup is weekly rather than more frequent because of the large size of the warehouse database.
Check daily that the warehouse agent is working by reviewing the warehouse logs (hostname_hd_timestamp-nn.log).
Check daily that the Summarization and Pruning agent is working by reviewing its logs (hostname_sy_timestamp-nn.log).
Check the monitoring server (hostname_ms_timestamp-nn.log) and portal server logs (hostname_cq_timestamp-nn.log) for any obvious errors and exceptions.
Check that no monitoring server is overloaded with agents. One way to do this is to open the "Self-Monitoring Topology" workspace, which has a "Managed Systems per TEMS" view showing the number of agents reporting to each monitoring server.
For DB2, run REORGCHK and RUNSTATS on the warehouse database daily.
For DB2, run REORGCHK and RUNSTATS on the portal server database weekly.
Check that events are reaching the Tivoli Enterprise Console server, including events from user-created Universal Agents.
Check that all fired situations are answered with a response and do not remain in an open state for a long period of time.
Check that all the agents are responding by making SOAP down calls to each agent. Running taudit.js (as mentioned above) checks this automatically.
Check the memory and CPU usage of the core component processes, and make sure that you have situations created to monitor them.
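
The daily log checks in the list above can be partly automated. The following Python sketch is an illustration only, not an IBM-supplied tool: it scans a directory for files that follow the hostname_hd, hostname_sy, hostname_ms, and hostname_cq naming patterns given above and reports lines that contain "error" or "exception". The function name scan_logs, the keyword filter, and the idea of passing in a single log directory are assumptions made for this example; use the log location and the message patterns that apply to your installation.

```python
import re
from pathlib import Path

# Component codes taken from the log names in the checklist:
# hd = warehouse agent, sy = Summarization and Pruning agent,
# ms = monitoring server, cq = portal server.
COMPONENT_CODES = ("hd", "sy", "ms", "cq")

# Assumption: a rough first-pass keyword filter, not an official
# list of Tivoli Monitoring error markers.
SUSPECT = re.compile(r"error|exception", re.IGNORECASE)

# Matches names of the form hostname_<code>_<timestamp>-<nn>.log
LOG_NAME = re.compile(
    r"^.+_(?:%s)_[^_]+-\d+\.log$" % "|".join(COMPONENT_CODES)
)


def scan_logs(log_dir):
    """Return {file name: [(line number, text), ...]} for suspect lines.

    log_dir is a hypothetical parameter: the directory that holds the
    component logs on the machine being checked.
    """
    findings = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        if not LOG_NAME.match(path.name):
            continue  # skip files that are not component logs
        hits = []
        # errors="replace" keeps the scan going if a log contains
        # bytes that do not decode cleanly.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if SUSPECT.search(line):
                    hits.append((number, line.rstrip()))
        if hits:
            findings[path.name] = hits
    return findings
```

Calling scan_logs with the path to your log directory returns a dictionary keyed by log file name. An empty dictionary means only that no line matched the keyword filter, so it does not replace a manual review of the logs.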
