Basic Spectrum Control / TPC Server Health Check
How do I perform a basic health check of my Spectrum Control / TPC server?
Normal operation, administration, preventive maintenance and planning
**Note: this health check covers the server and the software itself. The health of the devices being monitored is beyond the scope of this document.
Periodic server health checks are an important part of maintaining a smoothly functioning environment. They are also an important part of disaster recovery readiness.
It is particularly important to conduct a health check before major configuration changes or upgrades, because a poorly functioning server environment is likely to lead to errors, failures, and instability (server hangs, frequent reboots, etc.).
Do you have a document that records your critical server configuration information? This information is vital for the team administering and maintaining the application, and for situations requiring support assistance. Examples of information you should document:
1) user accounts and passwords (login, host authentication, DB2 admin, common user, WAS admin console, JazzSM/TCR admin user)
2) deviations from a standard/default install (different install path/drive for the application and/or DB2, multi-server install details, etc.)
SERVER PLATFORM SPECIFICATIONS
When the application was installed, the server's OS version, memory, number of CPUs and processor speed, web browser version, DB2 version, etc. should have been at supported levels according to the supported products and platforms document (link below). Review this document for your software version and verify that your server still meets the listed requirements.
1) Are you able to log in with your administrator and user accounts?
2) Are passwords for the DB2 admin, service, and common user accounts set to never expire, or are proper controls/procedures documented and followed to change/update the passwords before they expire?
<TPC> = Spectrum Control / TPC installation home directory
Review the most recent server logs for error messages that identify problems that need to be resolved:
1) Data server (most recent <TPC>/data/log/server_xxxxxx.log, TPCD_xxxxxx.log, Scheduler_xxxxxx.log files)
2) Device server (<TPC>/device/log/msgTPCDeviceServer.log, traceTPCDeviceServer.log, dmSvcTrace.log files)
Check the server directories for old/large logs, dumps, etc. that can be deleted to save space. Refer to "Related Information" below for links to technotes on these topics.
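The log review and cleanup checks above could be partly scripted. The following is a minimal sketch, not a supported tool. It assumes that lines containing words such as "error" or "exception" are worth a look (the actual message formats are not described in this document), that `/opt/IBM/TPC` stands in for your <TPC> directory, and that the size and age thresholds suit your environment. All function names are hypothetical. Nothing is deleted; the script only lists candidates.

```python
import re
import time
from pathlib import Path

# Assumption: lines containing these words deserve a look; tune for your logs.
ERROR_PATTERN = re.compile(r"\b(error|exception|severe|fatal)\b", re.IGNORECASE)


def newest_log(log_dir, glob_pattern):
    """Return the most recently modified file matching glob_pattern, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(glob_pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def error_lines(log_path, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for the lines that match pattern."""
    hits = []
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip()))
    return hits


def cleanup_candidates(root, min_bytes=100 * 1024 * 1024, min_age_days=30):
    """List files under root that are both large and old, largest first."""
    base = Path(root)
    if not base.is_dir():
        return []
    cutoff = time.time() - min_age_days * 86400
    found = []
    for path in base.rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_size >= min_bytes and info.st_mtime < cutoff:
                found.append((path, info.st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tpc_home = Path("/opt/IBM/TPC")  # hypothetical; replace with your <TPC>
    for pattern in ("server_*.log", "TPCD_*.log", "Scheduler_*.log"):
        log = newest_log(tpc_home / "data" / "log", pattern)
        if log is not None:
            print(log, "->", len(error_lines(log)), "suspicious lines")
    for path, size in cleanup_candidates(tpc_home):
        print("large and old:", path, size, "bytes")
```

Review the listed files against the technotes on log and dump cleanup before removing anything.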
DB2 HEALTH AND RECOVERABILITY
1) Do you take regular backups of the database? Check to make sure you have a current/recent backup and that scheduled backups are taking place as planned.
2) Are the DB2 services up and running? Is the DB2 TCPIP port (usually port 50000) present in a netstat command output and in LISTENING state?
3) If you have the DB2 Control Center available (DB2 v9.7 and older), use the Health Center to check for alert conditions needing attention, and use the 'Recommendation Advisor' for guidance on remedies.
4) Consider installing and using IBM Data Studio for DB2 v10.x and newer versions for access to tools equivalent to the DB2 Control Center Health Center.
5) Locate and scan the 'db2diag.log' file for error messages and conditions requiring action/attention. **Note: if this file is very large, consider running a 'db2support' command to capture the current log, and then run 'db2diag -A' to archive the current log and initialize a new one.
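Items 2 and 5 above can be approximated with a short script. This is a sketch under stated assumptions: the port test simply attempts a TCP connection instead of parsing netstat output, and the log summary assumes that each db2diag.log record header carries a `LEVEL:` field (for example `LEVEL: Error`). The function names and the log path are hypothetical; substitute the diagnostic path of your own DB2 instance.

```python
import os
import re
import socket
from collections import Counter

# Assumption: db2diag.log record headers contain a "LEVEL: <name>" field.
LEVEL_FIELD = re.compile(r"\bLEVEL:\s*(\w+)")


def port_is_listening(host="localhost", port=50000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_db2diag_levels(path):
    """Count db2diag.log records by the value of their LEVEL field."""
    counts = Counter()
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            match = LEVEL_FIELD.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    print("DB2 port 50000 reachable:", port_is_listening())
    # Hypothetical location; use your instance's diagnostic directory.
    diag_log = "/home/db2inst1/sqllib/db2dump/db2diag.log"
    if os.path.exists(diag_log):
        for level, total in count_db2diag_levels(diag_log).most_common():
            print(level, total)
```

A successful connection only shows that something is accepting connections on the port. It does not confirm that the database itself is healthy, so treat it as a quick first check.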
OVERALL SERVER HEALTH
1) Do you have backup software running on your server to back up the server for disaster recovery? Check to make sure you have a current/recent backup of the server that can be used for recovery, and that scheduled backups are taking place as planned.
2) Check your system disks/filesystems for adequate disk space (for example, OS: C:, /root, /tmp filesystems; application: application install disk/filesystem; DB2: install/database disk/filesystem).
3) Check the system and application event/error logs (errpt, syslog) for error messages or conditions needing action/attention.
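The backup and disk space checks in items 1 and 2 lend themselves to a small script. The sketch below is illustrative only: the 10% free-space threshold, the example paths, and the function names are assumptions, and the backup check merely reports the age of the newest file in a directory that you point it at. It does not verify that a backup can actually be restored.

```python
import shutil
import time
from pathlib import Path


def free_space_report(paths, min_free_percent=10.0):
    """Return one dict per path with the free-space percentage and a low flag."""
    report = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Path missing or not accessible on this server.
            report.append({"path": path, "free_percent": None, "low": None})
            continue
        free_percent = 100.0 * usage.free / usage.total if usage.total else 0.0
        report.append({
            "path": path,
            "free_percent": round(free_percent, 1),
            "low": free_percent < min_free_percent,
        })
    return report


def newest_file_age_days(directory):
    """Return the age in days of the newest file under directory, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (time.time() - max(mtimes)) / 86400


if __name__ == "__main__":
    # Hypothetical paths; list your OS, application, and DB2 filesystems.
    for row in free_space_report(["/", "/tmp"]):
        print(row)
    age = newest_file_age_days("/backups/db2")  # hypothetical backup location
    print("newest backup age (days):", age)
```

A `None` result for the backup age means the directory is missing or empty, which is itself worth investigating.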