IZ85924: WATCHDOG CONTINUES TO RESTART OS AGENT
Fixes are available
IBM Tivoli Monitoring: Unix(R) OS Agent 18.104.22.168-TIV-ITM_UNIX-IF0007
IBM Tivoli Monitoring 6.2.2 Fix Pack 8 (6.2.2-TIV-ITM-FP0008)
IBM Tivoli Monitoring 6.2.2 Fix Pack 9 (6.2.2-TIV-ITM-FP0009)
IBM Tivoli Monitoring: Linux(R) OS Agent 22.214.171.124-TIV-ITM_LINUX-IF0003
IBM Tivoli Monitoring 6.2.2 Fix Pack 6 (6.2.2-TIV-ITM-FP0006)
Closed as program error.
Severity: 2
Approver: sm
Reported Release: 622
Compid: 5724C040U Tivoli OMEGAMON XE for UNIX

Abstract: Watchdog continues to restart OS agent

Environment:
TEMS/TEPS -- AIX 6.1 -- ITM 6.2.2 FP02
TEMA -- AIX V6.1 -- ITM 6.2.2 FP02 Unix OS Agent
The TEMA runs under HACMP with two kuxagent instances (the customer received TCT and was advised that this configuration is now supported), and both agents are experiencing this problem.

Problem Description:
"cinfo -r" and "cinfo -R" do not show "running" because of the system's name resolution setting: the output of the hostname command and the hostname in the RunInfo file do not match.
The watchdog uses "cinfo -r" or "cinfo -R" to check whether the agent is running. Normally, if "cinfo -r" indicates that the agent is not running, the watchdog tries to start it. If, however, the watchdog has collected a PID indicating that the process is running but "cinfo -r" indicates that it is not, the watchdog treats the agent as "unhealthy" and tries to stop it before starting it again. Because "cinfo -r" never returns the correct status, the watchdog keeps stopping and restarting the OS agent, and in the end the OS agent is no longer running.

Detailed Recreation Procedure:
Change the name resolution setting so that the output of the hostname command and the hostname in the RunInfo file do not match.

Related Files and Output:
#./cinfo -r
*********** Wed Aug 4 16:26:00 JST 2010 ******************
User: root Groups: system bin sys security cron audit lp sapinst ncoadmin
Host name : dcssap1 Installer Lvl:06.22.02.00
CandleHome: /opt/tivoli/itm
***********************************************************
Host Prod PID Owner Start ID ..Status
dcssap0 ux 897176 None
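The pre-fix decision logic in the problem description can be modeled as a small function. This is a hypothetical Python sketch, not IBM's watchdog code: `Action`, `pre_fix_action`, and its two boolean inputs are illustrative names, and it assumes the watchdog only knows whether "cinfo -r" reports the agent as running and whether it has collected a PID for it.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical actions the watchdog can take on one availability check."""
    NONE = "none"                        # agent reported healthy; do nothing
    START = "start"                      # agent not running; start it
    STOP_THEN_START = "stop_then_start"  # "unhealthy": stop first, then start


def pre_fix_action(cinfo_says_running: bool, pid_collected: bool) -> Action:
    """Illustrative model of the watchdog behavior before APAR IZ85924."""
    if cinfo_says_running:
        return Action.NONE
    if pid_collected:
        # A PID exists but cinfo -r says "not running": treated as unhealthy.
        return Action.STOP_THEN_START
    return Action.START


# With the hostname mismatch, cinfo -r never reports "running", so every
# check against a live agent produces another stop/start cycle.
checks = [pre_fix_action(cinfo_says_running=False, pid_collected=True) for _ in range(3)]
print([a.value for a in checks])  # ['stop_then_start', 'stop_then_start', 'stop_then_start']
```

Because the mismatch is persistent, the same inputs recur on every check, which is why the agent is restarted indefinitely rather than once.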
When the Agent Management Services watchdog utility invokes the "cinfo -r" command to determine if an agent is running and "cinfo" has a problem resolving the hostname, it can report that the agent is not running when it is. This will result in the watchdog utility restarting the agent each time it checks availability.
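One way to spot this condition by hand is to compare the "Host name" line in the "cinfo -r" header with the Host column of the agent rows. The Python sketch below assumes the layout shown in the "cinfo -r" output in the problem report (a "Host name :" header line, a "Host Prod PID ..." column header, then whitespace-separated rows); the function name `find_host_mismatches` is hypothetical, and real "cinfo" output may differ between releases.

```python
import re


def find_host_mismatches(cinfo_output: str) -> list:
    """Return (host, product, pid) rows whose Host column differs from the
    'Host name' in the cinfo -r header (hypothetical diagnostic helper)."""
    local_host = None
    rows = []
    in_table = False
    for line in cinfo_output.splitlines():
        stripped = line.strip()
        match = re.match(r"Host name\s*:\s*(\S+)", stripped)
        if match:
            local_host = match.group(1)  # e.g. "dcssap1"
            continue
        if stripped.startswith("Host") and "Prod" in stripped and "PID" in stripped:
            in_table = True  # column header line; agent rows follow
            continue
        if in_table and stripped:
            fields = stripped.split()
            if len(fields) >= 3:
                rows.append((fields[0], fields[1], fields[2]))
    if local_host is None:
        raise ValueError("no 'Host name' line found in cinfo output")
    return [row for row in rows if row[0] != local_host]


SAMPLE = """#./cinfo -r
*********** Wed Aug 4 16:26:00 JST 2010 ******************
User: root Groups: system bin sys security cron audit lp sapinst ncoadmin
Host name : dcssap1 Installer Lvl:06.22.02.00
CandleHome: /opt/tivoli/itm
***********************************************************
Host Prod PID Owner Start ID ..Status
dcssap0 ux 897176 None
"""

print(find_host_mismatches(SAMPLE))  # [('dcssap0', 'ux', '897176')]
```

A non-empty result means the agent is registered under a host name that differs from the one "cinfo" resolves locally, which matches the condition described in this APAR.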
Code was updated so that, in this scenario, the watchdog utility does not restart the agent when it can verify that the agent process is running. The fix for this APAR is planned for the following maintenance packages:
| Package type   | Package                            |
| LA interim fix | 126.96.36.199-TIV-ITM_LINUX-IF0001 |
| LA interim fix | 188.8.131.52-TIV-ITM_UNIX-IF0001   |
| Fix pack       | 6.2.2-TIV-ITM-FP0004               |
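The corrected behavior can be sketched by adding a process-liveness check before the restart decision. This Python sketch illustrates the idea under assumptions and is not the shipped fix: `post_fix_action` and `pid_is_running` are hypothetical names, and the liveness test uses the common POSIX idiom of sending signal 0 with `os.kill`, which checks that a process exists without affecting it.

```python
import os
from typing import Optional


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX, signal 0)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def post_fix_action(cinfo_says_running: bool, pid: Optional[int]) -> str:
    """Hypothetical model of the watchdog decision after the fix."""
    if cinfo_says_running:
        return "none"
    if pid is not None and pid_is_running(pid):
        # cinfo -r disagrees, but the process is verifiably alive: leave it alone.
        return "none"
    # Assumption: otherwise fall back to a normal start. How the real utility
    # handles a stale PID is not described in this APAR.
    return "start"


print(post_fix_action(cinfo_says_running=False, pid=os.getpid()))  # none
```

The key design point is that a verified live PID now overrides a negative "cinfo -r" result, so a name-resolution mismatch no longer triggers a stop/start cycle.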
Watchdog can be disabled with the following steps:
1. Edit lz.ini (Linux OS agent) or ux.ini (UNIX OS agent) and comment out the line that starts with KCA_CAP_DIR. (The value on the right-hand side may differ slightly depending on the release.)
   # KCA_CAP_DIR=$CANDLEHOME$/config/CAP:/opt/IBM/CAP
2. Add the following line:
   KCA_CAP_DIR=
3. Stop and restart the OS agent.
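The ini edits in this workaround can also be scripted. The Python sketch below is a hypothetical helper, not an IBM tool: it comments out any active KCA_CAP_DIR setting in the text of lz.ini or ux.ini and appends an empty `KCA_CAP_DIR=` line. It operates on a string so the result can be reviewed before it is written back; back up the file first, and stop and restart the OS agent afterwards as described in the steps above.

```python
def disable_watchdog(ini_text: str) -> str:
    """Comment out active KCA_CAP_DIR lines and append an empty KCA_CAP_DIR=.

    Hypothetical helper for the documented workaround. The right-hand value
    (which differs by release) is preserved inside the comment.
    """
    out_lines = []
    for line in ini_text.splitlines():
        stripped = line.lstrip()
        key, sep, _ = stripped.partition("=")
        if sep and key.strip() == "KCA_CAP_DIR":
            out_lines.append("# " + stripped)  # keep the old value for reference
        else:
            out_lines.append(line)  # includes lines that are already commented
    out_lines.append("KCA_CAP_DIR=")  # empty value disables the watchdog
    return "\n".join(out_lines) + "\n"


before = "KCA_CAP_DIR=$CANDLEHOME$/config/CAP:/opt/IBM/CAP\n"
print(disable_watchdog(before), end="")
# # KCA_CAP_DIR=$CANDLEHOME$/config/CAP:/opt/IBM/CAP
# KCA_CAP_DIR=
```

Matching on the key rather than the full line means the helper works regardless of the release-specific path on the right-hand side.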
Reported component name
ITM AGENT UNIX
Fixed component name
ITM AGENT UNIX