IV23876: UA:SWITCHING TEMS EVERY 10 MINS MAY RESULT IN UA CRASH
Fixes are available
Closed as program error.
Severity: 2
Approver: BEH
Compid: 5724K1000 Tivoli Universal Agent
Abstract: UA: Switching TEMS every 10 mins may result in UA crash

Environment: ITM 6.2.3 GA / AIX 6.1

Problem Description:
Universal Agent (kuma620) crashes due to frequent TEMS switching.

Detailed Recreation Procedure:
1. Build two TEMS with hot-standby configuration
2. Make UA connect to primary TEMS
3. Recycle primary TEMS frequently
4. UA sometimes crashes when switching TEMS

Related Files and Output:
With KBB_RAS1=(UNIT:kum ALL), you can see that UA crashes in KUM0_FormatDataField().
On very rare occasions, the Universal Agent process has crashed while processing Situation Requests or the Start/Stop of Situations. The sole condition that causes this crash is the repetitive switching of the Universal Agent from a primary monitoring server to a mirror monitoring server. The crash has only been encountered by deliberately stopping the primary monitoring server, waiting 10 minutes, starting the primary monitoring server again, and then repeating the stop/start sequence over and over. This cycling of the primary monitoring server must be performed numerous times before the crash occurs.

Among the thousands of deployed Universal Agents, there has been a single reported case of this problem. The problem occurs because a critical internal data structure is not thread safe. The fix for this APAR takes effect only when a new environment variable is set. See the "Install Actions" section of the APAR conclusion for more details.
Install Actions
Two steps were taken to address the thread safety exposure. First, a mutex lock was implemented on the internal data structure; this step is the fix for this APAR. Second, a new Universal Agent environment variable named KUMA_DCHCLIENT_LOCK was introduced to arm or disarm the mutex lock. By default this environment variable is not defined, so the mutex lock added in the first step is disabled and is not used. To arm the mutex lock, and so put this APAR fix into effect, add the following environment variable to um.ini (UNIX/Linux) or KUMENV (Windows):

KUMA_DCHCLIENT_LOCK=Y

Because this problem has been encountered so rarely, and because DCH client<->DCH server thread interactions are so fundamentally critical to the Universal Agent, it is highly recommended that you NOT arm the mutex lock by declaring KUMA_DCHCLIENT_LOCK=Y unless you do in fact experience this problem.

The fix for this APAR is contained in the following maintenance packages:
| fix pack | 6.2.3-TIV-ITM-FP0002
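
The arm/disarm behavior described under Install Actions can be illustrated with a minimal C sketch using POSIX threads. This is not the Universal Agent source. The names shared_table, lock_setting_armed, table_update and run_workers are hypothetical, and the assumption that only the exact value Y arms the lock is taken from the setting shown above, not from documented parsing rules.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the agent's shared internal data structure. */
typedef struct {
    pthread_mutex_t lock;
    int lock_armed;   /* 1 only when the setting is exactly "Y" (assumption) */
    long updates;
} shared_table;

/* Returns 1 when the setting arms the lock, 0 when it is unset or anything else. */
static int lock_setting_armed(const char *value) {
    return value != NULL && strcmp(value, "Y") == 0;
}

static void table_init(shared_table *t, const char *setting) {
    pthread_mutex_init(&t->lock, NULL);
    t->lock_armed = lock_setting_armed(setting);
    t->updates = 0;
}

/* Takes the mutex only when armed; the unarmed path is deliberately
 * unprotected and is a data race when called from several threads. */
static void table_update(shared_table *t) {
    if (t->lock_armed) pthread_mutex_lock(&t->lock);
    t->updates++;
    if (t->lock_armed) pthread_mutex_unlock(&t->lock);
}

typedef struct { shared_table *table; int iterations; } worker_args;

static void *worker(void *p) {
    worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) table_update(a->table);
    return NULL;
}

/* Runs nthreads workers (capped at 16) against one table and returns the final count. */
static long run_workers(const char *setting, int nthreads, int iterations) {
    pthread_t tids[16];
    shared_table t;
    worker_args a;
    if (nthreads > 16) nthreads = 16;
    table_init(&t, setting);
    a.table = &t;
    a.iterations = iterations;
    for (int i = 0; i < nthreads; i++) pthread_create(&tids[i], NULL, worker, &a);
    for (int i = 0; i < nthreads; i++) pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&t.lock);
    return t.updates;
}

/* Convenience wrapper: reads the setting from the process environment. */
static long run_workers_from_env(int nthreads, int iterations) {
    return run_workers(getenv("KUMA_DCHCLIENT_LOCK"), nthreads, iterations);
}
```

With the setting armed, run_workers("Y", 4, 100000) always returns 400000 because every increment is serialized. With the setting absent, the same call may lose updates, which is the kind of exposure a disarmed lock leaves open. Gating the lock on a flag read once at initialization keeps the default code path unchanged, which matches the cautious rollout recommended above.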
Reported component name
Reported component ID
Last modified date
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fixed component name
Fixed component ID
Applicable component levels