IV23876: UA:SWITCHING TEMS EVERY 10 MINS MAY RESULT IN UA CRASH

APAR status

  • Closed as program error.

Error description

  • Severity: 2
    Approver:BEH
    Compid:  5724K1000 Tivoli Universal Agent
    Abstract:UA:Switching TEMS every 10 mins may result in UA crash
    
    Environment:
    ITM 6.2.3 GA / AIX 6.1
    
    Problem Description:
    Universal Agent (kuma620) crashes due to frequent TEMS
    switching.
    
    Detailed Recreation Procedure:
    1. Build two TEMS with hot-standby configuration
    2. Make UA connect to primary TEMS
    3. Recycle primary TEMS frequently
    4. UA sometimes crashes when switching TEMS
    
    Related Files and Output:
    With KBB_RAS1=(UNIT:kum ALL), you can see that UA crashes in
    KUM0_FormatDataField().
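
The recreation procedure can be sketched as a small driver script. This is a minimal illustration, not a supported test tool: the `itmcmd server stop|start <tems_name>` invocation, the TEMS name `HUB_PRIMARY`, the function names, and the wait after each start are assumptions made for the example, and the 10 minute wait after each stop is taken from the problem summary. The script defaults to a dry run that only returns the commands it would issue.

```python
import subprocess
import time


def cycle_commands(tems_name, cycles):
    """Yield the stop/start command pair for each recycle of the primary TEMS."""
    for _ in range(cycles):
        # Assumed command form; verify against your ITM installation.
        yield ["itmcmd", "server", "stop", tems_name]
        yield ["itmcmd", "server", "start", tems_name]


def recycle_primary(tems_name, cycles, down_seconds=600, up_seconds=600,
                    dry_run=True):
    """Stop the primary TEMS, wait, restart it, wait, and repeat.

    Returns the list of commands issued (or that would be issued in a dry run).
    up_seconds is a hypothetical settle time, not a value from the APAR.
    """
    issued = []
    for cmd in cycle_commands(tems_name, cycles):
        issued.append(cmd)
        if dry_run:
            continue
        subprocess.run(cmd, check=True)
        # Give the Universal Agent time to switch to the other TEMS.
        time.sleep(down_seconds if cmd[2] == "stop" else up_seconds)
    return issued


if __name__ == "__main__":
    for command in recycle_primary("HUB_PRIMARY", cycles=3):
        print(" ".join(command))
```

According to the problem summary, many cycles are needed before the crash occurs, so a real run would use a large `cycles` value with `dry_run=False`.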
    

Local fix

Problem summary

  • On very rare occasions, the Universal Agent process has crashed
    while processing Situation Requests or the Start/Stop of
    Situations. The sole condition known to cause this crash is the
    repetitive switching of Universal Agent from a primary
    monitoring server to a mirror monitoring server. The crash has
    only been encountered by deliberately stopping the primary
    monitoring server, waiting 10 minutes, starting the primary
    monitoring server again, and then repeating the stop/start
    sequence over and over. This cycling of the primary monitoring
    server must be performed numerous times before the crash occurs.
    
    Among the thousands of deployed Universal Agents, there has been
    a single reported case of this problem. The problem occurs
    because a critical internal data structure is not thread safe.
    This exposure exists only during the repetitive switching of
    Universal Agent from a primary monitoring server to a mirror
    monitoring server.
    
    In order for this APAR to be properly implemented in your
    environment, a new environment variable has been added.  See the
    "Install Actions" section of the APAR conclusion for more
    details.
    

Problem conclusion

  • Install Actions
    
    Two steps were taken to address the thread safety exposure.
    First, a mutex lock was implemented on the internal data
    structure; this step is the fix that addresses this APAR. Second,
    a new Universal Agent environment variable named
    KUMA_DCHCLIENT_LOCK was introduced to arm or disarm the mutex
    lock. By default this environment variable is not defined, so
    the mutex lock added in the first step is disabled and is not
    used.
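
The arm/disarm behavior described above can be illustrated with a short sketch. This is not the Universal Agent implementation (which is not shown in this APAR); the class `DchClientTable` and its methods are hypothetical stand-ins for the internal data structure, and only the variable name KUMA_DCHCLIENT_LOCK and the value Y come from the APAR text.

```python
import contextlib
import os
import threading


class DchClientTable:
    """Hypothetical stand-in for the shared internal data structure."""

    def __init__(self, environ=None):
        env = os.environ if environ is None else environ
        # The lock is armed only when KUMA_DCHCLIENT_LOCK=Y is defined;
        # an undefined variable leaves the mutex unused (the default).
        self.lock_armed = env.get("KUMA_DCHCLIENT_LOCK") == "Y"
        self._lock = threading.Lock()
        self._entries = {}

    def _guard(self):
        # Real mutex when armed, otherwise a no-op context manager.
        return self._lock if self.lock_armed else contextlib.nullcontext()

    def update(self, key, value):
        with self._guard():
            self._entries[key] = value

    def get(self, key):
        with self._guard():
            return self._entries.get(key)
```

The point of the design is that the lock code always ships with the fix but changes runtime behavior only when the variable is set, so existing installations see no difference unless they opt in.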
    
    To arm the mutex lock, and so realize the effect of this APAR
    fix, the user must add the following environment variable to
    um.ini (UNIX/Linux) or KUMENV (Windows):
    KUMA_DCHCLIENT_LOCK=Y
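
To confirm whether the setting is present in an environment file, a check along the following lines may help. It is a sketch under stated assumptions: the function name `lock_is_armed` is hypothetical, lines starting with `#` or `*` are assumed to be comments, and the last active definition in the file is assumed to win. Only the variable name and the value Y come from this APAR.

```python
def lock_is_armed(env_file_text):
    """Return True if the text sets KUMA_DCHCLIENT_LOCK=Y on an active line."""
    armed = False
    for raw in env_file_text.splitlines():
        line = raw.strip()
        # Assumption: '#' and '*' introduce comment lines in these files.
        if not line or line.startswith(("#", "*")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == "KUMA_DCHCLIENT_LOCK":
            # Assumption: a later definition overrides an earlier one.
            armed = value.strip() == "Y"
    return armed
```

Pass the function the contents of um.ini or KUMENV read from wherever the file resides in your installation.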
    
    Because this problem has been encountered so rarely, and because
    DCH client<->DCH server thread interactions are so fundamentally
    critical to Universal Agent, it is highly recommended that the
    user NOT arm the mutex lock by declaring KUMA_DCHCLIENT_LOCK=Y
    unless the user does in fact become only the second user to
    experience this problem.
    
    
    The fix for this APAR is contained in the following maintenance
    packages:
    
      | fix pack | 6.2.3-TIV-ITM-FP0002
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV23876

  • Reported component name

    UNIVERSAL AGENT

  • Reported component ID

    5724K1000

  • Reported release

    623

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-06-27

  • Closed date

    2012-07-19

  • Last modified date

    2012-10-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    UNIVERSAL AGENT

  • Fixed component ID

    5724K1000

Applicable component levels

  • R623 PSY

       UP


