IBM Support

IJ00337: ON AIX, THE UNIX OS AGENT RANDOMLY HANGS ON START-UP.

A readme is available


APAR status

  • Closed as program error.

Error description

  • Environment: 6.30 FP7 UNIX OS Agent, AIX only
    Problem Description: On AIX, the Monitoring Agent for UNIX OS
    sometimes hangs at startup when running 6.30 FP7. The version
    of GSKit in 6.30 FP7 includes updates that can cause a deadlock
    between a GSKit thread and a Watchdog thread. The agent then
    hangs and neither returns results nor connects to the TEMS.
    This is a timing issue and does not always occur.
    Related Files and Output:
    With tracing set to (UNIT:kgl ALL), the agent RAS1 log file
    contains the message:
      (59CA9623.64B9-1:kglcry.c,2985,"initializeICC") Calling ICC_Init for GSKit 8.
    but does not contain the message:
      (59CA9623.79CC-1:kglcry.c,2989,"initializeICC") ICC_Init completed. Context returned 110E5B4F0
    A core file or a process stack trace (for example, taken with
    "procstack -p <pid of kuxagent>") shows two threads with
    "setlocale" in the trace. One is from GSKit and originates
    from ICC_Init. The other originates from Watchdog in
    kca_mbstowcs.
    (dbx) where all
    Thread $t1
    warning: Thread is in kernel mode, not all registers can be accessed.
    .() at 0x0
    _rec_mutex_lock(??) at 0x90000000002ae08
    setlocale(??, ??) at 0x9000000000170b0           <== setlocale
    expand_catname(??, ??, ??) at 0x900000000043b04
    catopen(??, ??) at 0x900000000044c7c
    __strerror(??, ??, ??) at 0x900000000059b34
    strerror(??) at 0x90000000005a0e4
    build_SYS_str_reasons() at 0x900000003cd0110
    ERR_load_ERR_strings() at 0x900000003cd1f80
    ERR_load_crypto_strings() at 0x900000003e044cc
    iccLoadErr(??) at 0x900000003cbc0f4
    OpenSSL_Init(??, ??) at 0x900000003cbc2b4
    ICCLoad() at 0x900000003cbc9b4
    iccSLInit() at 0x900000003e1b90c
    mod_init1(??, ??) at 0x9fffffff0002a50
    usl_init_mods(??, ??) at 0x9fffffff0003c30
    uload(??, ??, ??, ??, ??, ??) at 0x9fffffff00023e0
    load1(??, ??, ??, ??) at 0x9000000000006f4
    load(??, ??, ??) at 0x900000000001770
    loadAndInit(??, ??, ??) at 0x9000000000eadac
    dlopen(??, ??) at 0x900000000090f88
    ICC_LoadLibrary(??) at 0x900000003623714
    ICCN_Init(??, ??) at 0x900000003623c30
    ICC_Init(??, ??) at 0x900000003618bb8
    initializeICC(0xfffffffffff8890) at 0x900000002f383dc
    CRY_RAND() at 0x900000002f3ba28
    AccessAuthorizationGroupProfile::addUserToAADB(char*,int,char*)(0x110bd8e50, 0xfffffffffffd2a8, 0x700000007, 0xfffffffffffd2b1) at 0x9000000033cc284
    kgeaagpx.AAGPUserEnd(void*,const char*)(0xfffffffffffa318, 0x110bdaa30) at 0x9000000033d69f0
    endKGEelement(void*,const char*)(0xfffffffffffa318, 0x110bdaa30) at 0x9000000033d5c44
    doContent(0x110bd9670, 0x0, 0x9001000a075f570, 0x9000000033f3e6a, 0x9000000033f3e7a, 0x0) at 0x900000002ee3950
    contentProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002ee9bd8
    doProlog(0x110bd9670, 0x9001000a075f570, 0x9000000033f3af4, 0x9000000033f3e7a, 0x1d0000001d, 0x9000000033f3af4, 0x0) at 0x900000002eea4bc
    prologProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002eec02c
    prologInitProcessor(0x110bd9670, 0x9000000033f3af4, 0x9000000033f3e7a, 0x0) at 0x900000002eec180
    XML1_Parse(0x110bd9670, 0x9000000033f3af4, 0x38600000386, 0x100000001) at 0x900000002ee6d98
    KGE_AccessAuthorizationGroupPolicyProcessor(KGE_XMLreq_t*)(0xfffffffffffa318) at 0x9000000033d5124
    AccessAuthorizationGroupProfile::AccessAuthorizationGroupProfile(char*)(0x110bd8e50, 0x110bd8a0f) at 0x9000000033d172c
    KGE_InitAccessAuthorizationGroupProfile(void*)(0x110bd8a0f) at 0x9000000033d1ae4
    BSS1_InitializeOnce(0x9001000a088b72c, 0x9001000a08925b0, 0x110bd8a0f, 0x9000000033f43ac, 0x9f0000009f) at 0x900000002ead320
    KGE_GetAAGP(char*)(0x110bd8a0f) at 0x9000000033d0d3c
    kgeaagpa.KGE_AAGP_CheckAuthorization(0x110bd86f0) at 0x9000000033ef000
    kramain(0x100000001, 0xfffffffffffe360) at 0x900000002983ba8
    main(argc = 1, argv = 0x0fffffffffffe360, env = 0x0fffffffffffe370), line 1149 in "kuxmain.cpp"
    Thread $t18
    warning: Thread is in kernel mode, not all registers can be accessed.
    .() at 0x0
    _rec_mutex_lock(??) at 0x90000000002ae08
    __modinit_lock(??) at 0x90000000003d6b0
    load1(??, ??, ??, ??) at 0x900000000000644
    load(??, ??, ??) at 0x900000000001770
    __lc_load@AF5_1(??, ??, ??) at 0x900000000201058
    load_locale(??, ??, ??) at 0x9000000000136fc
    setlocale(??, ??) at 0x9000000000166cc             <== setlocale
    kca_mbstowcs(char*)(__classReturn = &(...), str = "amqzmur0"), line 49 in "kcautil.cpp"
    unnamed block in KcaCmdAIX::getRunningProcesses(std::vector<KcaProcess*,std::allocator<KcaProcess*> >&)(this = 0x0000000110bb13f0, procList = &(...)), line 363 in "kcacmdaix.cpp"
    KcaCmdAIX::getRunningProcesses(std::vector<KcaProcess*,std::allocator<KcaProcess*> >&)(this = 0x0000000110bb13f0, procList = &(...)), line 363 in "kcacmdaix.cpp"
    Controller::initialDiscovery()(this = 0x0000000110f9baf0), line 348 in "kcactrl.cpp"
    Controller::PASThreadExecution()(this = 0x0000000110f9baf0), line 4822 in "kcactrl.cpp"
    PASThreadEntry(param = (nil)), line 4944 in "kcactrl.cpp"
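
The RAS1 log symptom described above (a "Calling ICC_Init" message with no later "ICC_Init completed" message) can be checked mechanically. The following is a minimal sketch in Python, not an IBM-supplied tool. The function name icc_init_hung is hypothetical, and the sketch assumes that each trace message occupies a single line in the real log file and that the two message fragments appear exactly as quoted in this APAR.

```python
# Message fragments taken from the RAS1 trace lines quoted in this APAR.
# Assumption: each message sits on one line in the actual log file.
START_MARKER = "Calling ICC_Init"
DONE_MARKER = "ICC_Init completed"


def icc_init_hung(log_lines):
    """Return True if the most recent ICC_Init call has no completion after it."""
    pending = False
    for line in log_lines:
        if START_MARKER in line:
            pending = True      # ICC_Init was entered
        elif DONE_MARKER in line:
            pending = False     # ICC_Init returned
    return pending


if __name__ == "__main__":
    sample = [
        '(59CA9623.64B9-1:kglcry.c,2985,"initializeICC") Calling ICC_Init for GSKit 8.',
    ]
    print(icc_init_hung(sample))
```

A True result is only a hint. An agent that is still starting normally shows the same pattern for a short time, so the stack trace remains the confirming evidence.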
    

Local fix

  • The temporary workaround is to disable Watchdog.
    1.  Stop the OS Agent (if it is hung, stop it with force).
    2.  Edit the $CANDLEHOME/config/ux.ini file
        Comment out the line:
          # KCA_CAP_DIR=$CANDLEHOME$/config/CAP:/opt/IBM/CAP
        Add the line:
          KCA_CAP_DIR=
    3.  Restart the OS Agent.
    This will avoid the conflict that causes the agent to hang.
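
The edit in step 2 can be scripted when many agents need the workaround. The following Python sketch is an illustration under stated assumptions, not an IBM-provided procedure. The function name disable_watchdog is hypothetical, and the sketch assumes that ux.ini holds one KEY=value setting per line and that the empty KCA_CAP_DIR= line may directly follow the commented-out line.

```python
def disable_watchdog(ini_text):
    """Comment out active KCA_CAP_DIR settings and leave one empty KCA_CAP_DIR= line."""
    out = []
    added = False  # True once an empty KCA_CAP_DIR= line is present in the output
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped == "KCA_CAP_DIR=":
            # Keep the first empty setting, drop any duplicates.
            if not added:
                out.append(line)
                added = True
        elif stripped.startswith("KCA_CAP_DIR="):
            # Active setting with a value: comment it out, then add the empty one.
            out.append("# " + stripped)
            if not added:
                out.append("KCA_CAP_DIR=")
                added = True
        else:
            out.append(line)
    if not added:
        # No KCA_CAP_DIR line existed at all: append the empty setting.
        out.append("KCA_CAP_DIR=")
    return "\n".join(out) + "\n"
```

The function works on text only, so it can be tried on a copy of the file first. Running it on its own output changes nothing. Stopping and restarting the agent (steps 1 and 3) still have to be done separately.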
    

Problem summary

  • On AIX, the 6.30 FP7 OS agent sometimes hangs on startup.
    
    On AIX, the 6.30 FP7 Monitoring Agent for UNIX OS sometimes
    hangs at startup. The version of GSKit provided with 6.30 FP7
    includes updates that can cause a deadlock between a GSKit
    thread and a Watchdog thread. The agent then hangs and neither
    returns results nor connects to the TEMS. This is a timing
    issue and does not always occur.
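
The stack traces in the error description suggest a lock-order inversion. Thread $t1 is inside dlopen/load (module initialization) and waits for the lock taken by setlocale, while Thread $t18 is inside setlocale and waits in __modinit_lock. The following Python sketch only illustrates that pattern. It is not agent or GSKit code, and the names simulate_lock_inversion, loader_lock and locale_lock are hypothetical stand-ins. Timeouts are used so that the demonstration terminates, whereas the real threads wait indefinitely.

```python
import threading


def simulate_lock_inversion(timeout=0.2):
    """Run two threads that take two locks in opposite order; return who got stuck."""
    loader_lock = threading.Lock()   # stand-in for the module loader lock
    locale_lock = threading.Lock()   # stand-in for the lock used by setlocale
    sync = threading.Barrier(2)      # reused to line up both threads
    blocked = []

    def worker(name, first, second):
        with first:
            sync.wait()              # both threads now hold their first lock
            if second.acquire(timeout=timeout):
                second.release()
            else:
                blocked.append(name)  # a real thread would wait here forever
            sync.wait()              # keep the first lock until both attempts end

    gskit = threading.Thread(target=worker, args=("gskit", loader_lock, locale_lock))
    watchdog = threading.Thread(target=worker, args=("watchdog", locale_lock, loader_lock))
    gskit.start()
    watchdog.start()
    gskit.join()
    watchdog.join()
    return sorted(blocked)


if __name__ == "__main__":
    print(simulate_lock_inversion())
```

The first barrier forces the unlucky timing in which each thread already holds one lock before asking for the other. Without that timing the threads do not block each other, which matches the observation that the hang does not always occur.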
    

Problem conclusion

  • The version of GSKit provided in 6.30 FP7 SP1 was uplifted and
    the deadlock issue was resolved.
    
    
    The fix for this APAR is contained in the following maintenance
    packages:
    
    
       | service pack | 6.3.0.7-TIV-ITM-SP0001
       | provisional fix |  6.3.x-TIV-ITM-GSK-8.0.50.84-IJ00337
         http://www.ibm.com/support/docview.wss?uid=swg24044365
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ00337

  • Reported component name

    ITM AGENT UNIX

  • Reported component ID

    5724C040U

  • Reported release

    630

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-09-26

  • Closed date

    2018-01-17

  • Last modified date

    2019-05-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TEMA

  • Fixed component ID

    5724C04TE

Applicable component levels


Document Information

Modified date:
08 March 2023