APAR status
Closed as program error.
Error description
v3.9 FP1. Observed on Linux, but the problem is platform independent. If the disco process is recycled (stopped and restarted) at the top of the hour, say 10 am, and the NATTimer stitcher, which runs every hour out of the box, is triggered at the same time, the disco process dumps core while exiting. Below is an excerpt of the stack trace from the core dump (frame #14 is the key):
#0 CRivAtom::RALex (this=0xf5f49ef4, comp=0x72000600, retPtr=0xf5f49e74) at CRivAtom.cc:2234
#1 0xf7e83a67 in CRivBPlusTree::RBPTKeyComparison (this=0xf5dee0c8, key1=0xf5f49ef4, key2=0x72000600) at CRivBPlusTree.cc:221
#2 0xf7e83c69 in CRivBPlusTree::RBPTGetNodeForKey (this=0xf5dee0c8, searchIndex=0xf5f49ef4) at CRivBPlusTree.cc:1035
#3 0xf7e9de09 in CRivTreeList::RTLGet (this=0xf5dee0c8, key=0xf5f49ef4) at CRivTreeList.cc:315
#4 0xf7e9e074 in CRivTreeList::RTLGet (this=0xf5dee0c8, key=0x84b3ef8 "definitions") at CRivTreeList.cc:422
#5 0xf7d667b7 in CRivDatabase::RDTableNamed (this=0xf5d4c870, name=0x84b3ef8 "definitions") at CRivDatabase.cc:429
#6 0xf7d95a9e in CRivStore::RSExecSimpleQuery(CRivQuery *, ._0 *, ERivBool, ERivBool) (this=0xf5d244d0, p=0x820ff68, status=0x0, lock=E_RBFalse, getRomps=E_RBFalse) at CRivStore.cc:3304
#7 0xf7d98f4a in CRivStore::RSSimpleQueryGetRecs(CRivQuery *, ._0 *, ERivBool) (this=0xf5d244d0, p=0x820ff68, status=0x0, fromCache=E_RBFalse) at CRivStore.cc:2506
#8 0xf7d99eed in CRivStore::RSQueryGetRecs(CRivQuery *, ._0 *, ERivBool) (this=0xf5d244d0, p=0x820ff68, status=0x0, fromCache=E_RBFalse) at CRivStore.cc:2218
#9 0xf7d9ae25 in CRivStoreMgr::RSMDoQueryGetRecs(CRivQuery *, ._129 *) (this=0x81fd930, p=0x820ff68, status=0x0) at CRivStoreMgr.cc:2460
#10 0xf7d9d214 in CRivStoreMgr::RSMQueryDist(CRivDbObject *, ._129 *, ERivBool) (this=0x81fd930, dbObject=0x820ff68, status=0x0, getRecs=E_RBTrue) at CRivStoreMgr.cc:3225
#11 0xf7d9d2a1 in CRivStoreMgr::RSMDistributeQuery(CRivBag *, ._129 *, ERivBool) (this=0x81fd930, qryBag=0xf5f4a15c, status=0x0, getRecs=E_RBTrue) at CRivStoreMgr.cc:3078
#12 0xf7d9d343 in CRivStoreMgr::RSMExecuteOQL(CRivCmdCmplr &, ._129 *, ERivBool) (this=0x81fd930, compiler=..., status=0x0, getRecs=E_RBTrue) at CRivStoreMgr.cc:3050
#13 0xf7d9d43f in CRivStoreMgr::RSMProcessOQL(const char *, CRivBag **, ._129 *, CRivRecord *, ERivBool) (this=0x81fd930, qryString=<value optimized out>, qryResults=0xf5f4a230, status=0x0, record=0x0, getRecs=E_RBTrue) at CRivStoreMgr.cc:3000
#14 0xf7d9d5b2 in CRivStoreMgr::RSMProcessOQLGetRecs(const char *, CRivBag **, ._129 *, CRivRecord *) (this=0x81fd930, qryString=0x111b0a90 "select m_Type, m_Text, m_UpdTime from stitchers.definitions where ( m_Name = 'NATTimer' ); ", qryResults=0xf5f4a230, status=0x0, record=0x0) at CRivStoreMgr.cc:2966
#15 0x08089709 in CRivDisco::RSMGetStitcher (this=0x81a5bc0, stchrName=0xf5f4a280, lastUpdTime=0) at CRivDisco.cc:6062
#16 0x080826ec in CRivDisco::RDGetStitcherObject (this=0x81a5bc0, stchrName=0x11fa2458 "NATTimer") at CRivDisco.cc:2319
#17 0x0808deff in CRivDisco::RDStartStitcher (this=0x81a5bc0, stchrName=0x11fa2458 "NATTimer") at CRivDisco.cc:2404
#18 0x080819c5 in CDiscoTimer::RKTCOnKeyDeath (this=0x81fa660, stchNameStr=0x11fa2458 "NATTimer") at CDiscoTimer.cc:84
#19 0xf7d807ff in CRivKeyTimeCtrl::RKTCCheckKeys (this=0x81fa660) at CRivKeyTimeCtrl.cc:265
#20 0xf7d808e5 in CRivKeyTimeCtrl::RKTCRun (this=0x81fa660) at CRivKeyTimeCtrl.cc:220
#21 0xf7d8092d in CheckKeyTime (arg=0x81fa660) at CRivKeyTimeCtrl.cc:200
#22 0xf790d852 in start_thread () from /lib/libpthread.so.0
#23 0xf772f0ae in clone () from /lib/libc.so.6
Local fix
Disable the NATTimer stitcher if the discovery has no NATed domains, i.e. rename NATTimer.stch to NATTimer.stch.norun.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All users.                                                   *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* ncp_disco dumps core while exiting if the NATTimer stitcher  *
* (which runs every hour out-of-the-box) is triggered at the   *
* same time.                                                   *
****************************************************************
* RECOMMENDATION:                                              *
* None. | fix pack | 3.9.0-ITNMIP-FP0004                       *
*       | fix pack | 3.8.0-ITNMIP-FP0007                       *
****************************************************************
Problem conclusion
The destructor in CRivDisco.cc deletes the disco timer object, but only after it has dealt with the pools, dbMgr, etc. The issue is that the timer can still fire while the threads are being joined on exit, so it tries to start a stitcher (here NATTimer) against resources that are being torn down. The fix is to check the shutdown lock in the CDiscoTimer object. The following fix packs will contain the fix:
| fix pack | 3.8.0-ITNMIP-FP0007
| fix pack | 3.9.0-ITNMIP-FP0004
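The general shape of such a shutdown check can be sketched in standard C++. This is only an illustration of the technique, not the product code: ShutdownGate, SketchTimer, runDemo and the storeAlive flag are hypothetical names invented for this example, and the real CDiscoTimer and its shutdown lock may work differently. The idea is that the timer callback runs only while holding a lock and only if shutdown has not begun, so that once shutdown starts no callback can touch resources that are about to be deleted.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the "shutdown lock" named in the conclusion.
class ShutdownGate {
public:
    // Blocks until any callback currently running under the gate has finished.
    void beginShutdown() {
        std::lock_guard<std::mutex> guard(mutex_);
        shuttingDown_ = true;
    }
    // Runs work only if shutdown has not begun; the lock is held while it runs.
    bool runUnlessShuttingDown(const std::function<void()>& work) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (shuttingDown_) return false;
        work();
        return true;
    }
private:
    std::mutex mutex_;
    bool shuttingDown_ = false;
};

// Hypothetical periodic timer that fires a callback on a background thread.
class SketchTimer {
public:
    SketchTimer(ShutdownGate& gate, std::chrono::milliseconds interval,
                std::function<void()> onExpiry)
        : gate_(gate), interval_(interval), onExpiry_(std::move(onExpiry)),
          worker_([this] { run(); }) {}
    ~SketchTimer() { stop(); }
    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }
    int skipped() const { return skipped_.load(); }
private:
    void run() {
        while (running_) {
            std::this_thread::sleep_for(interval_);
            // The shutdown check: skip the callback once shutdown has begun.
            if (!gate_.runUnlessShuttingDown(onExpiry_)) ++skipped_;
        }
    }
    ShutdownGate& gate_;
    std::chrono::milliseconds interval_;
    std::function<void()> onExpiry_;
    std::atomic<bool> running_{true};
    std::atomic<int> skipped_{0};
    std::thread worker_;  // declared last so it starts after the other members
};

struct DemoResult {
    int queriesBeforeShutdown;
    int queriesOnDeadStore;
    int skippedAfterShutdown;
};

// Simulates a timer that keeps ticking while the process is shutting down.
DemoResult runDemo() {
    ShutdownGate gate;
    std::atomic<bool> storeAlive{true};  // stands in for the pools / dbMgr
    std::atomic<int> queries{0};
    std::atomic<int> queriesOnDeadStore{0};
    SketchTimer timer(gate, std::chrono::milliseconds(2), [&] {
        if (!storeAlive) ++queriesOnDeadStore;  // this would be the crash
        ++queries;
    });
    while (queries.load() == 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    gate.beginShutdown();   // from here on, callbacks are skipped
    const int before = queries.load();
    storeAlive = false;     // simulated teardown while the timer still ticks
    while (timer.skipped() == 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    timer.stop();
    assert(queries.load() == before);  // no callback ran after shutdown began
    return {before, queriesOnDeadStore.load(), timer.skipped()};
}
```

Calling runDemo() lets the timer fire at least once, begins shutdown, and then waits until at least one tick has been skipped; the callback never observes the simulated store after it has been marked dead. An alternative design would stop and join the timer thread before deleting anything else, which avoids the check entirely but changes the destructor ordering.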
Temporary fix
Comments
APAR Information
APAR number
IV33079
Reported component name
NC/PREC DISCOVY
Reported component ID
5724O52DS
Reported release
390
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-12-05
Closed date
2012-12-18
Last modified date
2012-12-18
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
NC/PREC DISCOVY
Fixed component ID
5724O52DS
Applicable component levels
R380 PSY
UP
R390 PSY
UP
Business Unit
Cloud & Data Platform
Product
Tivoli Network Manager IP Edition
Platform
Platform Independent
Version
3.9
Line of Business
Automation
Document Information
Modified date:
18 December 2012