APAR status
Closed as program error.
Error description
When a "REPLICATE NODE" process and a Tivoli Storage Manager client operation (e.g. backup, restore) run simultaneously, the client session can deadlock, and the following errors are logged to the activity log:

11/08/12 21:11:09 ANR9999D_1607799673 xiBuildNodeDef(xirepl.c:1248)Thread<10231>: Error 1020 from admGetNodeConvState nodeName=NODE1 (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> issued message 9999 from: (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x000000010000d0b8 StdPutText (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x000000010000db34 OutDiagToCons (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x0000000100008d9c outDiagfExt (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x0000000100bdf454 xiBuildNodeDef (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x0000000100663eb4 smReplQueryNode SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x000000010063089c DoNodeHandshake (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x000000010062f2e4 NrReplicationThread (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR9999D Thread<10231> 0x0000000100020ae0 StartThread (SESSION: 4890, PROCESS: 112)
11/08/12 21:11:09 ANR0551E The client operation failed for session XXXXXX for node NODE1 (AIX) - lock conflict. (SESSION: 4890)

The problem occurs because two different transactions use the same thread to access node information.
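As a rough aid for spotting this symptom, the sketch below scans activity log text for the ANR0551E lock-conflict message. The regular expression is modeled only on the single message quoted above; the function name find_lock_conflicts and the assumption that each message sits on one line with this exact wording are illustrative, not taken from the product documentation.

```python
import re

# Pattern modeled on the ANR0551E line quoted in this APAR. The exact
# wording at other server levels is an assumption.
ANR0551E = re.compile(
    r"ANR0551E The client operation failed for session (?P<client>\S+) "
    r"for node (?P<node>\S+) \((?P<platform>[^)]*)\) - lock conflict\."
)


def find_lock_conflicts(log_text):
    """Return (client_session, node, platform) for each lock-conflict message."""
    return [
        (m["client"], m["node"], m["platform"])
        for m in ANR0551E.finditer(log_text)
    ]
```

A hit for a node that is also being replicated at the same time is only a hint that this APAR may apply; the "show locks" output in the diagnostics is what confirms it.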
Customer/L2 Diagnostics:
Using the serverperf.pl script to automate the collection of "show locks" and "show thread" output, we can see the following:

LockDesc: Type=17001(admin node name), NameSpace=0, SummMode=sLock, Key='BCC47'
  Holder: (admutil.c:4667 Thread 898045) Tsn=0:247788814, Mode=sLock
  Waiter: (admutil.c:4667 Thread 898048) Tsn=0:247788827, Mode=sixLock
  Waiter: (admutil.c:4667 Thread 898045) Tsn=0:247788896, Mode=sLock

Thread 898045, Parent 898044: NrReplicationThread, Storage 374846, AllocCnt 279 HighWaterAmt 407315
  tid=e0fd, ptid=dffc, det=0, zomb=0, join=1, result=0, sess=579875
  Awaiting cond waitP->waiting (0x16cd96030), using mutex TMV->mutex (0x110c80bb8), at tmlock.c(753)
  Stack trace:
    0x09000000004f3ae0 _cond_wait_global
    0x09000000004f4678 _cond_wait
    0x09000000004f5360 pthread_cond_wait
    0x00000001000076f4 pkWaitConditionTracked
    0x00000001000bda1c tmLockTracked
    0x00000001000d2e1c AdmLockNode
    0x000000010026eea4 admGetNodeIdForNodeNameExt
    0x000000010025b59c admCheckProxyNode
    0x0000000100268234 admGetNodeExtAttrs
    0x0000000100bdeb80 xiBuildNodeDef
    0x0000000100663eb4 smReplQueryNode
    0x000000010063089c DoNodeHandshake
    0x000000010062f2e4 NrReplicationThread
    0x0000000100020ae0 StartThread

Note that Thread 898045 is using two transactions (Tsn=0:247788814 and Tsn=0:247788896), and Thread 898048 (Tsn=0:247788827) is queued between them.

Tivoli Storage Manager Versions Affected: Tivoli Storage Manager Server v.6.3 and above
Initial Impact: Medium
Additional Keywords: TSM DEADLOCK THREAD NODE REPLICATION CLIENT BACKUP TXN
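The pattern in the lock dump above (one thread listed as Holder under one Tsn and as Waiter under another Tsn on the same lock) can be checked for mechanically. The following is a minimal sketch, assuming the "show locks" output always uses the LockDesc/Holder/Waiter layout shown in this APAR; the function name self_blocked_threads and both regular expressions are illustrative and are not part of serverperf.pl or the server.

```python
import re
from collections import defaultdict

# Both patterns are modeled on the "show locks" excerpt in this APAR;
# the layout at other server levels is an assumption.
LOCK_RE = re.compile(r"LockDesc:.*?Key='(?P<key>[^']*)'")
ENTRY_RE = re.compile(
    r"(?P<role>Holder|Waiter): \([^)]*Thread (?P<thread>\d+)\) "
    r"Tsn=(?P<tsn>[\d:]+), Mode=(?P<mode>\w+)"
)


def self_blocked_threads(show_locks_text):
    """Map lock key -> threads that hold the lock under one Tsn and wait under another."""
    # Cut the text into one chunk per LockDesc entry.
    starts = [m.start() for m in re.finditer(r"LockDesc:", show_locks_text)]
    ends = starts[1:] + [len(show_locks_text)]
    result = {}
    for start, end in zip(starts, ends):
        chunk = show_locks_text[start:end]
        lock = LOCK_RE.search(chunk)
        if lock is None:
            continue
        holders = defaultdict(set)  # thread id -> Tsn values holding the lock
        waiters = defaultdict(set)  # thread id -> Tsn values waiting on the lock
        for entry in ENTRY_RE.finditer(chunk):
            table = holders if entry["role"] == "Holder" else waiters
            table[entry["thread"]].add(entry["tsn"])
        # A thread is suspect if it waits under a Tsn it does not already hold with.
        suspects = sorted(
            t for t in holders if t in waiters and waiters[t] - holders[t]
        )
        if suspects:
            result[lock["key"]] = suspects
    return result
```

Run against the excerpt above, this reports Thread 898045 for key BCC47, which matches the note that the replication thread is using two transactions on the same node lock.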
Local fix
Do not run REPLICATE NODE concurrently with a client operation.
Problem summary
****************************************************************
* USERS AFFECTED: All Tivoli Storage Manager server users of   *
*                 the REPLICATE NODE command.                  *
****************************************************************
* PROBLEM DESCRIPTION: See ERROR DESCRIPTION.                  *
****************************************************************
* RECOMMENDATION: Apply fixing level when available. This      *
*                 problem is currently projected to be fixed   *
*                 in level 6.3.4. Note that this is subject    *
*                 to change at the discretion of IBM.          *
****************************************************************
Problem conclusion
This problem was fixed. Affected platforms: AIX, HP-UX, Solaris, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IC89590
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
63A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-01-17
Closed date
2013-03-05
Last modified date
2013-03-05
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
R63A PSY
UP
R63H PSY
UP
R63L PSY
UP
R63S PSY
UP
R63W PSY
UP
Document Information
Modified date:
05 March 2013