IBM Support

IT01366: NODE REPLICATION DEADLOCKS WITH EXPIRE INVENTORY THEN EXPIRATION HANGS

APAR status

  • Closed as program error.

Error description

  • Expire Inventory processes all nodes successfully but then
    appears to hang.  After many hours, the Expire Inventory process
    is canceled by a database deadlock.  While Expire Inventory
    appears hung, many deadlock messages are reported for active
    Node Replication processes.
    
    Tivoli Storage Manager Versions Affected: All 6.2, 6.3 and 7.1
    Servers
    
    Customer/L2 Diagnostics:
    Example of Actlog messages:
    05:05:22 ANR0984I Process 15 for EXPIRE INVENTORY started in the
    BACKGROUND at 05:05:22. (SESSION: 22095, PROCESS: 15)
    
    06:00:24 ANR0984I Process 16 for Replicate Node started in the
    BACKGROUND at 06:00:24. (SESSION: 25296, PROCESS: 16)
    06:51:33 ANR0166I Inventory file expiration finished processing
    for node x, filespace y, copygroup z and object type GROUP BASE
    with processing statistics: examined 78012, deleted 78012,
    retrying 0, and failed 0. (SESSION: 22095, PROCESS: 15)
    
    07:03:48 ANR0159E nrmain.c(6185): Database deadlock detected on
    123:2. (SESSION: 25296, PROCESS: 16)
    07:03:48 ANR0162W Supplemental database diagnostic information:
    -1:40001:-911 ([IBM][CLI Driver][DB2/AIX64] SQL0911N  The
    current transaction has been rolled back because of a deadlock
    or timeout.  Reason code '2'.  SQLSTATE=40001). (SESSION: 25296,
    PROCESS: 16)
    
    08:11:30 ANR0408I Session 30796 started for server x (AIX)
    (Tcp/Ip) for replication.  (SESSION: 29886, PROCESS: 20)
    
    08:12:14 ANR0159E nrmain.c(6185): Database deadlock detected on
    232:2. (SESSION: 29886, PROCESS: 20)
    08:12:14 ANR0162W Supplemental database diagnostic information:
    -1:40001:-911 ([IBM][CLI Driver][DB2/AIX64] SQL0911N  The
    current transaction has been rolled back because of a deadlock
    or timeout.   Reason code '2'.  SQLSTATE=40001). (SESSION:
    29886, PROCESS: 20)
    
    14:55:43 ANR0159E tbrsql.c(1485): Database deadlock detected on
    64:33. (SESSION: 22095, PROCESS: 15)
    14:55:43 ANR0162W Supplemental database diagnostic information:
    -1:40001:-911 ([IBM][CLI Driver][DB2/AIX64] SQL0911N  The
    current transaction has been rolled back because of a deadlock
    or timeout.  Reason code '2'. SQLSTATE=40001). (SESSION: 22095,
    PROCESS: 15)
    
    SHOW THREADS output shows expiration working in:
    imReplHourlyMonitor
    Example:
    Thread 32525, Parent 32520: ExpirationProcessThread,
      Storage 1243996, AllocCnt 231006 HighWaterAmt 1602363
      tid=780d, ptid=6f08, det=1, zomb=0, join=0, result=0, sess=0
       Stack trace:
         0x0900000000260e10 semop
         0x0900000000b9f58c sqloSSemP
         0x0900000000b9ef64 .sqlccrecv.fdpr.clone.739
         0xffffffff89000017 *UNKNOWN*
         0x0900000000b9e82c sqljcReceive__FP10sqljCmnMgr
         0x0900000000bac440 sqljrDrdaArExecute__FP14db2UCinterfaceP9UCstpInfo
         0x0900000000e43520 CLI_sqlExecute__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
         0x0900000000eb271c SQLExecute2__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
         0x0900000000ec4c64 SQLExecute
         0x00000001001a2490 tbRegExecEx
         0x00000001002518d8 NrPrune
         0x00000001006c8650 imReplHourlyMonitor      <-----
         0x0000000100a8348c ExpirationProcessThread
         0x000000010000c264 StartThread
    
    Initial Impact: Medium
    
    Additional Keywords:  repl exp inv
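
The symptom above can be checked in a saved activity log by counting ANR0159E deadlock messages per server process. The following is a minimal sketch, not an IBM-provided tool. It assumes the log text has the same layout as the excerpt above (an HH:MM:SS timestamp, the message ID, and a trailing "(SESSION: n, PROCESS: n)" tag, possibly wrapped across lines). The function name deadlocks_per_process is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: each message starts with HH:MM:SS followed by an ANR message ID.
MESSAGE_START = re.compile(r"\d{2}:\d{2}:\d{2} ANR\d{4}[IWES]\b")
# The trailing tag that names the server process, e.g. "PROCESS: 16)".
PROCESS_TAG = re.compile(r"PROCESS: (\d+)\)")


def deadlocks_per_process(actlog_text):
    """Count ANR0159E deadlock messages per server process number."""
    # Collapse wrapped lines so each message can be searched as one string.
    flat = " ".join(actlog_text.split())
    starts = [m.start() for m in MESSAGE_START.finditer(flat)]
    counts = Counter()
    for begin, end in zip(starts, starts[1:] + [len(flat)]):
        message = flat[begin:end]
        if " ANR0159E " in message and "deadlock" in message:
            tag = PROCESS_TAG.search(message)
            if tag:
                counts[int(tag.group(1))] += 1
    return counts
```

Called on the full activity log text, the function returns a Counter keyed by process number. Repeated deadlocks on Replicate Node processes while the EXPIRE INVENTORY process has logged no new ANR0166I message for hours would be consistent with the behavior described in this APAR.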
    

Local fix

  • Until the fix is available, see the workaround documented at:
    http://www.ibm.com/support/docview.wss?uid=swg21661695
    Once the fix is available and there are nodes that will no
    longer be replicated, the REPLRETENTION option can then be set
    back to the default of 30 or to the number of days that matches
    how often the latest scheduled node replication is done.  After
    the number of days indicated by the REPLRETENTION value has
    passed, nodes that are not being replicated will have their
    information cleared from the Replication History.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager Users                             *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This                      *
    * problem is currently projected to be fixed                   *
    * in levels 6.3.5 and 7.1.1. Note that this is                 *
    * subject to change at the discretion of IBM.                  *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms:  AIX, HP-UX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT01366

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    63A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-04-28

  • Closed date

    2014-05-16

  • Last modified date

    2014-05-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R63A PSY

       UP

  • R63H PSY

       UP

  • R63L PSY

       UP

  • R63S PSY

       UP

  • R63W PSY

       UP

  • R71A PSY

       UP

  • R71H PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP

Document Information

Modified date:
16 May 2014