IBM Support

IT16210: REMOVE REPLNODE MIGHT LOCK THE DATABASE WHEN THE TARGET REPLICATION SERVER HAS VERY LARGE NUMBER OF IN-FLIGHT OBJECTS

APAR status

  • Closed as program error.

Error description

  • The REMOVE REPLNODE command might result in unexpectedly high
    locking activity on the source server if the target server has a
    very large number of in-flight objects, typically left behind by
    incomplete replication activity.
    
    When the problem occurs, output from the server monitoring
    script in technote swg21432937 shows the REMOVE REPLNODE
    thread continuously processing in the following call stack:
    
    0x0000000100579470  RemoveOneInFlightMember(??, ??, ??, ??, ??,
    ??) + 0x170
    0x000000010056d8b4  DelReplGroupMemberships(??, ??) + 0x3f4
    0x000000010056d3ec  imDeleteReplData(??, ??, ??, ??) + 0x10c
    0x00000001005881e4  AdmRemoveReplNode(??) + 0xe64
    0x000000010070cbe8  AdmCommandLocal(??, ??, ??, ??, ??) + 0x6e8
    0x000000010070a6fc  admCommand(??, ??, ??, ??, ??) + 0xf7c
    0x0000000100bdb9b0  SmAdminCommandThread(??) + 0x30
    
    The REMOVE REPLNODE command eventually fails with messages
    such as the following:
    
    ANR9999D Thread<14441>  0x0000000100587be8  AdmRemoveReplNode
    ANR1632E REMOVE REPLNODE: Command failed.     Replication
     state information for the specified nodes could not be
     removed.
    ANR0171I tbrsql.c(1446): Error detected on 113:  24,
     database in evaluation mode.
    ANR0103E bfcreate.c(3375): Error 4522 updating
     row in table "BF.Aggregated.Bitfiles".
    ANR9999D_0524624341 ImDeleteBitfile(imutil.c: 10241)
     Thread<14441>: Unexpected rc=4522 from bfDestroy for
     objId 6028692774
    
    Another possible message sequence is:
    
    ANR0171I bfaggrut.c(1644): Error detected on 179:20,
     database in evaluation mode. (SESSION: 723495)
    ANR0157W Database operation FETCH for table
     BF.Aggregate.Attributes failed with result code 4522 and
     tracking ID: 12407bce8. (SESSION: 723495)
    ANR0158W Database operation FETCH for table
     BF.Aggregate.Attributes failed with operation code 4522
     and tracking id 12407bce8. The data for column 0 is:
     (int32)0. (SESSION: 723495)
    ANR0158W Database operation FETCH for table
     BF.Aggregate.Attributes failed with operation code 4522
     and tracking id 12407bce8. The data for column 1 is:
     (int64)7184267819. (SESSION: 723495)
    ANR0106E bfaggrut.c(1647): Unexpected error 4522 fetching
     row in table "BF.Aggregate.Attributes". (SESSION: 723495)
    ANR9999D_2112824638 bfGetSubBitfileInfoEx(bfaggrut.c:1649)
     Thread<1501198>: Unable to locate attributes for bitfile
     7184267819. (SESSION: 723495)
    ANR9999D_0524624341 ImDeleteBitfile(imutil.c:10241)
     Thread<1501198>: Unexpected rc=9979 from bfDestroy for
     objId 6443411018 (SESSION: 723495)
    ANR1632E REMOVE REPLNODE: Command failed. Replication
     state information for the specified nodes could not be
     removed. (SESSION: 723495)
    ANR9999D_0413043177 GetCommandRc(admrepl.c:7134)
     Thread<1501198>: unexpected rc=9996 (SESSION: 723495)
    
    To verify whether you are exposed to this APAR, run the
    following commands as the instance user from a DB2 command
    window. Replace MYNODENAME with the uppercase name of the node
    that you want to process:
    
    db2 connect to tsmdb1
    db2 set schema tsmdb1
    db2 "select nrig.tgt_groupid, nrig.mem_count from
    tsmdb1.inflight_replgroups2 nrig where nrig.tgt_nodeid in
    (select n.nodeid from tsmdb1.nodes n where
    n.nodename='MYNODENAME') and nrig.mem_count!=(select count(*)
    from tsmdb1.group_leaders imgl where
    imgl.leaderid=nrig.tgt_groupid)" > inflight.txt
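
The query above can be wrapped in a small helper so that the node name is always passed in uppercase. This is a minimal sketch, not part of the APAR: the function name build_inflight_query is hypothetical, the SQL text is copied from the command above, and the only validation is a check that rejects names containing a single quote.

```shell
# Hypothetical helper: print the verification query for one node name.
# The name is converted to uppercase, as the instructions above require.
build_inflight_query() {
    # Refuse names that would break out of the quoted SQL literal.
    case "$1" in
        *"'"*) echo "invalid node name: $1" >&2; return 1 ;;
    esac
    node=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # printf reuses the '%s' format for each argument, so the pieces
    # are concatenated into a single-line statement.
    printf '%s' \
        "select nrig.tgt_groupid, nrig.mem_count from tsmdb1.inflight_replgroups2 nrig " \
        "where nrig.tgt_nodeid in (select n.nodeid from tsmdb1.nodes n " \
        "where n.nodename='${node}') and nrig.mem_count!=(select count(*) " \
        "from tsmdb1.group_leaders imgl where imgl.leaderid=nrig.tgt_groupid)"
    printf '\n'
}

# Usage, as the instance user after "db2 connect to tsmdb1":
#   db2 "$(build_inflight_query mynodename)" > inflight.txt
```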
    
    Summing the nrig.mem_count column gives the number of in-flight
    objects. In the case that led to this APAR, the total was more
    than 73 million objects:
    
     awk '{sum +=$2} END {print sum}' inflight.txt
    73696203
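
A slightly stricter variant of the summation is sketched below. It assumes that inflight.txt has the usual DB2 command line processor layout (a column header, a dashed separator, one row per group, and a trailing "record(s) selected" line), and it counts only rows where both fields are plain integers. The function name summarize_inflight is hypothetical.

```shell
# Hypothetical helper: report how many groups the query returned and the
# total of their MEM_COUNT values.
# Only rows whose first two fields are both unsigned integers are counted,
# so header, separator, blank, and "record(s) selected" lines are skipped.
summarize_inflight() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { groups++; sum += $2 }
         END { printf "groups=%d objects=%d\n", groups, sum }' "$1"
}

# Usage:
#   summarize_inflight inflight.txt
```

Reporting the group count alongside the object total helps distinguish a few very large groups from many small ones.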
    
    
      Tivoli Storage Manager Versions Affected: all supported V6 and V7
      Initial Impact: Medium
      Additional Keywords: Commit TSM IBM Spectrum Protect
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in level 7.1.7. Note that this is      *
    * subject to change at the discretion of IBM.                  *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms: AIX, HP-UX, Solaris, Linux and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT16210

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-07-20

  • Closed date

    2016-08-01

  • Last modified date

    2016-08-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R71A PSY

       UP

  • R71H PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP



Document information

More support for: Tivoli Storage Manager

Software version: 7.1.3

Reference #: IT16210

Modified date: 01 August 2016