IBM Support

IT15717: SERVER CRASHES IN 'SDRTRVFRAGMENT' RETRIEVING FRAGMENT FROM CONTAINER POOL WHEN FRAGMENT HAS NO EXTENTS

APAR status

  • Closed as program error.

Error description

  • The server can crash when it retrieves a fragment from a
    container pool and the fragment has no extents.
    
    
    On Linux, the getcoreinfo.txt output for a crash during a
    replicate node process looks like this:
    
    #0  0x0000000000ca4948 in sdRtrvFragment (sessH=0x1720a838,
        inTxnId=<optimized out>, objectId=26248244,
        chunkType=SdChunkTypeDedup, objectOffset=0, objectLength=0,
        sinkFunc=0xc9db30 <SdEndToEndSinkFunc>, contextP=0x7f40d0a1cbe8,
        fragSeq=4, numFragments=9, fragHdr=0, thisIsPOR=False,
        bytesTransferredP=0x7f4044f047a8) at sdrtrv.c:1161
    #1  0x0000000000708fad in bfRtrv (sessHandle=0xcac9fd8, bfId=26239633,
        bfOffset=0, bfLength=0, mountWaitMode=bfWaitMount,
        rtrvType=bfExternalRtrv, noQueryRestore=False,
        sinkFunc=0xda8fc0 <SmSendData>, contextP=0x7f40d0a1cbe8,
        doRetry=True, thisPoolOnly=0, thisStrategyOnly=0,
        thisFragOnly=False, bfSize=17821889536, isFragmented=True,
        bytesTransferredP=0x7f4044f047a8) at bfrtrv.c:1475
    #2  0x0000000000e3e235 in smReplRtrv (handle=0x7f40a1a78958,
        bfHandle=0xcac9fd8, objId=26239633, bfSize=17821889536,
        hdrLen=426, metaSize=194, copyType=<optimized out>, doRetry=True,
        bytesTransferredP=0x7f4044f047a8, lastRetry=False,
        isSuperAggregate=True, isSdObject=True, isSdTarget=True,
        isCloudTarget=False) at smrepl.c:6062
    #3  0x00000000009ac621 in NrReplicateBatch (argP=0x834f1e8,
        workP=0x7f40c058b088) at nrmain.c:12147
    #4  0x0000000000fbc025 in PcConsumerThread (argP=<optimized out>) at
        prodcons.c:633
    #5  0x000000000103ac62 in StartThread (startInfoP=0x0) at
        pkthread.c:3779
    #6  0x00007f40ecb8f806 in start_thread () from /lib64/libpthread.so.0
    #7  0x00007f40e8d8864d in clone () from /lib64/libc.so.6
    #8  0x0000000000000000 in ?? ()
    
    Diagnostics:
    
    Frame #0 of the getcoreinfo.txt output identifies the object that
    was being retrieved when the server crashed:
    
    #0  0x0000000000ca4948 in sdRtrvFragment (sessH=0x1720a838,
        inTxnId=<optimized out>, objectId=26248244,
        chunkType=SdChunkTypeDedup, objectOffset=0, objectLength=0,
        sinkFunc=0xc9db30 <SdEndToEndSinkFunc>, contextP=0x7f40d0a1cbe8,
        fragSeq=4, numFragments=9, fragHdr=0, thisIsPOR=False,
        bytesTransferredP=0x7f4044f047a8) at sdrtrv.c:1161
    
    In this example, the object ID is 26248244.
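As a hedged illustration of this step, the sketch below pulls objectId, fragSeq and numFragments out of the sdRtrvFragment frame of a saved getcoreinfo.txt backtrace. It is not an IBM tool: the helper name fragment_from_backtrace and the regular expression are assumptions, and it assumes frame #0 is laid out like the excerpt above, with the argument list possibly wrapped across several lines.

```python
import re

# Hypothetical helper: locate the sdRtrvFragment frame (#0) in a gdb-style
# backtrace such as the one in getcoreinfo.txt. re.S lets ".*?" span the
# line breaks that wrap the argument list.
FRAME0 = re.compile(
    r"#0\s+0x[0-9a-f]+\s+in\s+sdRtrvFragment\s*\((?P<args>.*?)\)\s+at", re.S
)


def fragment_from_backtrace(text):
    """Return objectId, fragSeq and numFragments from frame #0, or None."""
    match = FRAME0.search(text)
    if match is None:
        return None
    args = match.group("args")
    fields = {}
    for name in ("objectId", "fragSeq", "numFragments"):
        # \b keeps "objectId=" from matching inside a longer argument name.
        value = re.search(rf"\b{name}=(\d+)", args)
        fields[name] = int(value.group(1)) if value else None
    return fields


# Frame #0 as it appears in the getcoreinfo.txt excerpt above.
SAMPLE = """#0  0x0000000000ca4948 in sdRtrvFragment (sessH=0x1720a838,
inTxnId=<optimized out>, objectId=26248244,
chunkType=SdChunkTypeDedup,
objectOffset=0, objectLength=0, sinkFunc=0xc9db30
<SdEndToEndSinkFunc>,
contextP=0x7f40d0a1cbe8, fragSeq=4, numFragments=9, fragHdr=0,
thisIsPOR=False, bytesTransferredP=0x7f4044f047a8) at
sdrtrv.c:1161"""

frame = fragment_from_backtrace(SAMPLE)
print(frame)  # {'objectId': 26248244, 'fragSeq': 4, 'numFragments': 9}
```

The objectId value is the ID to pass to SHOW INVO in the next step.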
    
    Run:
    show invo 26248244 listchunks=yes
    
    The output shows that the object is a fragment of a super
    aggregate:
    
    SHOW INVO 26248244 LISTCHUNKS=YES
    Object 26248244 NOT FOUND.
    
    Bitfile Object: 26248244
    **Super-Bitfile 26248244 is a fragment in Super Aggregate 26239633
    
    Bitfile Object NOT found.
    
    Then run SHOW INVO for the super aggregate:
    show invo 26239633 listchunks=yes
    
    The output lists all of the fragment IDs:
    
    ***** Fragment Information *****
    Super-Bitfile 26239633 is a Super Aggregate with 9 fragments.
      Fragment ID: 26239633  Sequence Number: 0  User Bytes: 2003814235, pendingId: -1
      Fragment ID: 26242866  Sequence Number: 1  User Bytes: 2003813744, pendingId: -1
      Fragment ID: 26244298  Sequence Number: 2  User Bytes: 2003863706, pendingId: -1
      Fragment ID: 26245769  Sequence Number: 3  User Bytes: 2003813744, pendingId: -1
      Fragment ID: 26248244  Sequence Number: 4  User Bytes: 2003763508, pendingId: -1
      Fragment ID: 26250091  Sequence Number: 5  User Bytes: 2003813744, pendingId: -1
      Fragment ID: 26251787  Sequence Number: 6  User Bytes: 2003863706, pendingId: -1
      Fragment ID: 26253245  Sequence Number: 7  User Bytes: 2003813704, pendingId: -1
      Fragment ID: 26255398  Sequence Number: 8  User Bytes: 1791329062, pendingId: -1
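The fragment IDs for the DB2 query can be collected from the SHOW INVO output instead of being copied by hand. The following is a minimal sketch under stated assumptions: the helper names fragment_ids and recon_order_query are hypothetical, the parser expects the "Fragment ID: ... Sequence Number: ..." layout shown above, and the generated command uses count(*) with a parenthesized IN list. The sample text is abbreviated to the first three fragments.

```python
import re

# Hypothetical helper: one "Fragment ID ... Sequence Number ..." entry from
# SHOW INVO output. \s+ also matches a line break if "User Bytes" wraps.
FRAGMENT = re.compile(r"Fragment ID:\s*(\d+)\s+Sequence Number:\s*(\d+)")


def fragment_ids(show_invo_text):
    """Return the fragment IDs ordered by sequence number."""
    pairs = [(int(seq), int(fid)) for fid, seq in FRAGMENT.findall(show_invo_text)]
    return [fid for _, fid in sorted(pairs)]


def recon_order_query(ids):
    """Build a db2 CLP command counting sd_recon_order rows per fragment."""
    in_list = ", ".join(str(i) for i in ids)
    return (
        'db2 "select objid, count(*) from tsmdb1.sd_recon_order '
        f'where objid in ({in_list}) group by objid"'
    )


# Abbreviated copy of the fragment list above (first three fragments only).
SAMPLE = """Super-Bitfile 26239633 is a Super Aggregate with 9 fragments.
  Fragment ID: 26239633  Sequence Number: 0  User Bytes:
2003814235, pendingId: -1
  Fragment ID: 26242866  Sequence Number: 1  User Bytes:
2003813744, pendingId: -1
  Fragment ID: 26244298  Sequence Number: 2  User Bytes:
2003863706, pendingId: -1"""

ids = fragment_ids(SAMPLE)
print(ids)  # [26239633, 26242866, 26244298]
print(recon_order_query(ids))
```

Sorting by sequence number keeps the list in the same order as fragSeq in the backtrace, which makes the missing fragment easy to match later.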
    
    
    Then, in DB2, run:
    
    db2 "select objid, count(*) from tsmdb1.sd_recon_order where objid
    in (26239633, 26242866, 26244298, 26245769, 26248244, 26250091,
    26251787, 26253245, 26255398) group by objid"
    
     OBJID                2
    -------------------- -----------
                26239633       40073
                26242866       40072
                26244298       40073
                26245769       40072
                26250091       40072
                26251787       40073
                26253245       40072
                26255398       35823
    
    This output shows that fragment 26248244 (sequence number 4) is
    not listed, because it has no extents.
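The comparison in this step is a set difference: any fragment ID from SHOW INVO that has no row in the sd_recon_order result has no extents. A minimal sketch, assuming the db2 output layout shown above (a header row, a dashed underline, then one OBJID/count pair per line); fragments_without_extents is a hypothetical helper name. With this APAR's data it reports 26248244, whose position in the sequence-ordered list is 4, matching fragSeq=4 in frame #0.

```python
def fragments_without_extents(fragment_ids, query_output):
    """Return the fragment IDs that have no row in the sd_recon_order output."""
    found = set()
    for line in query_output.splitlines():
        parts = line.split()
        # Data rows hold exactly two integers: OBJID and the row count.
        # The header ("OBJID 2") and the dashed underline are skipped.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            found.add(int(parts[0]))
    return [fid for fid in fragment_ids if fid not in found]


# Fragment IDs from SHOW INVO, in sequence-number order.
FRAGMENTS = [26239633, 26242866, 26244298, 26245769, 26248244,
             26250091, 26251787, 26253245, 26255398]

# The db2 query output from this APAR.
QUERY_OUTPUT = """ OBJID                2
-------------------- -----------
            26239633       40073
            26242866       40072
            26244298       40073
            26245769       40072
            26250091       40072
            26251787       40073
            26253245       40072
            26255398       35823"""

missing = fragments_without_extents(FRAGMENTS, QUERY_OUTPUT)
print(missing)  # [26248244]
print(FRAGMENTS.index(missing[0]))  # 4
```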
    
    Therefore, the super aggregate (26239633 in this example) must be
    deleted to prevent the crash from occurring.
    
    IBM Spectrum Protect Versions Affected:
    IBM Spectrum Protect Server: 7.1.3.x and higher on all platforms
    
    
    
    Initial Impact: Medium
    
    
    Additional Keywords: TSM IBM Spectrum Protect container pools
    crash core extents fragment 117481
    

Local fix

  • Contact IBM support for assistance in deleting the affected
    super aggregate object.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in level 7.1.6. Note that this is      *
    * subject to change at the discretion of IBM.                  *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms:  AIX, Solaris, Linux, and Windows.
    

APAR Information

  • APAR number

    IT15717

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-06-14

  • Closed date

    2016-06-16

  • Last modified date

    2016-06-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R71A PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP



Document information

More support for: Tivoli Storage Manager

Software version: 7.1.3

Reference #: IT15717

Modified date: 16 June 2016