IBM Support

IC81115: TIVOLI STORAGE MANAGER SERVER PERFORMING REORG WITH DEDUPLICATION ENABLED CAN CAUSE NODE TO BECOME "LOCKED"

APAR status

  • Closed as program error.

Error description

  • The Tivoli Storage Manager server might experience resource
    timeouts that result in sessions/processes failing consistently.
    
    
    This can occur only if deduplication is configured on at least
    one storage pool and that pool is actively managing deduplicated
    data. Depending on which resource is pinned, the symptom is that
    at least one session or process cannot complete its operation
    successfully. The failure is likely to repeat until the
    reorganization is stopped or killed. The following message
    appears repeatedly in the activity log:
    
    01/22/12   06:02:34      ANR0538I A resource waiter has been
    aborted.
    
    The operation that has experienced the waiter abort will have
    its own failure message shortly thereafter.
    
    The key to diagnosing this issue is to confirm that one of the
    following tables was being reorganized at the time of the
    excessive lock failures and perceived hangs:
    BF_QUEUED_CHUNKS
    BF_DEREFERENCED_CHUNKS
    
    This can be found by issuing the following DB2 diagnostic
    command:
    db2pd -db tsmdb1 -reorg
    
    This might occur while the Tivoli Storage Manager server is
    reorganizing the BF_QUEUED_CHUNKS or the BF_DEREFERENCED_CHUNKS
    table. Because these tables are volatile, they should not be
    reorganized.
    
    The following commands can be used to determine if the reported
    condition has occurred:
    
    1) db2 connect to tsmdb1
    2) db2 set schema tsmdb1
    3) db2pd -d tsmdb1 -reorg
    
    If the output shows that the Status column of the "Table Reorg
    Stats:" stanza for the BF_QUEUED_CHUNKS or BF_DEREFERENCED_CHUNKS
    table is not "Done", "Stopped", or "Paused", and the CurCount
    value does not increase across subsequent "db2pd -d tsmdb1 -reorg"
    commands, or the CurCount column has no value, you are probably
    experiencing this condition.
    
    Example output:
    Table Reorg Stats:
    Address TableName Start End PhaseStart MaxPhase Phase CurCount MaxCount Status Completion
    0x000007F726E94E28 BF_QUEUED_CHUNKS 01/20/2012 16:40:35 n/a n/a n/a n/a 0 114527 Started 0
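
The check described above can be scripted. The following is a minimal sketch, not part of the APAR: the function names `reorg_suspects` and `reorg_stalled` are hypothetical, and the parsing assumes the column order shown in the example output (Status is the second-to-last column and CurCount is the fourth-to-last). The first function reads "db2pd -d tsmdb1 -reorg" output on standard input and lists the two volatile tables whose reorganization is not "Done", "Stopped", or "Paused". The second compares two captures taken some minutes apart and reports entries whose Status and CurCount are unchanged.

```shell
# Hypothetical helper: read db2pd -reorg output on stdin and print the
# volatile tables whose reorg status is not Done, Stopped, or Paused.
# Assumes the "Table Reorg Stats:" column order shown in the example.
reorg_suspects() {
  awk '
    /Table Reorg Information:/ { in_stats = 0; next }
    /Table Reorg Stats:/       { in_stats = 1; next }
    in_stats && ($2 == "BF_QUEUED_CHUNKS" || $2 == "BF_DEREFERENCED_CHUNKS") {
      status   = $(NF - 1)   # Status column, counted from the end
      curcount = $(NF - 3)   # CurCount column, counted from the end
      if (status != "Done" && status != "Stopped" && status != "Paused")
        printf "%s status=%s curcount=%s\n", $2, status, curcount
    }
  '
}

# Hypothetical helper: compare two saved captures and print the entries
# that are identical in both, that is, CurCount did not change.
reorg_stalled() {
  tmp=$(mktemp) || return 1
  reorg_suspects < "$1" > "$tmp"
  reorg_suspects < "$2" | grep -Fx -f "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

A possible way to use it: run `db2pd -d tsmdb1 -reorg > first.out`, wait a few minutes, run `db2pd -d tsmdb1 -reorg > second.out`, then call `reorg_stalled first.out second.out`. Any line printed is a candidate for the condition described here and should still be confirmed against the raw db2pd output.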
    
    Additional Keywords:
    
    Hang, Abort, Dedupe, ACO5436E, ANS1301E;
    

Local fix

  • Stop the online REORG and then perform the backup. To stop the
    REORG:
    
    
    1) db2 connect to tsmdb1
    2) db2 set schema tsmdb1
    3) db2 "reorg table <tablename> inplace stop"
    4) Wait 5 minutes
    5) db2pd -d tsmdb1 -reorg
    The Status column of the "Table Reorg Stats:" stanza should be
    "Stopped".
    
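
The five steps above can be wrapped in a small shell function. This is a sketch under stated assumptions, not an IBM-supplied script: the function name `stop_inplace_reorg` and the `WAIT_SECS` variable are hypothetical, the db2 and db2pd commands are the ones listed in the steps, and the Status column is assumed to be the second-to-last column of the "Table Reorg Stats:" stanza as in the example output. It must be run as a user who can connect to tsmdb1.

```shell
# Hypothetical helper: stop an in-place reorg of one table, wait, and
# print the Status that db2pd reports for that table afterwards.
# WAIT_SECS defaults to 300 seconds (the 5 minutes in the procedure).
stop_inplace_reorg() {
  table=$1
  wait_secs=${WAIT_SECS:-300}

  # db2 messages go to stderr so that stdout carries only the status.
  db2 connect to tsmdb1 >&2 || return 1
  db2 set schema tsmdb1 >&2 || return 1
  db2 "reorg table $table inplace stop" >&2   # may report SQL2219N

  sleep "$wait_secs"

  # Print the Status column of the last stats row for this table.
  db2pd -d tsmdb1 -reorg | awk -v t="$table" '
    /Table Reorg Information:/ { in_stats = 0; next }
    /Table Reorg Stats:/       { in_stats = 1; next }
    in_stats && $2 == t        { status = $(NF - 1) }
    END { if (status != "") print status }
  '
}
```

For example, `stop_inplace_reorg BF_QUEUED_CHUNKS` is expected to print "Stopped" when the stop request succeeded. Any other value, or an SQL2219N message on stderr, means the stop did not take effect.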
    If this does not stop the reorg, or if you get the DB2 error
    message:
    
    SQL2219N The specified INPLACE table reorganization action on
    table "TSMDB1.<tablename>" is not allowed on one or more nodes.
    Reason code: "10".
    
    you are probably experiencing DB2 APAR IC79773. To stop the
    reorg, do the following:
    
    
    1. Determine the application ID of the reorganization process,
    by issuing the following commands in a DB2 Command Line
    Processor window:
    A. db2 connect to tsmdb1
    B. db2 get snapshot for all applications >application.out
    2. Examine the application.out file and find the "Most recent
    operation" entry like this:
    Most recent operation = Reorganize
    
    If that line isn't there, look for an entry like this:
    Application name = db2reorg
    3. Scroll backwards until you find the "Application handle"
    entry. It will look something like this:
    Application handle = NNNNN (where NNNNN is the actual
    application handle)
    Ensure that the correct application handle is found.
    4. Issue the following command in the DB2 Command Line Processor
    window, substituting the actual application handle for NNNNN:
    db2 "force application (NNNNN)"
    5. Because of the nature of the command being canceled, and
    because the DB2 FORCE APPLICATION command is asynchronous, it
    might take up to 30 minutes for the process to be canceled.
    6. To verify that it has been canceled, repeat steps 1B and 2.
    If no "Most recent operation" entry of type Reorganize is
    displayed, the reorganization has been canceled.
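
Steps 2 and 3 can be automated by scanning the snapshot file. The sketch below is illustrative only: the function name `reorg_app_handles` is hypothetical, and it assumes that each application's "Application handle" line precedes its "Application name" and "Most recent operation" lines in application.out, which is what the "scroll backwards" instruction implies. It prints every handle whose most recent operation is Reorganize or whose application name is db2reorg.

```shell
# Hypothetical helper: print the application handles of reorg
# processes found in a "get snapshot for all applications" file.
# Defaults to application.out in the current directory.
reorg_app_handles() {
  awk -F'=' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim both ends
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "Application handle" { handle = val }
    (key == "Most recent operation" && val == "Reorganize") ||
    (key == "Application name" && val == "db2reorg") {
      # Print each matching handle once.
      if (handle != "" && !(handle in seen)) { seen[handle] = 1; print handle }
    }
  ' "${1:-application.out}"
}
```

The printed handle still needs to be checked by eye against application.out before it is passed to db2 "force application (NNNNN)", because forcing the wrong application ends an unrelated connection. Running the function again after a fresh snapshot and getting no output corresponds to the verification in step 6.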
    
    
    
    Please see the following technote for additional information on
    cancelling the reorganization.
    
          http://www-01.ibm.com/support/docview.wss?uid=swg21452146
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All Tivoli Storage Manager server users.     *
    ****************************************************************
    * PROBLEM DESCRIPTION: See error description.                  *
    ****************************************************************
    * RECOMMENDATION: Apply fixing level when available. This      *
    *                 problem is currently projected to be fixed   *
    *                 in levels 6.2.4 and 6.3.1. Note that this    *
    *                 is subject to change at the discretion of    *
    *                 IBM.                                         *
    ****************************************************************
    

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    IC81115

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    61L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-02-01

  • Closed date

    2012-03-30

  • Last modified date

    2013-08-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R61A PSY

       UP

  • R61H PSY

       UP

  • R61L PSY

       UP

  • R61S PSY

       UP

  • R61W PSY

       UP

  • R62A PSY

       UP

  • R62H PSY

       UP

  • R62L PSY

       UP

  • R62S PSY

       UP

  • R62W PSY

       UP

  • R63A PSY

       UP

  • R63H PSY

       UP

  • R63L PSY

       UP

  • R63S PSY

       UP

  • R63W PSY

       UP

Document Information

Modified date:
23 August 2013