IBM Support

IT08763: CLIENT DEDUPLICATION SESSIONS STUCK IN RECW DUE TO NETWORK FAILURES

APAR status

  • Closed as program error.

Error description

  • Under certain network failures, client-side deduplication
    sessions can become stuck in the RecW state. The output of
    'query session' shows that these sessions have exceeded the
    defined commtimeout value, yet they are not canceled as
    expected. The sessions are no longer tied to any running
    client operation.
    
    Versions Affected: Tivoli Storage Manager Server 7.1 on all
    platforms
    
    Customer/L2 Diagnostics: This problem is caused by unusual
    network failures that leave the deduplication chunk query
    session waiting for the chunk data session to complete, while
    the state of the query session prevents the data session from
    being terminated. Under typical network failures, the problem
    does not occur and sessions terminate as expected.
    
    On the client side, this is reported as ANS1005I TCP/IP
    failures with errno 10053, along with sessSendVerb errors
    with rc -50. On the server side, actlog entries show
    ANR0440W protocol errors on the sessions.
    A trace of the operation from the TSM server shows that the
    chunk query session detects the network failure, sets
    termReason to 1 (SESSTERM_SEVERED), and waits for the data
    session to complete:
     [21928][smtrans.c][1898][SmRecvVerbX]:Receiving verb for
     PKS_JJ_NT1(9793) using buffer 0x7f9f30059e08.
     [21928][smtrans.c][6991][ReceiveVerb]:Failure reading verbHdr
     commRc -1 sessTerm 0.
     [21928][smtrans.c][1904][SmRecvVerbX]:ReceiveVerb rc=-1
     [21928][bfddedup.c][1284][bfDedupEndSession]:bfSessP(ChunkQry)
     0x7f9fb8030248, firstTimeCalled 1, termReason 1.
     [21928][bfddedup.c][1305][bfDedupEndSession]:dedupChunkQryState
     3.
     [21928][bfddedup.c][1320][bfDedupEndSession]:Signalling data
     session that ChunkQry done.
     [21928][bfddedup.c][1329][bfDedupEndSession]:Waiting for data
     session.
    The chunk data session is hung because of the network failure
    and should be terminated by the SmMonitorThread. However,
    because the termReason of the chunk query session was set to 1,
    the SmMonitorThread cannot terminate the chunk data session:
     [82][smcancel.c][3119][MustCancel]:Should cancel session 9792
     due to refTime 31239 >= checkTime 60, stalledDedupSession 0.
     [82][smcancel.c][2127][CancelDedupSession]:ChunkQry
     0x7f9f30069dc8, DataSess 0x7f9fb8013eb8, bfSession
     0x7f9fb8030248, checkTime 60,cancelThisThruPut 0,
     bfSessionStalled 0.
     [82][smcancel.c][2215][CancelDedupSession]:Not canceling
     because isCancelSessCmd 0, cancelOtherTimeOut 0,
     cancelOtherThruPut 0.
    
    Initial Impact: Low
    
    Additional Keywords: dedup bac
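
The signature described in the diagnostics above (a MustCancel entry that flags a session, followed by a CancelDedupSession "Not canceling" entry on the same thread) can be searched for in a saved server trace. The following is a minimal sketch, not an IBM tool. It assumes that trace entries have the bracketed form shown in the excerpts, that wrapped continuation lines never begin with "[", and that pairing entries by thread ID is a good enough heuristic. The names join_wrapped, refused_cancels and SAMPLE are hypothetical.

```python
import re

# One trace entry as it appears in the excerpts:
# [thread][source file][line][function]:message
TRACE_ENTRY = re.compile(
    r"\[(?P<thread>\d+)\]\[(?P<file>[^\]]+)\]\[(?P<line>\d+)\]"
    r"\[(?P<func>[^\]]+)\]:(?P<msg>.*)"
)


def join_wrapped(lines):
    """Rejoin entries that were wrapped across several text lines.

    Assumption: a new entry always starts with "[" and a continuation never does.
    """
    entries = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text.startswith("[") or not entries:
            entries.append(text)
        else:
            entries[-1] += " " + text
    return entries


def refused_cancels(lines):
    """Return session numbers flagged by MustCancel but then left running.

    Heuristic: a "Not canceling" entry from CancelDedupSession is attributed
    to the most recent session flagged on the same thread.
    """
    flagged = {}  # thread id -> session number awaiting a cancel decision
    stuck = []
    for entry in join_wrapped(lines):
        m = TRACE_ENTRY.match(entry)
        if not m:
            continue
        thread, func, msg = m.group("thread"), m.group("func"), m.group("msg")
        if func == "MustCancel":
            hit = re.search(r"Should cancel session (\d+)", msg)
            if hit:
                flagged[thread] = int(hit.group(1))
        elif func == "CancelDedupSession" and msg.startswith("Not canceling"):
            session = flagged.pop(thread, None)
            if session is not None:
                stuck.append(session)
    return stuck


# The second trace excerpt from this APAR, kept in its wrapped form.
SAMPLE = [
    " [82][smcancel.c][3119][MustCancel]:Should cancel session 9792",
    " due to refTime 31239 >= checkTime 60, stalledDedupSession 0.",
    " [82][smcancel.c][2127][CancelDedupSession]:ChunkQry",
    " 0x7f9f30069dc8, DataSess 0x7f9fb8013eb8, bfSession",
    " 0x7f9fb8030248, checkTime 60,cancelThisThruPut 0,",
    " bfSessionStalled 0.",
    " [82][smcancel.c][2215][CancelDedupSession]:Not canceling",
    " because isCancelSessCmd 0, cancelOtherTimeOut 0,",
    " cancelOtherThruPut 0.",
]

if __name__ == "__main__":
    print(refused_cancels(SAMPLE))
```

Run against the excerpt, the sketch reports session 9792. A match only suggests that a trace resembles this APAR; it does not confirm the defect.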
    

Local fix

  • Cancel the sessions that are stuck in RecW, and diagnose the
    network conditions that are terminating sessions.
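
The first half of the local fix can be sketched as a small helper that turns session data into 'cancel session' commands for review. This is an illustration under stated assumptions, not a supported procedure. It assumes the caller has already extracted the session number, state and wait time from 'query session' output and converted the wait time to seconds. The Session type, the cancel_commands function and the sample values are hypothetical, and the threshold of 60 is used only because the trace shows checkTime 60.

```python
from typing import Iterable, List, NamedTuple


class Session(NamedTuple):
    """One row of session data, pre-parsed by the caller (assumed layout)."""
    number: int        # session number
    state: str         # session state, for example "RecW" or "Run"
    wait_seconds: int  # wait time, already normalized to seconds


def cancel_commands(sessions: Iterable[Session], commtimeout: int) -> List[str]:
    """Build 'cancel session N' commands for RecW sessions past commtimeout.

    The commands are returned, not executed, so that an administrator can
    confirm each session is no longer tied to a running client operation.
    """
    commands = []
    for s in sessions:
        if s.state.lower() == "recw" and s.wait_seconds > commtimeout:
            commands.append(f"cancel session {s.number}")
    return commands


if __name__ == "__main__":
    # Hypothetical rows; only 9792 is both in RecW and past the threshold.
    rows = [
        Session(9792, "RecW", 31239),
        Session(9800, "Run", 0),
        Session(9801, "RecW", 12),
    ]
    for command in cancel_commands(rows, commtimeout=60):
        print(command)
```

The helper cannot tell a stuck deduplication session from any other session that happens to be in RecW, so its output is a candidate list rather than a decision.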
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users with client          *
    * deduplication.                                               *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This                      *
    * problem is currently projected to be fixed                   *
    * in level 7.1.4. Note that this is subject                    *
    * to change at the discretion of IBM.                          *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms: AIX, HP-UX, Solaris, Linux and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT08763

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-05-07

  • Closed date

    2015-07-20

  • Last modified date

    2015-07-20

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R71A PSY

       UP

  • R71H PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP

Document Information

Modified date:
20 July 2015