IBM Support

IT23795: HANG CONDITION MAY OCCUR WHEN CANCELLING A LOCAL PROTECT STGPOOL OPERATION

APAR status

  • Closed as program error.

Error description

  • The server can hang if the QUERY PROCESS command is issued while
    a local PROTECT STGPOOL operation is being cancelled.  The
    problem can occur only if the PROTECT STGPOOL operation is
    multi-threaded.  In that case, the output from the SHOW THREADS
    command shows contention between the QUERY PROCESS command and
    two or more PROTECT STGPOOL threads:
    
       Thread 571785, Parent 571782: SmAdminCommandThread,
       Storage 4043, AllocCnt 51 HighWaterAmt 74031
        tid=c289, ptid=13c86, det=0, zomb=0, join=1, result=0,
        sess=0, procToken=0, sessToken=346544
         Stack trace:
           0x090000000051483c _global_lock_common
           0x0900000000522108 _mutex_lock
           0x0000000100007cd8 pkAcquireMutexTracked
           0x0000000100e76ea8 SdReplQueryProcess
           0x00000001000ca5ec procQueryProcess
           0x00000001010ca8c4 AdmQueryProcess
           0x00000001003b073c AdmCommandLocal
           0x00000001003ae0f8 admCommand
           0x0000000100b88cd4 SmAdminCommandThread
           0x000000010000e114 StartThread
         Holding mutex PROCV->mutex (0x110f5a1d8),
         acquired at process.c(1152)
         Holding mutex descP->tableMutex (0x12f23ddd8),
         acquired at output.c(1923)
         Acquiring mutex ctlP->mutex (0x13ed0e158) at sdrepl.c(7334)
        Thread context:
          COMMAND: QUERY PROCESS
          COMMMETHOD: Tcp/Ip
          THREAD_TYPE: SESSION
          SESSION_TYPE: ADMIN
          ADMIN_NAME: ADMINISTRATOR
    
       Thread 571761, Parent 571753: SdReplicateBatch,
       Storage 0, AllocCnt 0 HighWaterAmt 0
        tid=18971, ptid=18669, det=0, zomb=0, join=0, result=0,
        sess=0, procToken=203, sessToken=346530
         Stack trace:
           0x090000000051483c _global_lock_common
           0x0900000000522108 _mutex_lock
           0x0000000100007cd8 pkAcquireMutexTracked
           0x0000000100e878ac SdAcquireChildCtlDesc
           0x0000000100e8d644 LocalProtectBatch
           0x0000000100e83e1c SdReplicateBatch
           0x0000000100729e40 PcConsumerThread
           0x000000010000e114 StartThread
         Holding mutex ctlP->mutex (0x13ed0e158),
         acquired at sdrepl.c(5267)
         Acquiring mutex ctlP->mutex (0x12e6f64d8) at sdrepl.c(5294)
        Thread context:
          COMMAND: PROTECT STGPOOL
          SCHEDULE_TYPE: ADMIN
          SCHEDULE_NAME: PROTECT_LOCAL_AM
          SCHEDULED: YES
    
       Thread 571764, Parent 571751: SdProtectManageChildCancel,
       Storage 0, AllocCnt 0 HighWaterAmt 0
        tid=18574, ptid=18367, det=1, zomb=0, join=0, result=0,
        sess=0, procToken=202, sessToken=346530
         Stack trace:
           0x0900000000537260 _cond_wait_global
           0x0900000000537df8 _cond_wait
           0x0900000000538ae0 pthread_cond_wait
           0x00000001000093b4 pkWaitConditionTracked
           0x00000001000c9174 procEndProcessEx
           0x0000000100e88988 SdProtectManageChildCancel
           0x000000010000e114 StartThread
         Holding mutex ctlP->mutex (0x12e6f64d8),
         acquired at sdrepl.c(5882)
         Awaiting cond procP->cancelCond (0x11902a140),
         using mutex PROCV->mutex (0x110f5a1d8), at process.c(698)
        Thread context:
          COMMAND: PROTECT STGPOOL
          SCHEDULE_TYPE: ADMIN
          SCHEDULE_NAME: PROTECT_LOCAL_AM
          SCHEDULED: YES
    
    The output above shows a deadlock in which the
    SmAdminCommandThread (QUERY PROCESS), SdReplicateBatch and
    SdProtectManageChildCancel threads each hold a mutex that one
    of the other threads requires, while each also waits for a
    mutex held by one of the other threads.  The hang also affects
    any other session or process thread that attempts to acquire a
    mutex held by one of these threads.  The Spectrum Protect
    server must be restarted to recover from this condition.
    
    Spectrum Protect Server Versions Affected:
    All 7.1.7.0 and higher versions of Spectrum Protect server code
    
    Initial Impact:
    Medium
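
The circular wait described in the error description can be checked mechanically. The following Python sketch is illustrative only and is not an IBM tool: it parses an abbreviated copy of the SHOW THREADS excerpt quoted in this APAR, records which mutex addresses each thread holds and wants, and follows the "waits for the holder of" edges until a thread repeats. The function names `parse_threads` and `find_cycle` are hypothetical. One assumption is labelled in the code: a thread that is awaiting a condition "using mutex X" is treated as needing X, because a condition wait must reacquire its mutex before it can return.

```python
import re

# Abbreviated excerpt of the SHOW THREADS output quoted in this APAR.
SHOW_THREADS = """\
Thread 571785, Parent 571782: SmAdminCommandThread,
 Holding mutex PROCV->mutex (0x110f5a1d8),
 Holding mutex descP->tableMutex (0x12f23ddd8),
 Acquiring mutex ctlP->mutex (0x13ed0e158) at sdrepl.c(7334)
Thread 571761, Parent 571753: SdReplicateBatch,
 Holding mutex ctlP->mutex (0x13ed0e158),
 Acquiring mutex ctlP->mutex (0x12e6f64d8) at sdrepl.c(5294)
Thread 571764, Parent 571751: SdProtectManageChildCancel,
 Holding mutex ctlP->mutex (0x12e6f64d8),
 Awaiting cond procP->cancelCond (0x11902a140),
 using mutex PROCV->mutex (0x110f5a1d8), at process.c(698)
"""

THREAD_RE = re.compile(r"Thread (\d+), Parent \d+: (\w+)")
HOLD_RE = re.compile(r"Holding mutex \S+ \((0x[0-9a-f]+)\)")
# Assumption: a condition wait "using mutex X" is counted as needing X.
WANT_RE = re.compile(r"(?:Acquiring|using) mutex \S+ \((0x[0-9a-f]+)\)")


def parse_threads(text):
    """Return {thread_id: {"name", "holds", "wants"}} from SHOW THREADS text."""
    threads, current = {}, None
    for line in text.splitlines():
        header = THREAD_RE.search(line)
        if header:
            current = {"name": header.group(2), "holds": set(), "wants": set()}
            threads[header.group(1)] = current
            continue
        if current is None:
            continue
        current["holds"].update(HOLD_RE.findall(line))
        current["wants"].update(WANT_RE.findall(line))
    return threads


def find_cycle(threads):
    """Return a list of thread ids that wait on each other in a cycle, or None."""
    owner = {m: tid for tid, t in threads.items() for m in t["holds"]}
    # Simplification: keep one outgoing edge per thread, which is enough here
    # because each thread in the excerpt waits on exactly one mutex.
    waits_for = {}
    for tid, t in threads.items():
        for mutex in t["wants"]:
            if mutex in owner and owner[mutex] != tid:
                waits_for[tid] = owner[mutex]
    for start in threads:
        path, tid = [], start
        while tid in waits_for and tid not in path:
            path.append(tid)
            tid = waits_for[tid]
        if tid in path:
            return path[path.index(tid):]
    return None


if __name__ == "__main__":
    threads = parse_threads(SHOW_THREADS)
    cycle = find_cycle(threads)
    if cycle:
        # Repeat the first thread at the end to show the loop closing.
        print(" -> ".join(threads[t]["name"] for t in cycle + cycle[:1]))
```

Run against the excerpt, the sketch reports a three-thread cycle: the QUERY PROCESS thread waits for a mutex held by SdReplicateBatch, which waits for one held by SdProtectManageChildCancel, whose condition wait uses the mutex held by the QUERY PROCESS thread.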
    

Local fix

  • Avoid or minimize issuing the QUERY PROCESS command while a
    local PROTECT STGPOOL operation is being cancelled.
    
    If the hang occurs, the Spectrum Protect server instance must
    be restarted to recover.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users using PROTECT STGPOOL  *
    * with TYPE=LOCAL                                              *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 7.1.9 and 8.1.5. Note that   *
    * this is subject to change at the discretion of IBM.          *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Platforms fixed:  AIX, Linux, Solaris, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT23795

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-01-22

  • Closed date

    2018-02-19

  • Last modified date

    2018-02-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R71A PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP

  • R81A PSY

       UP

  • R81L PSY

       UP

  • R81W PSY

       UP

Document Information

Modified date:
28 September 2021