IBM Support

IT07556: SERVER HANG RESULTING FROM DEADLOCK BETWEEN STATUSMONITORTHREAD HOLDING MUTEX AND DEVCLASS LATCH FOR MOUNT POINT ACQUISITION


APAR status

  • Closed as program error.

Error description

  • The server hangs because of a deadlock between the StatusMonitorThread,
    which holds MMSV->mutex, and a thread acquiring a mount point, which
    holds a device class (devclass) latch.
    .
    When the server experiences this hang, the SHOW THREADS output
    displays the following for the StatusMonitorThread:
    .
       Thread  7, ID 7764 (0x1e54): StatusMonitorThread
       Parent=0, result=0, joining=0, detached=1, zombie=0,
       session=0
       Waiting for Cond 35217768 (&latchP->sFree) using mutex 0
       (na). Waiting from
       Holding mutex 84160288 (&MMSV->mutex)
       Holding mutex 84161320 (&libP->driveListMutex)
       Stack trace:
       000000007740186A NtWaitForMultipleObjects()+a
       000007FEFD3A1430 GetCurrentProcess()+40
       00000000772A06E0 WaitForMultipleObjects()+b0
       000007FEE39CB644 pkWaitCondition()+94 pkmonnt.c:1509
       000007FEE433656C AcquireLatchSpecific()+1cc latch.c:258
       000007FEE3F1E768 pvrGetLibraryDevClasses()+108 pvr.c:4688
       000007FEE3ED355E naGetClassDirs()+de napthcmd.c:2370
       000007FEE3ED33AE naGetNextPath()+d1e napthcmd.c:2271
       000007FEE3F95CAE MmsMonitorLibraryForDevClass()+41e
                        mmslib.c:1962
       000007FEE3F32725 pvrMonitorDevices()+6f5 pvrclass.c:1447
       000007FEE3E90BF8 StatusMonitorThread()+998 monstats.c:6846
       000007FEE39C72EC startThread()+13c pkthread.c:3361
       000007FEF95E3FEF beginthreadex()+107
       000007FEF95E4196 endthreadex()+192
       00000000772A59ED BaseThreadInitThunk()+d
       00000000773DC541 RtlUserThreadStart()+21
    .
    The key detail in this output is that the thread holds MMSV->mutex
    while waiting for the latch requested by the device class code
    (pvrGetLibraryDevClasses).
    The other thread in the deadlock is an AsMPAgent thread, which
    looks similar to the following:
    .
    Thread 20, ID 10536 (0x2928): AsMPAgent
    Parent=0, result=0, joining=0, detached=1, zombie=0, session=0
    Holding mutex 84162696 (&libP->driveListMutex)
    Holding mutex 87064952 (&ASV->mpQueueMutex)
    .
    Stack trace:
    00000000774012FA NtWaitForSingleObject()+a
    00000000773FE518 RtlDeNormalizeProcessParams()+5d8
    00000000773FE40B RtlDeNormalizeProcessParams()+4cb
    000007FEE39CB25C pkAcquireMutex()+1c pkmonnt.c:1031
                     (waiting for MMSV->mutex, which is now held by
                     StatusMonitorThread)
    000007FEE3FBBBFF MmsAcquireDrivePath()+58f mmsdrive.c:7840
                     (MMSV->mutex is released and reacquired here, which
                     allows StatusMonitorThread to take it)
    000007FEE3FB3E1F MmsCheckDrivesForMP()+4cf mmsdrive.c:2960
    000007FEE3F45FB1 pvrAcquireMountPoint()+2a1 pvrmp.c:1033
                     (devclass latch is acquired exclusively here)
    000007FEE424327B TestSwMpReq()+12b asvolmnt.c:5204
    000007FEE424147D TestMpReq()+4d asvolmnt.c:3979
    000007FEE423FD6E AsMPAgent()+12e asvolmnt.c:2293
    .
    Tivoli Storage Manager Versions Affected:
    6.3.4 and later on all platforms. The status monitor was added in
    6.3.4, so no earlier MDV is in error.
    Initial Impact: High
    .
    Additional Keywords: hung not responding
    

Local fix

  • Turn off the status monitor with the TSM administrative command
    SET STATUSMONITOR OFF. Also ensure that the Operations Center (OC)
    does not reconnect to this server, because a reconnection turns the
    status monitor thread back on.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
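    * The server can hang because of a deadlock between the        *
    * StatusMonitorThread, which holds MMSV->mutex, and a thread   *
    * acquiring a mount point, which holds a device class latch.   *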
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 6.3.6 and 7.1.3.             *
    * Note that this is subject to change at the discretion of     *
    * IBM.                                                         *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms:  AIX, HP-UX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT07556

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71W

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-03-06

  • Closed date

    2015-04-09

  • Last modified date

    2015-04-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R63A PSY

       UP

  • R63H PSY

       UP

  • R63L PSY

       UP

  • R63S PSY

       UP

  • R63W PSY

       UP

  • R71A PSY

       UP

  • R71H PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP



Document information

More support for: Tivoli Storage Manager

Software version: 7.1.3

Reference #: IT07556

Modified date: 09 April 2015