IBM Support

IT13678: SERVER CRASH AFTER ANR9999D_1700456966 MESSAGES CAUSED BY A HANDLE LEAK

APAR status

  • Closed as program error.

Error description

  • The IBM Spectrum Protect server may crash intermittently once the
    operating system handle limit is reached. The handle leak occurs
    during normal server operations and builds up over time, so no
    single operation triggers the crash. Eventually the leaked
    handles exhaust the limit and the server crashes.
    
    In the activity log (actlog), the following error message appears
    before the crash:
    ANR9999D_1700456966 pkBeginNamedThread(pkthread.c:1172)
    Thread<186>:
    Unable to create new Thread.
    
    The handle limit in Windows is about 16 million handles. If the
    IBM Spectrum Protect server reaches 8 million condition
    variables, as shown by the SHOW CONDVAR command, the server is
    about to crash.
    
    The crash is more common on Windows platforms but it may occur
    across all supported platforms.
    
    Although it is not present in every case, the call stack from
    the server crash dump may resemble the following:
    KERNELBASE!RaiseException+0x39
    adsmdll!pkLogicAbort+0x5d
    adsmdll!HashDelete+0x62
    adsmdll!DestroyLock+0x5c
    adsmdll!TmReleaseTxnLocks+0xa8
    adsmdll!tmEndX+0x2e6
    adsmdll!tmEndWithStreamMsg+0x3e
    adsmdll!BfDedupVolumeThread+0x178d
    adsmdll!startThread+0x171
    MSVCR120!_callthreadstartex+0x17
    MSVCR120!_threadstartex+0x102
    kernel32!BaseThreadInitThunk+0xd
    ntdll!RtlUserThreadStart+0x1d
    
    When analyzing the Windows dump file, the condition variable
    counters can be reviewed as follows:
    0:162> dt pCI
    adsmdll!pCI
    0x00000000`00583a10
       +0x000 nConditionVariablesInUse : 0n8339526
       +0x004 nTotalConditionVariables : 0n8347648
       +0x008 pConditionList   : 0x00000000`0058b170 CONDITION_LIST_SEGMENT_TAG
       +0x010 pHeadItem        : 0x00000000`02231000 CONDITION_ITEM_TAG
       +0x018 pTailItem        : 0x00000002`947c0d98 CONDITION_ITEM_TAG
       +0x020 pFreeListHeadItem : 0x00000002`945c7d68 CONDITION_ITEM_TAG
       +0x028 pFreeListTailItem : 0x00000002`945c7b00 CONDITION_ITEM_TAG
       +0x030 csConditionInfo  : _RTL_CRITICAL_SECTION
    
    The first field, nConditionVariablesInUse, shows how many
    condition variables are in use by the server (over 8 million in
    this example). Each condition variable results in 2 handles
    being used by the process, so the process approaches the handle
    limit and the server reports that it is unable to create a new
    thread.
    
    On all platforms, the symptoms also appear as a memory leak:
    the dsmserv process uses more and more memory over time.
    The SHOW ALLOC command displays large, continuously growing
    memory usage in these areas:
    
     tmlock.c line 2457: 114960858 entries for 13795302960 bytes
      (GetLockHolder)
     pkmon.c line  1057:    28743  entries for 12876864000 bytes
      (pkCreateNamedConditionTracked)
    
    Tivoli Storage Manager Versions Affected: All 7.1.3 and newer
    server versions across all supported platforms.
    
    Customer/L2 Diagnostics:
    
    The following example is from AIX, where the server may be
    killed because of low paging space. The output from errpt -a
    displays:
    ------
    LABEL:          PGSP_KILL
    IDENTIFIER:     C5C09FFA
     ...
    
    Description
    SOFTWARE PROGRAM ABNORMALLY TERMINATED
    
    Probable Causes
    SYSTEM RUNNING OUT OF PAGING SPACE
    
    Failure Causes
    INSUFFICIENT PAGING SPACE DEFINED FOR THE SYSTEM
    PROGRAM USING EXCESSIVE AMOUNT OF PAGING SPACE
    
    Recommended Actions
    DEFINE ADDITIONAL PAGING SPACE
    REDUCE PAGING SPACE REQUIREMENTS OF PROGRAM(S)
    
    Detail Data
    PROGRAM
    dsmserv
    USER'S PROCESS ID:
                   7995902
    PROGRAM'S PAGING SPACE USE IN 1KB BLOCKS
        15934412
    
    Initial Impact:
    Moderate
    
    Additional Keywords:
    Spectrum Protect, Server, condition variable, handles, handle
    leak, ANR9999D_1700456966, create thread, new thread, leak,
    handle limit, KERNELBASE!RaiseException+0x39,
    adsmdll!DestroyLock+0x5c, adsmdll, crash, dump, dsmcrash,
    dsmsvc.exe, msvcr120.dll, appcrash, dsmcrash.dmp, WI-93535 TSM
    
    
    |MDVREGR 7.1.3|
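
The SHOW ALLOC lines quoted in the error description can be converted into per-entry sizes to see which allocation site dominates. The following is a minimal Python sketch, not an IBM tool: the function name parse_show_alloc and the regular expression are assumptions inferred only from the two sample lines in this APAR, and real SHOW ALLOC output may be laid out differently.

```python
import re

# Pattern inferred from the two sample lines in this APAR only (assumption);
# the real SHOW ALLOC layout may differ.
ALLOC_RE = re.compile(
    r"(?P<file>\S+)\s+line\s+(?P<line>\d+):\s+"
    r"(?P<entries>\d+)\s+entries\s+for\s+(?P<bytes>\d+)\s+bytes"
    r"\s*\((?P<func>\w+)\)"
)

# The two SHOW ALLOC entries quoted in the error description.
SAMPLE = """
 tmlock.c line 2457: 114960858 entries for 13795302960 bytes
  (GetLockHolder)
 pkmon.c line  1057:    28743  entries for 12876864000 bytes
  (pkCreateNamedConditionTracked)
"""


def parse_show_alloc(text):
    """Return one dict per allocation site found in the given text."""
    sites = []
    for m in ALLOC_RE.finditer(text):
        entries = int(m.group("entries"))
        total = int(m.group("bytes"))
        sites.append({
            "site": f'{m.group("file")}:{m.group("line")}',
            "function": m.group("func"),
            "entries": entries,
            "bytes": total,
            "bytes_per_entry": total / entries if entries else 0.0,
        })
    return sites


if __name__ == "__main__":
    for site in parse_show_alloc(SAMPLE):
        print(f'{site["site"]:<14} {site["function"]:<30} '
              f'{site["bytes"] / 2**30:6.2f} GiB, '
              f'{site["bytes_per_entry"]:.0f} bytes/entry')
```

Under these assumptions, the sample works out to 120 bytes per entry for GetLockHolder and 448000 bytes per entry for pkCreateNamedConditionTracked. Running the same parse on two snapshots taken some hours apart shows whether the entry counts keep growing.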
    

Local fix

  • Monitor SHOW CONDVAR command output and restart the server
    before it gets to the point of running out of handles.
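
The local fix amounts to a headroom calculation, which can be sketched in Python. This is a hedged illustration, not an IBM-provided procedure. It assumes the in-use condition variable count has already been read from SHOW CONDVAR output (the output format is not reproduced here). The 2 handles per condition variable and the Windows limit of about 16 million come from the error description; the exact limit value of 2**24 and the 75% restart threshold are assumptions chosen for the example.

```python
HANDLES_PER_CONDVAR = 2          # from the APAR error description
WINDOWS_HANDLE_LIMIT = 2 ** 24   # "about 16 million"; exact value is assumed
RESTART_THRESHOLD = 0.75         # hypothetical policy, tune per site


def handle_usage(condvars_in_use):
    """Estimate the fraction of the handle limit used by condition variables."""
    return condvars_in_use * HANDLES_PER_CONDVAR / WINDOWS_HANDLE_LIMIT


def should_restart(condvars_in_use, threshold=RESTART_THRESHOLD):
    """Return True when the estimated handle usage reaches the threshold."""
    return handle_usage(condvars_in_use) >= threshold


if __name__ == "__main__":
    # nConditionVariablesInUse from the dump in the error description.
    in_use = 8339526
    print(f"{in_use * HANDLES_PER_CONDVAR} estimated handles, "
          f"{handle_usage(in_use):.1%} of the assumed limit")
    print("restart recommended:", should_restart(in_use))
```

For the dump value of 8339526 condition variables, the estimate is 16679052 handles, more than 99% of the assumed limit. The estimate counts only handles attributed to condition variables, so the real process handle count is somewhat higher and a conservative threshold is advisable.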
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in level 7.1.5. Note that this is      *
    * subject to change at the discretion of IBM.                  *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms: AIX, HP-UX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT13678

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71W

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-02-25

  • Closed date

    2016-02-25

  • Last modified date

    2016-06-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R71A PSY

       UP

  • R71H PSY

       UP

  • R71L PSY

       UP

  • R71S PSY

       UP

  • R71W PSY

       UP



Document information

More support for: Tivoli Storage Manager

Software version: 7.1.3

Reference #: IT13678

Modified date: 10 June 2016