IBM Support

IC99949: NDMP BACKUP SESSION HANGS CAUSING SUBSEQUENT NDMP BACKUP SCHEDULES TO MISS

APAR status

  • Closed as program error.

Error description

  • A timing issue at the end of NDMP filer-to-server backup or
    restore operations can cause the session to hang if the NDMP
    operation failed.  If the NDMP operation was launched by an
    administrative schedule, the schedule never completes, which
    causes subsequent runs of the schedule to be missed.
    
    The issue does not seem to present itself if the operation
    completes successfully.
    
    To confirm that a missed schedule is caused by this problem,
    check the output of  SHOW PENDING : the schedule is listed
    there with a deadline that is in the past.
    
    L2 diagnostic:
    In a server trace taken with CC, ADM, and SCHED enabled, the
    following appears after the backup job failed, but before the
    schedule duration elapsed:
    17:31:03.039 [136][cscmdsch.c][774][CsCmdSchedulerThread]:
       Command Scheduler: Schedule NDMP_SCHED started.
    17:31:03.040 [579962][csmgr.c][1842][CsUpdateEvent]:
             Schedule : NDMP_SCHED
    17:31:03.045 [579962][csmgr.c][1954][CsUpdateEvent]:
             Schedule : NDMP_SCHED
    17:31:03.046 [579962][cscmdsch.c][964][CsRunCmdThread]:
       Schedule NDMP_SCHED has already been run - cannot start
       again
    17:31:03.048 [136][cscmdsch.c][474][CsCmdSchedulerThread]:
       Command Scheduler: Skipping schedule NDMP_SCHED.
    
    After the schedule duration is over, we see the following loop
    for the remaining period of the trace, including the next
    schedule time:
    18:30:19.244 [136][cscmdsch.c][502][CsCmdSchedulerThread]:
       Command Scheduler: Schedule NDMP_SCHED is expired, but
       still running.
    18:30:19.261 [136][cscmdsch.c][502][CsCmdSchedulerThread]:
       Command Scheduler: Schedule NDMP_SCHED is expired, but
       still running.
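
A minimal sketch of how the repeating message above could be picked out of a saved trace file. It assumes only the message text shown in this APAR; the function name `find_stuck_schedules` and the sample string are hypothetical and are not part of any TSM tooling.

```python
import re
from collections import Counter

# Message text taken from the trace excerpt in this APAR. The message wraps
# across lines in the trace, so whitespace is collapsed before matching.
EXPIRED_RE = re.compile(r"Schedule (\S+) is expired, but still running")


def find_stuck_schedules(trace_text: str) -> Counter:
    """Count 'expired, but still running' messages per schedule name."""
    flat = " ".join(trace_text.split())
    return Counter(EXPIRED_RE.findall(flat))


# Sample input copied from the trace excerpt above.
sample = """\
18:30:19.244 [136][cscmdsch.c][502][CsCmdSchedulerThread]:
   Command Scheduler: Schedule NDMP_SCHED is expired, but
   still running.
18:30:19.261 [136][cscmdsch.c][502][CsCmdSchedulerThread]:
   Command Scheduler: Schedule NDMP_SCHED is expired, but
   still running.
"""

stuck = find_stuck_schedules(sample)
for name, count in stuck.items():
    print(f"{name}: {count} 'expired, but still running' messages")
```

A schedule that keeps accumulating these messages past its next start time is a candidate for the hang described here; the SHOW PENDING check remains the confirmation.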
    
    For a backup job that was run manually, the output of
    SHOW THREADS should still list the hung thread for the backup:
    ===========================================================
     Thread 55, ID 2672 (0x0a70): AfStoreNativeThread
     Parent=0, result=0, joining=0, detached=1, zombie=0,
        session=0
     Waiting for Cond 50941240 (&sessP->ndmp_ssDoneCond) using
        mutex 0 (na). Waiting from
    
     Stack trace:
     00000000772A186A NtWaitForMultipleObjects()+a
     000007FEFD501430 GetCurrentProcess()+40
     0000000077141220 WaitForMultipleObjects()+b0
     000007FED76A1C5A pkWaitCondition()+3ea pkmonnt.c:1507
     000007FED8458861 ssEndSession()+511 sssess.c:935
     000007FED7995FC2 bfEndSession()+5f2 bfutil.c:1865
     000007FED7B48749 DoEndSess()+99 afremote.c:2913
     000007FED7B49224 AfStoreNativeThread()+334 afremote.c:3504
     000007FED769A73F startThread()+35f pkthread.c:3249
     000007FEE6077175 beginthreadex()+205
     000007FEE6077377 endthreadex()+1d7
     000000007714652D BaseThreadInitThunk()+d
     000000007727C541 RtlUserThreadStart()+21
    ===========================================================
     Thread 56, ID 10632 (0x2988): ShowThreadController
     Parent=40, result=0, joining=0, detached=0, zombie=0,
        session=0
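
The thread dump above shows the session-end path waiting on a condition (`ndmp_ssDoneCond`) that is never satisfied. The sketch below illustrates that general pattern only: a waiter blocks on a "done" condition that a failed operation never signals. It is not the TSM server code, and the APAR does not describe the actual root cause beyond "a timing issue"; the names `Session`, `run_operation`, and `end_session` are hypothetical. A timeout is used so that the example terminates instead of hanging.

```python
import threading


class Session:
    """Hypothetical session holding a 'done' flag guarded by a condition."""

    def __init__(self):
        self.done_cond = threading.Condition()
        self.done = False


def run_operation(sess, fail, signal_on_failure):
    # Hypothetical worker: marks the session done when the operation ends.
    if fail and not signal_on_failure:
        return  # Assumed buggy failure path: exits without signalling.
    with sess.done_cond:
        sess.done = True
        sess.done_cond.notify_all()


def end_session(sess, timeout):
    # Wait for the done flag. Returns False if it never arrives in time.
    # With timeout=None this would block forever on the buggy path.
    with sess.done_cond:
        return sess.done_cond.wait_for(lambda: sess.done, timeout=timeout)


def demo(fail, signal_on_failure):
    sess = Session()
    worker = threading.Thread(
        target=run_operation, args=(sess, fail, signal_on_failure)
    )
    worker.start()
    worker.join()
    return end_session(sess, timeout=0.2)


print(demo(fail=False, signal_on_failure=False))  # success path
print(demo(fail=True, signal_on_failure=False))   # failure path, never signalled
print(demo(fail=True, signal_on_failure=True))    # failure path, signalled
```

In this sketch, signalling the condition on every exit path, including failure, is what lets the waiter return; whether the actual fix takes that form is not stated in this APAR.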
    
    Versions Affected: All versions on all platforms
    Initial Impact: medium
    Additional Keywords: ANR1893E tsm
    

Local fix

  • To release the schedule, perform one of the following:
    - restart the TSM Server
    - delete the schedule and redefine it
    
    To avoid the issue, address the problem that is causing the
    operation to fail.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager server users that perform NDMP    *
    * operations                                                   *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 6.3.5 and 7.1.1. Note that   *
    * this is subject to change at the discretion of IBM.          *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms:  AIX, HP-UX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC99949

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    63A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-03-10

  • Closed date

    2014-04-29

  • Last modified date

    2014-04-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R63A PSY

       UP

  • R63W PSY

       UP

  • R63S PSY

       UP

  • R63H PSY

       UP

  • R63L PSY

       UP

  • R71A PSY

       UP

  • R71S PSY

       UP

  • R71H PSY

       UP

  • R71W PSY

       UP

  • R71L PSY

       UP

Document Information

Modified date:
29 April 2014