IBM Support

IT19064: SERVER CRASH OCCURS IN NARROW TIMING WINDOW PROCESSING THROUGH SDASYNCWRITE DURING CONTAINER POOL INGEST


APAR status

  • Closed as program error.

Error description

  • Under very rare circumstances, users of container storage pools
    may encounter a server crash during ingest operations (client
    backup/archive, or Replicate Node or Protect Stgpool on the
    target server). The call stacks for the failing threads in the
    core dumps generated by the crash show processing through
    SdAsyncWrite. The following two examples have been seen for
    this issue; other variations are possible.
    Example 1:
      global_lock_ppc_mp() at 0x9000000004f4fb4
      pthread_cond_broadcast(??) at 0x9000000004ed160
      pkBroadcastCondition(??) at 0x100008e1c
      DequeueVarQueue(??, ??, ??, ??, ??) at 0x1003411b8
      SdAsyncWrite(??) at 0x100a3996c
      AsyncWriteThread(??, ??) at 0x100a4116c
      PcConsumerThread(??) at 0x10082349c
      StartThread(0x0) at 0x10000d670
    Example 2:
      pkAcquireMutexTracked(0x1b23, 0x1b23, 0xbc5) at 0x100007870
      IsTxnFlushRequired(??) at 0x1009fdf74
      SdAsyncWrite(??) at 0x1009fcc90
      AsyncWriteThread(??, ??) at 0x1009c64ac
      PcConsumerThread(??) at 0x10093dde4
      StartThread(0x0) at 0x10000da90
    Although they do not occur in all cases, the following messages
    may appear in the server activity log and db2diag.log before
    the crash for some of the timing windows exposed to this
    issue:
      [Activity Log]
      ANR0171I tbcli.c(2158): Error detected on 222:6, database in
                              evaluation mode.
      ANR0102E tbcli.c(2187): Error 4522 inserting row in table
                              "SD.Recon.Order".
      ANR0102E sdcreate.c(3512): Error 4522 inserting row in table
                                 "SD.Recon.Order".
      ANR0171I tbrsql.c(1446): Error detected on 221:8, database in
                               evaluation mode.
      ANR0106E sddedup.c(2565): Unexpected error 4522 fetching row
                                in table "SD.Chunk.Locations".
      ANR0106E sddedup.c(2565): Unexpected error 9991 fetching row
                                in table "SD.Chunk.Locations".
      ANR0171I sdrefcount.c(320): Error detected on 81:16, database
                                  in evaluation mode.
      [db2diag.log]
      2016-09-16-07.11.31.990430-300 E8566035A737   LEVEL: Error
      PID     : 4129188              TID : 4710     PROC : db2sysc
      INSTANCE: tsminst1             NODE : 000     DB   : TSMDB1
      HOSTNAME: ubtsms01
      EDUID   : 4710                 EDUNAME: db2loggr (TSMDB1) 0
      FUNCTION: DB2 UDB, data protection services,
                sqlpScanTranTableForLowTran, probe:550
      MESSAGE : ADM1541W  Application "dsmserv" with application
                handle "0-24095" and application id
                "*LOCAL.tsminst1.160916105945" executing under
                authentication id "TSMINST1" has been forced off
                of the database for violating database
                configuration parameter NUM_LOG_SPAN (current value
                "231"). The unit of work will be rolled back
    Spectrum Protect Versions Affected:
      Spectrum Protect Server 7.1.3+
    Customer/L2 Diagnostics (If Applicable)
      N/A
    Initial Impact:
      Low
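
The symptoms described above can be checked mechanically during triage. The following is a minimal sketch, not an IBM-supplied tool. It assumes the stack trace and the activity log have already been exported to plain text, and the function names stack_matches and find_precursor_messages are hypothetical. A match is only a hint: the APAR states that other stack variations are possible and that the messages do not occur in all cases.

```python
import re

# Message IDs quoted in this APAR as possible precursors (not exhaustive).
PRECURSOR_IDS = ("ANR0171I", "ANR0102E", "ANR0106E")

# A frame line looks like "SdAsyncWrite(??) at 0x100a3996c".
FRAME_RE = re.compile(r"^\s*(\w+)\(.*\)\s+at\s+0x[0-9a-fA-F]+\s*$")


def stack_matches(stack_text):
    """Return True if any frame in the stack text is SdAsyncWrite."""
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group(1) == "SdAsyncWrite":
            return True
    return False


def find_precursor_messages(actlog_text):
    """Return activity-log lines that start with one of the precursor IDs.

    Continuation lines (for example the wrapped table name) are skipped,
    so each hit is only the first line of a message.
    """
    hits = []
    for line in actlog_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PRECURSOR_IDS):
            hits.append(stripped)
    return hits
```

Calling stack_matches on either example stack above returns True, because both contain an SdAsyncWrite frame.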
    

Local fix

  •   If the NUM_LOG_SPAN message is seen, increasing the active log
      size prevents that particular trigger for the failure. Other
      triggers still exist, so this does not eliminate the problem.
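
Whether the local fix applies depends on the NUM_LOG_SPAN message being present in db2diag.log. The following is a minimal sketch for finding it, assuming the log has been read into a string and that the ADM1541W text has the same wording as the sample in the error description. The function name find_num_log_span_events is hypothetical. The sketch only detects the message; how to change the active log size is not covered here and should be taken from the product documentation.

```python
import re

# Matches the ADM1541W wording shown in this APAR's db2diag.log sample.
# The message wraps across lines in the log, so it is matched against a
# copy of the text with all whitespace runs collapsed to single spaces.
ADM1541W_RE = re.compile(
    r'ADM1541W\s+Application "(?P<app>[^"]+)" with application handle '
    r'"(?P<handle>[^"]+)".*?NUM_LOG_SPAN \(current value "(?P<value>\d+)"\)'
)


def find_num_log_span_events(db2diag_text):
    """Return (application, handle, NUM_LOG_SPAN value) for each ADM1541W."""
    flat = " ".join(db2diag_text.split())
    return [
        (m.group("app"), m.group("handle"), int(m.group("value")))
        for m in ADM1541W_RE.finditer(flat)
    ]
```

An empty result means this particular trigger was not recorded. It does not rule out the crash, because other triggers exist.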
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All Tivoli Storage Manager and IBM Spectrum Protect server   *
    * users of container storage pools.                            *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 7.1.7.200, 7.1.8, and 8.1.1. *
    ****************************************************************
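
The affected and fixing levels can be turned into a simple exposure check. The following is a minimal sketch under stated assumptions: it takes "7.1.3+" as the start of the affected range and the projected levels 7.1.7.200, 7.1.8, and 8.1.1 as the fixing levels, and it assumes levels compare by plain numeric ordering of their dotted parts. The names parse_level and is_exposed are hypothetical. Because the fixing levels were only projected when this APAR was written, confirm against the actual fix lists before relying on the result.

```python
def parse_level(level):
    """Parse 'V.R.M[.F]' into a 4-tuple of ints, padding with zeros."""
    parts = [int(p) for p in level.strip().split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 dotted numeric parts: %r" % level)
    return tuple(parts + [0] * (4 - len(parts)))


def is_exposed(level):
    """Return True if a container-pool server at this level lacks the fix.

    Assumptions (from this APAR): affected from 7.1.3 onward; fixed in
    7.1.7.200 and later 7.1 levels (including 7.1.8), and in 8.1.1 and later.
    """
    v = parse_level(level)
    if v < (7, 1, 3, 0):
        return False  # earlier than the first affected level
    if v[:2] == (7, 1):
        return v < (7, 1, 7, 200)
    return v < (8, 1, 1, 0)
```

For example, is_exposed("7.1.7.100") is True, while is_exposed("7.1.7.200") and is_exposed("8.1.1") are False.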
    

Problem conclusion

  • This problem was fixed.
    
    Affected platforms:  AIX, Solaris, Linux, and Windows.
    Platforms fixed: AIX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT19064

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    71A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-02-01

  • Closed date

    2017-02-07

  • Last modified date

    2017-02-07

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • Tivoli Storage Manager 7.1.3, Platform Independent

Document Information

Modified date:
07 February 2017