Resource waiter aborts during client backup

Troubleshooting

Problem

Client backups to a disk storage pool may fail because a resource waiter is aborted, with the message: ANR0538I A resource waiter has been aborted.

Cause

This behavior may be seen in environments where clients back up to a cache-enabled primary disk storage pool and the data is subsequently migrated to a dedup-enabled FILE storage pool. When the disk storage pool does not have enough free space to accommodate the client backup, the server begins to reclaim space by deleting cached files. Resource contention may occur if the server attempts to delete a cached file that is being accessed in the dedup-enabled FILE storage pool. For example, if a file in the dedup-enabled FILE storage pool is being accessed by a space reclamation or identify duplicates process, the cached version of the same file cannot be deleted until the transaction in the FILE storage pool has completed and the lock on the file has been released. If the requested lock cannot be acquired within the time specified by the RESOURCETIMEOUT server option, the client backup session is terminated.
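The timeout behavior described above can be modeled with a minimal Python sketch. It is only an illustration of a lock request that gives up after a fixed wait, not the server's implementation. The function names and the `RESOURCE_TIMEOUT_SECONDS` constant are hypothetical, and the timeout is scaled down to fractions of a second so the sketch runs quickly.

```python
import threading
import time

# Hypothetical stand-in for the RESOURCETIMEOUT setting, scaled down to seconds.
RESOURCE_TIMEOUT_SECONDS = 0.2


def hold_lock(lock, hold_seconds, acquired):
    """Model a reclamation or identify duplicates transaction holding the lock."""
    with lock:
        acquired.set()            # signal that the lock is now held
        time.sleep(hold_seconds)  # the transaction is still working on the file


def delete_cached_copy(lock, timeout):
    """Model the backup session locking the file before deleting its cached copy."""
    if lock.acquire(timeout=timeout):
        try:
            return "cached copy deleted"
        finally:
            lock.release()
    return "resource waiter aborted"


def simulate(hold_seconds, timeout=RESOURCE_TIMEOUT_SECONDS):
    """Run one holder and one waiter, and return the waiter's outcome."""
    lock = threading.Lock()
    acquired = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(lock, hold_seconds, acquired))
    holder.start()
    acquired.wait()  # make sure the holder owns the lock before the waiter tries
    result = delete_cached_copy(lock, timeout)
    holder.join()
    return result
```

In this model, a holder that releases the lock before the timeout lets the deletion proceed, while a holder that keeps the lock longer than the timeout causes the waiter to give up, which corresponds to the aborted resource waiter.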

Diagnosing The Problem

Prior to the client backup session being terminated due to the resource waiter abort, the output from the QUERY SESSION command will likely show the session in a Run state, though no data transfer is occurring:

The output from the SHOW SESSIONS command allows us to determine the thread id (2330167) corresponding to this session (2159868):

[SHOW SESSIONS output omitted; highlighted values: 2159868 (session), 2330167 (thread id)]

The SHOW RESQ command can then be issued to confirm whether resource contention is occurring on the Tivoli Storage Manager server. The output from the SHOW RESQ command may show the client backup session thread waiting to acquire a resource, as seen below:

[SHOW RESQ output omitted; highlighted values: 2382444736 (resource), 46001 (lock type), 2330167 (waiting thread)]

From this output we can see that thread 2330167 has been waiting 52 minutes to acquire a lock (lock type 46001) on a resource (2382444736). The output from the SHOW LOCKS command provides additional information about the lock and the transaction holding this lock:

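[SHOW LOCKS output omitted; highlighted values: 460, 2382444736 (resource name), 2329514 (thread associated with the lock holder)]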

In the output above we can see that lock type 46001 is an aggregate, or superbitfile, lock and the resource name of '2382444736' represents the aggregate id. This lock is currently held by transaction 0:261047415 which is associated with thread 2329514. The SHOW THREADS command shows that thread 2329514 is associated with a space reclamation process:

[SHOW THREADS output omitted; highlighted value: 2329514 (thread id)]
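The correlation performed in the steps above (waiting thread, then lock, then holding thread) can be sketched in Python. The identifiers are the ones quoted in this document, but the record layout, the variable names, and the `find_blockers` function are hypothetical. The values are assumed to have been transcribed by hand, because the sketch does not parse real server output.

```python
# Values transcribed by hand from the diagnostic output discussed above.
# The record layout is hypothetical; the server does not print this format.
WAITERS = [
    {"thread": 2330167, "lock_type": 46001, "resource": "2382444736",
     "waited_minutes": 52},
]
LOCKS = [
    {"lock_type": 46001, "resource": "2382444736", "holder_thread": 2329514,
     "transaction": "0:261047415"},
]
THREADS = {
    2330167: "client backup session 2159868",
    2329514: "space reclamation process",
}


def find_blockers(waiters, locks, threads):
    """Pair each waiting thread with the thread that holds the lock it wants."""
    held = {(lock["lock_type"], lock["resource"]): lock for lock in locks}
    report = []
    for waiter in waiters:
        lock = held.get((waiter["lock_type"], waiter["resource"]))
        if lock is None:
            continue  # no matching lock entry was recorded for this waiter
        report.append({
            "waiter": threads.get(waiter["thread"], "unknown"),
            "holder": threads.get(lock["holder_thread"], "unknown"),
            "transaction": lock["transaction"],
            "waited_minutes": waiter["waited_minutes"],
        })
    return report


if __name__ == "__main__":
    for entry in find_blockers(WAITERS, LOCKS, THREADS):
        print(entry)
```

Matching on both the lock type and the resource name mirrors the manual procedure: the same pair appears in the waiter entry and in the lock entry, and the lock entry leads to the holding thread.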

We know that the client backup session is attempting to acquire a lock on a bitfile, but the bitfile is currently locked by the space reclamation process. We now need to review the call stack associated with the client backup thread to confirm that the lock contention is occurring because the server is attempting to delete cached data. The SHOW THREADS command can be issued to display the call stacks for the various threads, but on some platforms it may be necessary to issue the pstack/procstack command on the operating system to obtain this information. In either case, the call stack associated with the client backup thread should resemble the following:

This call stack indicates that the client session is attempting to write data to the disk storage pool (DfCreate), but the server must first free up additional space in the storage pool by deleting cached data (DfReclaimCached).

Resolving The Problem

This problem is most likely to affect large objects, because the larger the object, the longer it takes to reclaim or deduplicate it. The resource timeouts caused by this condition can therefore likely be avoided by performing either of the following actions:

b) specifying a MAXSIZE value for the disk storage pool to prevent large files from being backed up directly to disk
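The effect of a MAXSIZE limit on where a file lands can be illustrated with a short Python sketch. It assumes a two-level hierarchy in which files too large for the disk storage pool are stored in the next storage pool instead. The pool names, the 5 GiB limit, and the `choose_pool` function are hypothetical, and treating a file exactly equal to the limit as admitted is an assumption of this sketch.

```python
GIB = 1024 ** 3

# Hypothetical hierarchy: (pool name, MAXSIZE in bytes, or None for no limit).
HIERARCHY = [
    ("DISKPOOL", 5 * GIB),  # cache-enabled disk pool with an example 5 GiB limit
    ("FILEPOOL", None),     # dedup-enabled FILE pool, next in the hierarchy
]


def choose_pool(file_size, hierarchy=HIERARCHY):
    """Return the first pool in the hierarchy whose size limit admits the file."""
    for name, max_size in hierarchy:
        if max_size is None or file_size <= max_size:
            return name
    return None  # no pool in the hierarchy accepts a file this large
```

With such a limit in place, large objects bypass the disk storage pool, so the server never has to delete a large cached copy that may be locked in the FILE storage pool.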

Product Synonym

tsm

Document Information

Modified date:
17 June 2018

UID

swg21666632