Improving TSM Restore Performance From Tape

Technote (troubleshooting)


Problem (Abstract)

Most customers take advantage of TSM’s incremental forever strategy, which sends only the files that have changed on the client machine to the TSM server during each backup, saving bandwidth, time and tape cartridges. Over time, as files slowly change on client machines, data is no longer packed closely together on the tape volumes: active data becomes mixed with inactive data, expired data, and data from other nodes or filespaces that is not relevant to a particular restore operation. Restoring a single directory is even more problematic, since the relevant data is sparser still.

Initially, restore performance is quite good. Over time, however, as the data spreads across a tape or multiple tapes, performance degrades. The degradation is caused by the growing number of tape locate commands needed to reach the valid data, and possibly by additional tape mounts. Tape drives perform well when they are allowed to stream, reading or writing sequential data, but when they are forced to skip around on the tape, the locate commands begin to dominate. The following sections discuss how to avoid the problem.
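The effect of locate commands can be illustrated with a rough back-of-the-envelope model. The sketch below is not a TSM tool; the function name and all figures (data size, streaming rate, seconds per locate) are hypothetical values chosen only to show how locate time can come to dominate a restore.

```python
def estimate_restore_seconds(data_mb, stream_mb_per_s, locates, seconds_per_locate):
    """Rough model: time spent streaming data plus time spent repositioning the tape."""
    streaming = data_mb / stream_mb_per_s
    locating = locates * seconds_per_locate
    return streaming + locating


# Hypothetical figures: 10,000 MB of data read at 100 MB/s, 5 s per locate.
packed = estimate_restore_seconds(10_000, 100, locates=10, seconds_per_locate=5)
scattered = estimate_restore_seconds(10_000, 100, locates=2_000, seconds_per_locate=5)

print(f"closely packed data: {packed:.0f} s")
print(f"scattered data:      {scattered:.0f} s")
```

In this model the streaming time is identical in both cases; only the number of locates differs, which is why the methods below all aim to keep related data close together.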

Resolving the problem

There are three basic restore scenarios:

· Single file restore
· Multiple file (directory) restore
· Total file system restore (disaster recovery)

TSM is already optimized for the first case, since it keeps a record of the exact location of each file. The second case is the most problematic, since the directory (or set of files) may become spread throughout a tape or set of tapes over time, although restores will be fast initially. The third case can be optimized by using image backup and restore.

So, depending on your priorities you may need to amend your backup strategy to include image backup and perhaps some occasional selective backups. Details on other ways to improve general restore performance follow.

There are several methods that can help avoid the problem of slow restores. Many of these can and should be used together, whereas others are mutually exclusive (and are noted as such).

You may also wish to reference the Optimizing Restore Operations for Clients section in the IBM Tivoli Storage Manager Administrator's Guide.


TSM Collocation

TSM collocation, when enabled, reserves a volume or set of volumes (if more than one is needed) for a particular TSM client, filespace or group of clients. This allows filespace data to be packed more closely together and avoids excessive tape mounts. With collocation disabled, dozens of tapes could be needed to restore a single filespace or even a single directory.

Collocation can be enabled for nodes, groups of nodes, or filespaces.

Drawbacks to using collocation include increased tape cartridge use and the fact that multi-session restore may not be used effectively, since data may reside on a single tape volume. However, using collocation by group can eliminate these drawbacks if groups are chosen wisely, such that the total quantity of data stored by a particular group is able to fill several tape volumes.
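One way to sanity-check a proposed collocation group is to compare its total stored data with the capacity of a tape volume. The following is a minimal sketch under stated assumptions: the node names, data sizes and volume capacity are hypothetical, and the function is illustrative rather than part of TSM.

```python
def volumes_filled(group_data_gb, volume_capacity_gb):
    """Return how many tape volumes the group's combined data would fill."""
    return sum(group_data_gb.values()) / volume_capacity_gb


# Hypothetical group of three nodes and a hypothetical 800 GB volume capacity.
group = {"nodeA": 900, "nodeB": 1500, "nodeC": 600}
filled = volumes_filled(group, volume_capacity_gb=800)

print(f"group fills about {filled:.2f} volumes")
```

A group whose data fills several volumes uses cartridges efficiently and still leaves room for multi-session restore, whereas a group that fills only a fraction of one volume retains the drawbacks described above.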

TSM Multi-session Restore

Multi-session restore allows TSM to restore data from multiple tape volumes simultaneously if the desired data resides on multiple tapes, increasing performance.

If collocation by node or filespace is being used, client data may reside on only one tape, eliminating the possibility of using multi-session restore.

Run TSM Tape Reclamation

Be sure that tape reclamation runs periodically (storage pool reclaim threshold set below 100%). By freeing up unused areas of tape, reclamation makes data more compact on the tape volume, providing better restore performance.

Begin Multiple TSM Restore Commands

When restoring multiple filespaces, start one restore command for each filespace so that the restores run simultaneously. This has the greatest effect if the node’s data is collocated by filespace.
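The idea of one restore command per filespace can be sketched as a small helper that builds the command lines. This is an illustration only: the filespace names are hypothetical, and the exact dsmc options appropriate for a given environment (the sketch assumes `-subdir=yes` to include subdirectories) should be checked against the client documentation.

```python
import shlex


def build_restore_commands(filespaces):
    """Return one dsmc restore command (as an argument list) per filespace."""
    commands = []
    for fs in filespaces:
        # Use an unrestricted wildcard covering the whole filespace.
        spec = fs.rstrip("/") + "/*"
        commands.append(["dsmc", "restore", spec, "-subdir=yes"])
    return commands


# Hypothetical filespace names.
commands = build_restore_commands(["/home", "/data/"])
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list could then be started as its own process (for example with `subprocess.Popen`) so that the restores proceed in parallel rather than one after another.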

Utilize TSM Image Backup

Image backup takes a snapshot at the disk device level, which provides high backup and restore speeds since individual files need not be processed. Image backup is most often combined with the incremental forever strategy: incremental backups are performed often and image backups less often, depending on how much the data changes. If most of the data changes between image backups, the benefit of image restore shrinks or disappears.

Upon a full filesystem failure, the image can be restored at great speed, and the files that changed between the time the image was taken and the last incremental backup are then restored. This allows the drive to stream during the image restore, and has the added advantage that the files restored after the image are likely to be closely packed together, since they were backed up relatively recently.

Perform Periodic Full Selective Backups of Filesystems

Similar to image backup, a full selective backup of a filesystem has the effect of putting the filesystem data in one place, which provides faster restores. This is the single best way to optimize restore of a subset of a filesystem (perhaps a directory) when using tape devices.

Periodic selective backups have the drawback of sending all the data when only a fraction may have changed and can interfere with expiration/retention policies.

To minimize the impact of backing up all the data during a particular backup, the selective backups for different filespaces or nodes can be staggered throughout the week.
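Staggering can be as simple as assigning filespaces to weekdays in round-robin order. The sketch below assumes nothing about TSM scheduling itself; the function and the node/filespace labels are hypothetical and only show one way to spread the full backups evenly.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def stagger_selective_backups(filespaces):
    """Assign filespaces to weekdays round-robin so full backups are spread out."""
    schedule = {day: [] for day in WEEKDAYS}
    for index, fs in enumerate(filespaces):
        schedule[WEEKDAYS[index % len(WEEKDAYS)]].append(fs)
    return schedule


# Hypothetical node:filespace labels.
schedule = stagger_selective_backups(["nodeA:/home", "nodeA:/data", "nodeB:/srv"])
for day, items in schedule.items():
    if items:
        print(day, items)
```

A real schedule would more likely balance by data volume than by count, but the principle is the same: no single night carries all the full backups.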

To avoid interfering with expiration/retention policies, the selective backups can be performed under a different node name. When a restore of a directory or whole filesystem is needed, the last selective backup can be restored under the alternate node name, and then any files changed since that selective backup can be restored from the incremental backups. This is similar to what is done for image backup, except that a single directory restore is possible. For restore of an entire filesystem, image backup/restore is superior.

Utilize Virtual Mount Points

When filesystems become huge, virtual mount points can be used to make a single filesystem appear as many to TSM. This is only possible if the directory structure is somewhat static and balanced. Through collocation you can then ensure that different parts of the filesystem go to different tapes, allowing a more optimized restore.

In Disaster Recovery Situations Use “Move Data” or “Move Nodedata” Commands To Stage Data To Disk

If a DR situation has arisen and time is needed to physically bring the client system back online, data can meanwhile be moved from primary tape pool volumes to a disk storage pool, so that when the client system becomes available much, or all, of the data can be restored from disk.

Prioritize Importance of Nodes

Important nodes can be put in a separate storage pool that uses tape collocation, utilizes disk as main storage, or uses other features that you may not be willing to use for all clients.

Utilize Active Data Pools

Active data pools allow your most recent data to reside on disk while older, inactive data may be stored on tape. Most restores are of the active data, and if that data is in an active data pool on disk, restores may be much faster.

Know When To Use No Query Restore vs. Classic Restore

TSM uses one of two methods to determine what needs to be restored, depending on the restore specification used: no query restore (NQR) and classic restore.

NQR is invoked when a simple wildcard, which matches an entire directory, is used such as:

/home/user/*

and when the options “inactive”, “latest”, “pick”, “fromdate”, and “todate” are not used. A restricted wildcard that matches a subset of files in a directory such as:

/home/user/*.txt

will invoke classic restore.

The difference becomes important because the two types have different performance characteristics. When restoring an entire filesystem, NQR will be superior. However, when restoring a single directory, classic restore can be faster in some situations. Classic restore can be invoked in these situations by using the trick of a slightly different wildcard, such as:

/home/user/?*

Classic restore can also be invoked by using the testflag DISABLENQR. If small restores are taking too much time, classic restore may provide better performance. The difference becomes more pronounced for large filesystems with millions of files.
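The selection rules described in this section can be summarized in a small sketch. This is not TSM code and does not call the client; the function name and its parameters are hypothetical, options are passed as bare option names, and treating every specification other than a trailing `*` as classic restore is an assumption made for the illustration.

```python
DISQUALIFYING_OPTIONS = {"inactive", "latest", "pick", "fromdate", "todate"}


def restore_method(filespec, options=(), disable_nqr=False):
    """Return "NQR" or "classic" following the rules described in this technote."""
    if disable_nqr:
        # Models the DISABLENQR testflag.
        return "classic"
    if DISQUALIFYING_OPTIONS & {opt.lower() for opt in options}:
        return "classic"
    # NQR needs a simple wildcard that matches the entire directory.
    last_component = filespec.rsplit("/", 1)[-1]
    return "NQR" if last_component == "*" else "classic"


for spec in ["/home/user/*", "/home/user/*.txt", "/home/user/?*"]:
    print(spec, "->", restore_method(spec))
```

The three specifications printed are the ones used as examples above: only the first selects NQR, while the restricted wildcard and the `?*` trick both select classic restore.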


Document information


More support for:

Tivoli Storage Manager
Server

Software version:

All Supported Versions

Operating system(s):

Platform Independent

Software edition:

All Editions

Reference #:

1142185

Modified date:

2010-06-01
