TSM Client Open File Support (OFS)
This TSM Client Open File Support (OFS) technote outlines current limitations and known problems and lists steps that may help to diagnose problems in the setup and use of this feature. The OFS feature was added in V5.2, but V5.3 is strongly recommended.
Resolving the problem
TSM Client Open File Support
The TSM Client Open File Support (OFS) feature is documented in the "Backing Up Your Data" section of the Windows Client Installation and User's Guide.
When using the OFS feature, IBM strongly recommends that you update the client to the latest version to receive any currently known fixes. When running the OFS feature, the client should be at least at the V5.3 fix pack 4 (5.3.4) level.
This document applies to the V5.3 client and above, which is the recommended level for running OFS. If you are running the V5.2 client level and cannot yet migrate to V5.3, refer to "TSM Client V5.2 Open File Support" (http://www-1.ibm.com/support/docview.wss?rs=663&context=SSGSG7&q1=ofs+V5.2&uid=swg21121552&loc=en_US&cs=utf-8&lang=en) for V5.2-specific information.
What is Open File support?
Applications on a Microsoft Windows operating system can create and open files in a way that denies access to all other processes. Although this is not a common practice, it is sometimes used by database vendors or other applications that want to limit access to certain files. Because access to these files is restricted, backup products, including the Backup Utility that is native to Windows 2000, are prevented from backing up this data. These "locked" files are NOT the same as files that are "open" or "in use". TSM, running without the OFS feature, can back up "open" or "in use" files, including files that are open for reading or writing, files that are changing during the backup, executable and DLL files that are running, and log files that are being appended to.
The following error messages are written to the dsmerror.log when a TSM backup encounters one of these "locked" files without OFS support enabled:
ANS4987E Error processing '\\machine1\d$\dir1\lockedfile.xyz': the object is in use by another process
ANS1228E Sending of object '\\machine1\d$\dir1\lockedfile.xyz' failed
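To find out which files a backup skipped because they were locked, the dsmerror.log can be scanned for the ANS4987E message shown above. The following is a minimal sketch, not part of the TSM client: the function name and the sample text are illustrative, and it assumes the path always appears between single quotes exactly as in the message quoted above (a path that itself contains a single quote would not be matched correctly).

```python
import re

# Matches the ANS4987E line format quoted above; the path sits between single quotes.
LOCKED_RE = re.compile(r"ANS4987E Error processing '([^']+)'")


def locked_files(log_text):
    """Return the unique paths reported as locked, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = LOCKED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample text built from the two messages quoted above (backslashes escaped for Python).
sample = (
    "ANS4987E Error processing '\\\\machine1\\d$\\dir1\\lockedfile.xyz': "
    "the object is in use by another process\n"
    "ANS1228E Sending of object '\\\\machine1\\d$\\dir1\\lockedfile.xyz' failed\n"
)

for path in locked_files(sample):
    print(path)
```

The resulting list is a starting point for deciding, file by file, whether to exclude the file, coordinate with the application, or enable OFS for the volume.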
It is not necessary to use OFS for backing up locked Windows system files. The TSM Client has advanced features for backing up data contained within these files. The backup of the system data that is contained in these files requires additional processing and must be backed up in a group to allow for a successful restore. The included TSM Client features "System Objects" (Windows 2000 and Windows XP) and "System State" (Windows Server 2003) are used for the backup and restore of this data. These files are automatically "excluded" from the TSM File level backup.
For applications such as databases that use certain files for transactional consistency (for example, a recovery log file), it may not be possible to back up and restore these files without database coordination. In these situations, these database files should not be backed up with the normal TSM file level backup, and either an exclude or exclude.dir option should be used to bypass these files. TSM provides a number of Data Protection clients (TSM for Databases, TSM for Mail, TSM for Application Servers, etc.) which provide this database coordination and backup along with other advanced database backup features. For a current list of Data Protection clients go to http://www-03.ibm.com/software/products/en/tivoli-storage-manager-family.
If this is a private application or other database product where a Data Protection client is not available, then the PRESCHEDULECMD option, which is set in the client options file (dsm.opt), can be used to signal the database or application to do one of the following:
1. Take the steps necessary to move these files to a consistent and unopen state.
2. Bring down the database before the file level backup is started.
3. Program or script another method to back up this data and exclude these files from the file level backup.
In these cases the new OFS feature is not necessary since these files are no longer unavailable or "locked" by the application. Once the file level backup has completed, the POSTSCHEDULECMD option can then be used to bring the database back on line or restart the application.
If the file level backup takes too long to keep the open files off-line (for example, having the database off-line or holding up transactions), then the new OFS feature can be used to create a point-in-time snapshot of the volume. In this case, the PRESNAPSHOTCMD and POSTSNAPSHOTCMD options can be used to signal the database or application to coordinate with the backing up of these open files. The snapshot is created in the interval between the PRESNAPSHOTCMD and POSTSNAPSHOTCMD commands, which should take only a few seconds. This allows the database or application to resume operations quickly while still allowing TSM to perform a full incremental backup of the volume, including the locked files.
There are other situations where these application "locked files" can be safely backed up and restored on a file by file basis. In these situations, OFS can be enabled for that volume where the open files exist. TSM file level backup will then have access to these files and back them up using the TSM file level backup and archive operations.
Please see the restrictions and known issues below for further details about implementing OFS.
The OFS feature can be selected at install time or can be installed later using the TSM Client GUI Setup Wizard. The default is to not install the OFS feature. The install program and the Setup Wizard perform all the steps necessary to install, set up, and enable the feature so that the next backup or archive operation will attempt to take advantage of the open file support. The install program and the Setup Wizard can also be used to update or remove the feature. Installing or removing this feature requires a machine reboot. If a backup using the OFS feature encounters a problem on a volume, an error is logged in the dsmerror.log and the Windows Event Log. In most cases the backup then falls back to the normal non-OFS mode, which is the same traditional backup mode used when the OFS feature is not enabled for that volume.
IBM strongly recommends that you take the default OFS configuration setting for the snapshotcachelocation. The default is for the cache to reside on the same volume that is being snapped. See related OFS options below.
TSM Open File Support restrictions:
1. Microsoft Terminal Services.
There is a known limitation in the Microsoft Terminal Services server on Windows 2000 that prevents the OFS feature from working over a Microsoft Terminal Services session. The following errors are recorded in the dsmerror.log during the snapshot creation process:
ANS1327E The snapshot operation for 'D:' failed. Error code: 673.
ANS1375E The snapshot operation failed.
ANS1376E Unable to perform operation using a point-in-time copy of the filesystem. The backup/archive operation will continue without snapshot support.
To work around this limitation, the TSM Remote Web GUI can be used to perform the backup operation. This limitation does not affect TSM scheduled backups, which run locally, nor does it affect any restore operations.
The Microsoft Terminal Services limitation can be seen outside of the TSM Client by logging on to a Terminal Services session and creating a new volume. Using the Windows Disk Management tool under Administrative Tools, create and format a new volume. The volume is created successfully, but it will not be seen in Windows Explorer or the Windows Command Prompt until the user logs off of the current Terminal Services session and establishes a new one. This limitation is not seen with the Remote Desktop feature that is included with Windows XP and above.
Other known OFS issues
IBM strongly recommends that you take the default OFS configuration setting for the snapshotcachelocation. The default is for the cache to reside on the same volume that is being snapped.
In most cases, the space needed for the snapshot cache will not exceed the available space on a drive. By default, only one percent of the used space is allocated for the cache file on a volume that is being snapped (SNAPSHOTCACHESIZE=1). One percent of the used space is usually sufficient because when data on a volume is changed a) only one copy of the original data is saved if that data is changed multiple times and b) data from the 'unused' block list of a volume is not saved. In other words, if a single file is changed multiple times during the snapshot, it is only necessary to save the original data once. Also, for new files or files that are growing, for example log files, the new data is most often satisfied from the 'unused' block list of a volume, in which case it is not necessary to save the original data. In those cases that require a larger cache, the SNAPSHOTCACHESIZE option can be used to change the default of one percent.
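The sizing rule above is simple arithmetic, and a short sketch can help check whether a given SNAPSHOTCACHESIZE would fit in the free space of the volume being snapped. The function names and the example volume sizes below are hypothetical; the only fact taken from this document is that the cache is sized as a percentage of the used space, with a default of one percent.

```python
def snapshot_cache_bytes(used_bytes, cache_size_percent=1):
    """Space set aside for the cache: a percentage of the volume's used space."""
    return used_bytes * cache_size_percent // 100


def cache_fits_on_volume(used_bytes, free_bytes, cache_size_percent=1):
    """True if a cache of the given percentage fits in the volume's free space."""
    return snapshot_cache_bytes(used_bytes, cache_size_percent) <= free_bytes


GIB = 1024 ** 3
used, free = 200 * GIB, 5 * GIB  # hypothetical volume: 200 GiB used, 5 GiB free

print(snapshot_cache_bytes(used) / GIB)     # 2.0
print(cache_fits_on_volume(used, free))     # True
print(cache_fits_on_volume(used, free, 5))  # False
```

In the last case a five percent cache (10 GiB) exceeds the 5 GiB of free space, which is the kind of situation where relocating the cache to another volume may need to be considered.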
In unusual cases, where the cache cannot be located on the volume being snapped (the cache size is larger than the available free space on the volume), it is possible to place the cache on a different volume using the INCLUDE.FS SNAPSHOTCACHELOCATION option. This allows for a larger cache size, but requires additional overhead. If the cache must be relocated, it is highly recommended that it be placed on a drive that is not part of the same snapshot backup operation.
2. File system IO activity.
The TSM snapshot feature will wait for a file system to be idle for a given period to help ensure a consistent point-in-time view of the filesystem. This is controlled through the SNAPSHOTFSIDLEWAIT and SNAPSHOTFSIDLERETRIES options. Given that NTFS flushes I/O in bursts, there should be idle time adequate to take a snapshot between NTFS filesystem cache flushes. TSM also requests a filesystem flush before a snapshot is taken to further increase the chance of a consistent filesystem snapshot.
TSM uses the SNAPSHOTFSIDLEWAIT option, which takes a max wait time and a min wait time, in conjunction with the SNAPSHOTFSIDLERETRIES option, to wait for idle time in the filesystem before the snapshot is taken. For example, with SNAPSHOTFSIDLEWAIT 6,1 and SNAPSHOTFSIDLERETRIES 5, TSM first waits for 6 seconds of I/O inactivity, then retries with waits of 5, 4, 3, 2, and finally 1 second. If the last attempt fails, the snapshot fails and a normal, non-OFS backup occurs.
TSM uses a SNAPSHOTFSIDLEWAIT default value of 5 seconds and a SNAPSHOTFSIDLERETRIES default value of 20. This requires that a period of 5 seconds pass without write activity (read activity is ignored) on a volume before a snapshot can occur. If write activity is detected, TSM retries up to 20 times, decreasing the wait time on each successive attempt. A value of 0 on the last attempt means that the snapshot occurs immediately, without waiting for a period of inactivity, and therefore always succeeds. This guarantees a snapshot, but may increase the chance of a problem reading a file from the snapshot during the backup. If the file cannot be read, a warning about the read problem appears in the dsmerror.log as a failed backup file, and the file is backed up during the next incremental backup. If this occurs often, increase the SNAPSHOTFSIDLEWAIT min value; for example, a value of 500ms waits 0.5 seconds. Although a value of 5 seconds on the first attempt may seem high, most backups are scheduled during periods of low user and filesystem activity (for example, overnight). This, together with the retries, should allow a successful snapshot to occur.
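The retry behavior described above can be illustrated with a short sketch that lists the idle period required on each attempt. This is not the client's actual algorithm: it assumes the wait decreases linearly from the max value to the min value across the retries, which reproduces the 6,1 with 5 retries example, and it assumes a min value of 0 for the default case. The function name is hypothetical.

```python
def idle_wait_schedule(max_wait, min_wait, retries):
    """Seconds of required write inactivity for the first attempt and each retry.

    Assumption: the wait drops in equal steps from max_wait to min_wait.
    """
    if retries <= 0:
        return [float(max_wait)]
    step = (max_wait - min_wait) / retries
    return [round(max_wait - i * step, 3) for i in range(retries + 1)]


# SNAPSHOTFSIDLEWAIT 6,1 with SNAPSHOTFSIDLERETRIES 5
print(idle_wait_schedule(6, 1, 5))  # [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

# Default max of 5 seconds and 20 retries, assuming a min of 0
defaults = idle_wait_schedule(5, 0, 20)
print(len(defaults), defaults[0], defaults[1], defaults[-1])  # 21 5.0 4.75 0.0
```

Under these assumptions, raising the min value shortens the range of waits and removes the final zero-wait attempt, trading a guaranteed snapshot for a quieter filesystem at snapshot time.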
Any temporary burst of write activity (for example a very large file download) should be overcome by the retries or during the next scheduled backup. However, the following errors may occur if persistent I/O activity is seen on the drive:
ANS1327E The snapshot operation for 'objectname' failed. Error code: 672
ANS1380E The snapshot operation failed. The filesystem write activity prevented the Logical Volume Snapshot Agent from satisfying the SNAPSHOTFSIDLEWait and SNAPSHOTFSIDLERetries options.
ANS1376E Unable to perform operation using a point-in-time copy of the filesystem. The backup/archive operation will continue without snapshot support.
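When tuning the idle wait options, it can help to see how often snapshots fail and with which error code. The sketch below pulls the object name and error code out of ANS1327E lines and checks for the ANS1376E fallback message. It is an illustration only: the function names are hypothetical, the pattern is based solely on the message text quoted in this document, and the sample uses the Terminal Services messages shown earlier.

```python
import re

# Pattern taken from the ANS1327E message text quoted in this document.
SNAPSHOT_FAIL_RE = re.compile(
    r"ANS1327E The snapshot operation for '([^']*)' failed\. Error code: (\d+)"
)


def snapshot_failures(log_text):
    """Return (object, error_code) pairs for every ANS1327E message found."""
    return [(m.group(1), int(m.group(2))) for m in SNAPSHOT_FAIL_RE.finditer(log_text)]


def fell_back_to_non_ofs(log_text):
    """True if the log shows the backup continuing without snapshot support."""
    return "ANS1376E" in log_text


sample_log = (
    "ANS1327E The snapshot operation for 'D:' failed. Error code: 673.\n"
    "ANS1375E The snapshot operation failed.\n"
    "ANS1376E Unable to perform operation using a point-in-time copy of the "
    "filesystem. The backup/archive operation will continue without snapshot support.\n"
)

print(snapshot_failures(sample_log))     # [('D:', 673)]
print(fell_back_to_non_ofs(sample_log))  # True
```

Repeated failures with code 672 on the same volume would suggest the persistent write activity discussed in this section.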
If this I/O activity is persistent, and the snapshot attempts are failing, then it is recommended that the SNAPSHOTFSIDLERETRIES be increased and/or the SNAPSHOTFSIDLEWAIT min value be lowered, or set to 0 to always succeed.
If adjusting the SNAPSHOTFSIDLEWAIT and SNAPSHOTFSIDLERETRIES values still does not allow a snapshot to take place, this may indicate that an application is using a file on the volume for sustained I/O operations (for example, a database product recovery log file). In this case database consistency may be an issue, and it is advised that the PRESNAPSHOTCMD and POSTSNAPSHOTCMD options be used to notify the application to suspend I/O while the snapshot is taking place. The snapshot should occur in a very short amount of time, so the delay between the PRESNAPSHOTCMD and POSTSNAPSHOTCMD is only the time to establish the snapshot and not the entire backup. The PRESCHEDULECMD and POSTSCHEDULECMD options can also be used to avoid the need for OFS support if the application can be brought down for the length of time needed for the backup.
OFS related options
The following options can be set for all volumes or can be set per volume using the INCLUDE.FS option:
SNAPSHOTCACHELOCATION, SNAPSHOTCACHESIZE, SNAPSHOTFSIDLERETRIES, SNAPSHOTFSIDLEWAIT, PRESNAPSHOTCMD, POSTSNAPSHOTCMD