Tool to identify incorrect backups or archives on TSM Server due to APAR IT11994

Troubleshooting

Problem

During IBM Tivoli Storage Manager (TSM) backup or archive operations using client-side deduplication, under certain conditions when the TSM server has little data storage space remaining, a file can be incorrectly backed up or archived by the TSM client with no error message for that file. The restore of an affected backup or archive copy will fail with an error such as “digest validation error”, “unknown format”, or a crash. The data that was intended to be backed up or archived in the copy being restored may not be recoverable. This technote provides guidance for identifying backup or archived objects that are affected by this problem.

Cause

See “Problem” section of this document (above).

Resolving The Problem

Before proceeding with the techniques described in this document, you should be familiar with the information in the corresponding flash and the APAR.

Flash: http://www.ibm.com/support/docview.wss?uid=swg21966336
APAR IT11994

There are two methods to identify affected objects. One method is for the container storage pools which were introduced with IBM Spectrum Protect version 7.1.3. The other method is for the other “legacy” storage pool types. Both methods use SQL queries that run on the Tivoli Storage Manager or IBM Spectrum Protect server DB2 database. Use either or both methods, depending on which type of storage pools you have in your environment.

Table of Contents:
Legacy pools
Container pools
Restore or Retrieve affected objects

Legacy pools

Run the Perl script runqueriesIT11994.pl (included with this document) on your Tivoli Storage Manager Server (6.3 or 7.1) or IBM Spectrum Protect Server. The script can run for a very long time, so:

Consider running the script on only a few nodes at a time using the 'nodelist' option below.
If you prefer to use the 'all' option below, consider restoring a copy of the server database to a dedicated system, and running the script on that system. Follow the first two steps in this documentation to restore a copy of the server database to another system: http://www-01.ibm.com/support/knowledgecenter/SSSR2R_6.3.0/com.ibm.itsm.srv.doc/t_db_move_new_loc.html
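One way to follow the first suggestion is to split a long list of node names into several small input files and pass them to the -nodelist option one at a time. The Python sketch below is a hypothetical helper, not part of runqueriesIT11994.pl. The function names and the nodelist_batch file name prefix are assumptions. It relies only on the fact that a line containing just a node name is a valid -nodelist entry.

```python
def batches(entries, batch_size):
    """Split -nodelist entries into lists of at most batch_size non-blank lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    cleaned = [e.strip() for e in entries if e.strip()]
    return [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]


def write_batches(entries, batch_size, prefix="nodelist_batch"):
    """Write each batch to <prefix>_001.txt, <prefix>_002.txt, ... and return the names.

    The prefix is a hypothetical naming choice; any plain text file works with -nodelist.
    """
    names = []
    for number, batch in enumerate(batches(entries, batch_size), start=1):
        name = f"{prefix}_{number:03d}.txt"
        with open(name, "w") as f:
            f.write("\n".join(batch) + "\n")
        names.append(name)
    return names
```

For example, write_batches(node_names, 5) writes nodelist_batch_001.txt, nodelist_batch_002.txt, and so on. Each file can then be processed separately, for example with perl runqueriesIT11994.pl -nodelist nodelist_batch_001.txt.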

To use the runqueriesIT11994.pl script tool, note the following:

Perl script usage:

perl runqueriesIT11994.pl -nodelist input_file_name
or:
perl runqueriesIT11994.pl -all

Options:

-all
This option runs the query on all nodes and file spaces. Each node is processed in turn, with output displayed after processing for the node finishes. This is the default behavior if no option is specified.

-nodelist input_file_name
This option runs the query on only the nodes and file spaces specified in an input file that you create. Each node and file space combination is processed in turn, with output displayed after processing for that combination finishes. You might use this option to limit processing to critical nodes and file spaces. For example, if you create an input file named critical.nodes, use this command to start the script:

perl runqueriesIT11994.pl -nodelist critical.nodes

The input file must be created with a plain text editor. Do not use a word processor. Each line in the input file must use one of the syntaxes shown below. Lines do not all need to use the same syntax.

Input line syntax	Description
node_name file_space_name	Specifies a fully qualified node name and a fully qualified file space name. Wild cards are not permitted. Separate the node name and file space name with a blank space.
node_name *	Specifies a fully qualified node name and an asterisk (*) to indicate all file spaces for that node. Separate the node name and asterisk with a blank space.
node_name	Specifies a fully qualified node name to indicate all file spaces for that node. This is functionally equivalent to the preceding syntax.

Example: Create an input file named myinputfile.txt to process one file space for node corp01, all file spaces for node corp02, one file space for node dataprod1, and all file spaces for node starfish.

Input file contents:

corp01 /Volumes/SVT1
corp02 *
dataprod1 \\dataprod1\e$
starfish
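Before a long run, the input file can be checked against the three line syntaxes in the table above. The following Python sketch is not part of runqueriesIT11994.pl and is not claimed to match its parsing exactly. It assumes that the first run of whitespace separates the node name from the file space name, treats a missing file space as *, and rejects any other use of the * wild card.

```python
def parse_nodelist_line(line):
    """Return (node_name, file_space) for one -nodelist input line, or None if blank.

    "node_name" alone and "node_name *" both yield file_space "*" (all file spaces).
    Assumption: the first run of whitespace separates the two fields, so the
    file space name is everything after it.
    """
    stripped = line.strip()
    if not stripped:
        return None
    parts = stripped.split(None, 1)
    node = parts[0]
    file_space = parts[1] if len(parts) == 2 else "*"
    # Wild cards are not permitted, except a lone "*" meaning all file spaces.
    if "*" in node or (file_space != "*" and "*" in file_space):
        raise ValueError(f"wild cards are not permitted: {line!r}")
    return node, file_space


def parse_nodelist(text):
    """Return the (node_name, file_space) pairs for every non-blank line."""
    return [p for p in map(parse_nodelist_line, text.splitlines()) if p is not None]
```

Applied to the four lines of myinputfile.txt above, parse_nodelist returns ("corp01", "/Volumes/SVT1"), ("corp02", "*"), ("dataprod1", "\\dataprod1\e$") and ("starfish", "*").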

Perl script command:

perl runqueriesIT11994.pl -nodelist myinputfile.txt

The script creates an output file named queriesIT11994.log in the same directory as the Perl script runqueriesIT11994.pl. This log file contains information about each query that was processed for each node name and file space combination in the input file.

Based on the example input file from above, here is a sample of the queriesIT11994.log file.

*****************************************************************
Script execution began at Mon Oct 26 16:33:05 2015
*****************************************************************

*****************************************************************

STAGE 1: Determine which nodes/file spaces from the input file need to be queried

*****************************************************************

Node dataprod1, file space \\dataprod1\e$ is an API file space, not affected by this APAR

No file spaces found under nodename corp02

*****************************************************************

STAGE 2: Run sql queries for nodes/file spaces determined as needing to be queried in stage 1

*****************************************************************

Running sql queries for node corp01 and file space /Volumes/SVT1 . . .

No affected objects found under node corp01, file space /Volumes/SVT1

Running sql queries for node starfish and file space \\starfish\e$ . . .

1 affected objects found under node starfish, file space \\starfish\e$. See file queryresults_6236_25

At least one affected object was found;
Please see log queriesIT11994.log for details

*****************************************************************
Script execution ended at Mon Oct 26 16:33:55 2015
*****************************************************************
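When many nodes are processed, the affected entries can be pulled out of queriesIT11994.log programmatically. This Python sketch assumes that every affected line uses the same wording as the sample log above ("N affected objects found under node X, file space Y. See file Z"). The regular expression is derived from that sample only, not from the script source.

```python
import re

# Pattern inferred from the sample log; the file space is matched lazily up to ". See file".
AFFECTED_RE = re.compile(
    r"^(?P<count>\d+) affected objects found under node (?P<node>\S+), "
    r"file space (?P<fs>.+?)\. See file (?P<results>\S+)\s*$"
)


def affected_entries(log_text):
    """Return a list of (count, node, file_space, results_file) for each affected line."""
    hits = []
    for line in log_text.splitlines():
        m = AFFECTED_RE.match(line.strip())
        if m:
            hits.append((int(m["count"]), m["node"], m["fs"], m["results"]))
    return hits
```

Lines such as "No affected objects found under node ..." do not match, so only node and file space combinations with a query results file are returned.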

If any affected objects are found, the query results file lists the object IDs of the affected files under the corresponding node and file space, each preceded by show invo. The query results file therefore consists of a sequence of one or more show invo commands, which you can issue in the administrative client, dsmadmc, to retrieve the file name and other information associated with each object ID. The query results file names have this format:

queryresults_nodeid_fsid

where nodeid is the numeric ID of the node and fsid is the numeric ID of the file space. The Perl script creates these files in the same directory as runqueriesIT11994.pl.

In the example queriesIT11994.log above, the query results file is queryresults_6236_25. It contains one show invo command for an affected object:

$ cat queryresults_6236_25
show invo 83857838
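A query results file can also be read programmatically, for example to collect object IDs across many nodes. This hypothetical Python helper assumes the queryresults_nodeid_fsid naming described above and one show invo command per line, as in the example file.

```python
import re
from pathlib import Path

NAME_RE = re.compile(r"^queryresults_(\d+)_(\d+)$")
SHOW_INVO_RE = re.compile(r"^\s*show\s+invo\s+(\d+)\s*$", re.IGNORECASE)


def parse_results_name(name):
    """Return (nodeid, fsid) from a file name such as queryresults_6236_25."""
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"not a query results file name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def object_ids(text):
    """Return the object IDs from the 'show invo <objid>' lines of a results file."""
    ids = []
    for line in text.splitlines():
        m = SHOW_INVO_RE.match(line)
        if m:
            ids.append(int(m.group(1)))
    return ids


def read_results_file(path):
    """Return (nodeid, fsid, [objid, ...]) for one query results file."""
    p = Path(path)
    nodeid, fsid = parse_results_name(p.name)
    return nodeid, fsid, object_ids(p.read_text())
```

For the example file above, read_results_file("queryresults_6236_25") returns (6236, 25, [83857838]).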

When this show invo command is issued in the administrative client, dsmadmc, the output shows node STARFISH and file space \\starfish\e$. The affected object is \\starfish\e$\dedupebug\test\c.txt, an active backup copy inserted on August 27, 2015 at 16:46:31.

tsm: ARC1>show invo 83857838
Inventory object 83857838 of copy type Backup has attributes:
NodeName: STARFISH, File space(25): \\starfish\e$,
ObjName: \DEDUPEBUG\TEST\C.TXT.
hlID: B1D7D84783A8935FFD85FDDC5C5750268B24DB00
llID: CAD95D92FE5063E8BD9CC7208BCD3BA98DF97591
Type: 2 (File) MC: 1 (DEFAULT) CG: 1 Size: 1049600 HeaderSize: 396
Active, Inserted 08/27/15 16:46:31 (UTC 08/27/15 23:46:31)
GroupMap 00000000, bypassRecogToken NULL, flags 0008

Bitfile Object: 83857838
**Super-bitfile 83857838 contains following aggregated bitfiles,
Bitfile Id, offset, length, active state or owner, link bfid
83857838 0 611 Active
83857839 611 134439 83857838
83857840 135050 845241 83857838
83857841 980291 134439 83857838
83857842 1114730 845241 83857838
83857843 1959971 134439 83857838
83857844 2094410 845241 83857838
….
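To tabulate many affected objects, the identifying fields can be extracted from saved SHOW INVO output. This is a Python sketch whose field layout is inferred only from the single sample above. It assumes that the trailing period on the ObjName line terminates the field rather than being part of the name, and that fields missing for other copy types (for example, the active state of an archive copy) are simply absent from the result.

```python
import re


def parse_show_invo(text):
    """Return the identifying fields found in SHOW INVO output as a dict.

    Keys are omitted when the corresponding line is not present.
    """
    info = {}
    m = re.search(r"Inventory object (\d+) of copy type (\w+)", text)
    if m:
        info["objid"] = int(m.group(1))
        info["copy_type"] = m.group(2)
    # Greedy (.*) keeps any commas inside the file space name; the last comma ends the field.
    m = re.search(r"^\s*NodeName: ([^,]+), File space\((\d+)\): (.*),\s*$", text, re.M)
    if m:
        info["node"] = m.group(1)
        info["fsid"] = int(m.group(2))
        info["file_space"] = m.group(3)
    # Assumption: the final period on the ObjName line terminates the field.
    m = re.search(r"^\s*ObjName: (.*?)\.?\s*$", text, re.M)
    if m:
        info["object_name"] = m.group(1)
    m = re.search(r"^\s*(Active|Inactive), Inserted (\S+ \S+)", text, re.M)
    if m:
        info["state"] = m.group(1)
        info["inserted"] = m.group(2)
    return info
```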

Windows TSM Server

On a Windows TSM Server, log on as the user that owns the TSM server instance and run the Perl script from a standard Windows command prompt. Windows does not include a Perl interpreter by default, so install one, such as Strawberry Perl or ActiveState Perl, before running the script.

Unix TSM Server

On a Unix TSM Server, you will need to run the Perl script as the user that owns the TSM server instance.

Container pools

Run the two queries below to check for affected backup and archive objects in IBM Spectrum Protect Server 7.1.3 or higher container pools. These queries are not expected to take long to run, so there is no need to run them on a restored copy of the IBM Spectrum Protect server database. The query output will consist of a sequence of one or more show invo commands for any affected objects. These show invo commands can be issued from the administrative client, dsmadmc, to retrieve the file name as well as other information associated with the object id in question.

Backup objects

db2 "select 'show invo ' || cast (imbk.objid as char(24)) from tsmdb1.backup_objects imbk where exists (select fsname from tsmdb1.filespaces imfs where imfs.nodeid=imbk.nodeid and imfs.fsid=imbk.fsid and imfs.fstype not like 'API:%') and BITAND( imbk.flags, CAST(8 AS SMALLINT) )>0 and (imbk.bfsize+imbk.hdrsize+imbk.metadatasize) < (select max( sdro.offset ) from tsmdb1.sd_recon_order as sdro where (sdro.objid=imbk.objid)) for read only with ur" > badobjects.out

Archive objects

db2 "select 'show invo ' || cast (imbk.objid as char(24)) from tsmdb1.archive_objects imbk where exists (select fsname from tsmdb1.filespaces imfs where imfs.nodeid=imbk.nodeid and imfs.fsid=imbk.fsid and imfs.fstype not like 'API:%') and BITAND( imbk.flags, CAST(8 AS SMALLINT) )>0 and (imbk.bfsize+imbk.hdrsize+imbk.metadatasize) < (select max( sdro.offset ) from tsmdb1.sd_recon_order as sdro where (sdro.objid=imbk.objid)) for read only with ur" >> badobjects.out

This query appends (>>) to badobjects.out so that it does not overwrite the results of the backup query.
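Because the backup and archive queries differ only in the object table name, both can be generated from one template and written to a script file for the DB2 command line processor. In this Python sketch the file space table is spelled tsmdb1.filespaces, because an unquoted table name cannot contain a space. The function names and the output file name it11994.sql are hypothetical.

```python
BACKUP_TABLE = "backup_objects"
ARCHIVE_TABLE = "archive_objects"


def affected_objects_query(object_table):
    """Return the IT11994 check for tsmdb1.backup_objects or tsmdb1.archive_objects."""
    if object_table not in (BACKUP_TABLE, ARCHIVE_TABLE):
        raise ValueError(f"unexpected object table: {object_table!r}")
    return (
        "select 'show invo ' || cast (imbk.objid as char(24)) "
        f"from tsmdb1.{object_table} imbk "
        # Skip API file spaces, which are not affected by this APAR.
        "where exists (select fsname from tsmdb1.filespaces imfs "
        "where imfs.nodeid=imbk.nodeid and imfs.fsid=imbk.fsid "
        "and imfs.fstype not like 'API:%') "
        # Flag bit 8 set, and the object sizes sum to less than its largest recon offset.
        "and BITAND( imbk.flags, CAST(8 AS SMALLINT) )>0 "
        "and (imbk.bfsize+imbk.hdrsize+imbk.metadatasize) < "
        "(select max( sdro.offset ) from tsmdb1.sd_recon_order as sdro "
        "where (sdro.objid=imbk.objid)) "
        "for read only with ur"
    )


def write_clp_script(path="it11994.sql"):
    """Write both statements, each terminated by ';', for use with db2 -tvf."""
    with open(path, "w") as f:
        for table in (BACKUP_TABLE, ARCHIVE_TABLE):
            f.write(affected_objects_query(table) + ";\n")
```

After db2 connect to tsmdb1, the generated file could be run with db2 -tvf it11994.sql > badobjects.out. The -t option makes the command line processor treat ; as the statement terminator, -v echoes each statement into the output, and -f reads the statements from the file.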

Windows TSM Server

On a Windows TSM Server, you will need to log on to your Windows TSM Server as the server instance owner and then run the SQL query in a DB2 Command Window after connecting to tsmdb1.

Find and start the program called DB2 Command Window.
In that window, type 'db2 connect to tsmdb1' and press Enter. The connection information is displayed (the exact DB2 version information may vary).

For example,

C:\PROGRA~1\Tivoli\TSM\db2\BIN>db2 connect to tsmdb1

Database Connection Information

Database server = DB2/NT64 10.5.5
SQL authorization ID = ADMINIST...
Local database alias = TSMDB1

After this, copy and paste the SQL query and run it. The query output will appear in the file, badobjects.out, once it completes.

Unix TSM Server

On a Unix TSM Server, you will need to log on to the TSM UNIX or Linux Server as the server instance owner and connect to tsmdb1 before you can run the SQL query.

Once you have logged in as the correct user, issue the command 'db2 connect to tsmdb1'.

For example,

$ db2 connect to tsmdb1

Database Connection Information

Database server = DB2/AIX64 10.5.1
SQL authorization ID = TSMINST1
Local database alias = TSMDB1

$

After this, copy and paste the SQL query at the command prompt following the Database Connection Information and hit Enter to run it. The query output will appear in the file, badobjects.out, once it completes.
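If badobjects.out holds many rows, the show invo commands can be extracted from the DB2 output and cross-checked against the row count that DB2 reports. This Python sketch assumes the usual DB2 command line processor layout: a column heading, a dashed line, one row per object padded by the CHAR(24) cast, and an "N record(s) selected." footer. It sums the footers so that a file holding the output of both queries is also handled.

```python
import re

SELECTED_RE = re.compile(r"^\s*(\d+) record\(s\) selected\.\s*$")


def parse_db2_output(text):
    """Return (commands, reported_total) from DB2 CLP output of the queries.

    commands: each 'show invo <objid>' row with the CHAR(24) padding removed.
    reported_total: sum of the 'N record(s) selected.' footers.
    """
    commands, reported = [], 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("show invo "):
            # Collapse the padding so each command can be pasted into dsmadmc as-is.
            commands.append(" ".join(stripped.split()))
            continue
        m = SELECTED_RE.match(line)
        if m:
            reported += int(m.group(1))
    return commands, reported
```

If len(commands) differs from reported_total, the file was probably truncated or edited and the queries should be run again.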

Restore or Retrieve affected objects

Objects can be restored or retrieved to an alternate location and then examined to see whether the original data was affected and whether it can be recovered. To allow affected objects to be restored or retrieved, add the RESTOREALL and SKIPDATAVALIDATION test flags to the client options file.

Options file syntax:
TESTFLAGS RESTOREALL SKIPDATAVALIDATION

In this example, we are trying to restore c.txt from node name starfish and file space \\starfish\e$ and the above two testflags are enabled in the client options file:

dsmc restore e:\dedupebug\test\c.txt e:\dedupebug\test\salvage\c.txt

Examine the restored or retrieved copy of c.txt and, if possible, compare it to the original c.txt. If the restored or retrieved copy is affected by this problem, then some or all of the data might be invalid.
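To compare the restored or retrieved copy with the original byte for byte, a short script can report the first offset at which they differ. This is a generic Python sketch, not a TSM tool. The function names are hypothetical, and a length difference is reported as a difference at the end of the shorter file.

```python
def first_difference_streams(a, b, chunk_size=1 << 20):
    """Return the byte offset of the first difference between two binary streams, or None."""
    offset = 0
    while True:
        ca, cb = a.read(chunk_size), b.read(chunk_size)
        if ca != cb:
            for i, (x, y) in enumerate(zip(ca, cb)):
                if x != y:
                    return offset + i
            # One stream ended early: the difference starts where the shorter one stops.
            return offset + min(len(ca), len(cb))
        if not ca:
            return None  # both streams ended together with identical content
        offset += len(ca)


def first_difference(original_path, restored_path, chunk_size=1 << 20):
    """Open both files in binary mode and compare them chunk by chunk."""
    with open(original_path, "rb") as a, open(restored_path, "rb") as b:
        return first_difference_streams(a, b, chunk_size)
```

For the example above, first_difference(r"e:\dedupebug\test\c.txt", r"e:\dedupebug\test\salvage\c.txt") returns None only if the two files are identical.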

If a restored copy is invalid, or you are not sure if it is valid, then because this is an active backup copy, consider performing a selective backup of the original c.txt (using a client version with the fix for IT11994) to make a new backup copy.
If a retrieved copy is invalid, check whether the original c.txt or a valid backup version of c.txt is available to archive again with a client version that contains the fix for IT11994.
If you need further assistance, contact IBM Support and mention IT11994.

Change History
29 October 2015: Original text published

runqueriesIT11994.pl

