IBM Support

Hub for Troubleshooting Server Database Backup Performance Degradation

Troubleshooting


Problem

Hub for Troubleshooting Server Database Backup Performance Degradation

Resolving The Problem

Overview:

The Tivoli Storage Manager (IBM Spectrum Protect) server uses several different components to complete BACKUP DB processing of its internal DB2 database (TSMDB1). Although the BACKUP DB process is initiated via the Server process, it uses the Tivoli Storage Manager client API and the DB2 database API to complete the operation.

In most cases, performance diagnostics are essential for determining the root cause of the performance degradation. Baseline documentation can be collected before engaging Support by following the data-collection steps outlined in the following TechNote: Collecting Data for IBM Spectrum Protect: Server BACKUP DB performance.

The following list, however, contains common known causes of BACKUP DB performance degradation that should be investigated:

Symptom Index:

Slow overall throughput of BACKUP DB:


$$_TSMDBMGR_$$ sessions slow to start after BACKUP DB initiated:
BACKUP DB throughput slows (or may hang) after time elapses:
Problem Details:

Conflicts with online table or index reorganization:
Symptoms:
Slow overall BACKUP DB processing and/or BACKUP DB takes a long time to start once initiated.
Problem:
Misconfigured or long-running reorganization activity.
Recommendation:
Ensure that online table or index reorganization does not run during your BACKUP DB window. Set the server options REORGBEGINTIME and REORGDURATION so that the reorganization window does not overlap your BACKUP DB window.
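As a rough illustration of the check described above, the following Python sketch tests whether a daily reorganization window overlaps a daily backup window. It assumes REORGBEGINTIME is expressed as HH:MM and REORGDURATION as a number of hours shorter than 24; the function names and the backup window values are hypothetical and are not part of the product.

```python
from datetime import datetime, timedelta


def window(start_hhmm, duration_hours):
    """Return (start, end) datetimes for a daily window on a reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + timedelta(hours=duration_hours)


def windows_overlap(reorg_begin, reorg_hours, backup_begin, backup_hours):
    """True if the daily reorg window and the daily backup window intersect.

    Assumes both durations are shorter than 24 hours.
    """
    r_start, r_end = window(reorg_begin, reorg_hours)
    b_start, b_end = window(backup_begin, backup_hours)
    # Both windows repeat every day, so also compare against the backup
    # window shifted one day back and one day forward (midnight wrap).
    for shift in (-1, 0, 1):
        s = b_start + timedelta(days=shift)
        e = b_end + timedelta(days=shift)
        if r_start < e and s < r_end:
            return True
    return False


# Hypothetical values: reorg from 20:00 for 6 hours, backup from 01:00 for 2 hours.
conflict = windows_overlap("20:00", 6, "01:00", 2)
```

A result of True suggests that one of the two windows should be moved or shortened.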
Related Links:
Tivoli Storage Manager 6.3.0.000 -> 7.1.1.1XX: How do I set reorganization options to optimize performance?
IBM Spectrum Protect 7.1.1.200 +: How do I set reorganization options to optimize performance?

Significantly fragmented database tables or indexes:
Symptoms:
Slow overall BACKUP DB processing and/or questionably large database size.
Problem:
Incomplete reorganization activity.
Recommendation:
Ensure that online table and index reorganization is occurring. Consider using the analyze_DB2_formulas.pl Perl script to determine whether reorganization is required (review the summary.out file generated by this script). If reorganization is required on a table or index, determine why automatic online reorganization is not succeeding. This may happen because the table or index has already been reorganized recently (within the previous 20 days for a table, or 7 days for an index), because reorganization activity has been explicitly disabled (review your reorganization settings and compare them against the TechNotes in the Related Links section below), or because the table or index is too large to be processed online successfully.

Note that IBM Spectrum Protect level 7.1.1.300 or newer is recommended for users experiencing significant database growth and/or an inability to successfully complete online index reorganization. A new feature was added in this level to help address these issues.
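The recent-reorganization intervals mentioned above (20 days for a table, 7 days for an index) can be expressed as a small Python check. This is only a sketch: the function name is hypothetical, the dates would have to come from your own records, and treating exactly 20 or 7 days as "no longer recent" is an assumption.

```python
from datetime import date, timedelta

# Intervals taken from the text above; the boundary handling is an assumption.
RECENT_DAYS = {"table": 20, "index": 7}


def recently_reorganized(kind, last_reorg, today):
    """True if the object was reorganized within the recent interval for its kind."""
    return today - last_reorg < timedelta(days=RECENT_DAYS[kind])


# Hypothetical dates: last reorganized 16 days before the check.
table_skipped = recently_reorganized("table", date(2018, 6, 1), date(2018, 6, 17))
index_skipped = recently_reorganized("index", date(2018, 6, 1), date(2018, 6, 17))
```

If an object is outside its interval and still not being reorganized, the other causes listed above are the more likely explanation.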
Related Links:

Out-of-date table statistics:
Symptoms:
Slow overall BACKUP DB processing.
Problem:
Out-of-date table statistics.
Recommendation:
In general, if table/index reorganization is completing successfully, the table statistics will be up to date. If table/index reorganization is not completing successfully (see above), then that needs to be fixed first. If table/index reorganization activity is happening successfully, but table statistics are still out of date, they may need to be manually updated. IBM Support should be engaged to determine if the table statistics are significantly out of date.
Related Links:

Slow database reads or output media writes:

Symptoms:
BACKUP DB is consistently slow to any type of output media.
Problem:
Slow database reads from disk, or slow writes to output media (disk or tape).
Recommendation:
Review iostat, perfmon, nmon, or nfsstat data to determine whether the input and output media are performing at expected rates. In general, database reads should average no slower than 5 milliseconds (ms) per read. Writes to output media should average no slower than 5 ms per write for disk, or 2 ms per write for tape. If performance does not meet these guidelines, review the results with your storage administrator and make any changes necessary to improve the read/write performance of the subsystem.
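The guideline values above can be compared against measured averages with a short Python sketch such as the following. The dictionary keys and the sample measurements are hypothetical; the averages themselves would have to be taken from your own iostat, perfmon, nmon, or nfsstat output.

```python
# Guideline ceilings from the text above, in milliseconds per I/O.
GUIDELINES_MS = {
    "db_read": 5.0,
    "disk_write": 5.0,
    "tape_write": 2.0,
}


def slow_paths(measured_ms):
    """Return only the measurements that are slower than their guideline."""
    return {
        name: value
        for name, value in measured_ms.items()
        if value > GUIDELINES_MS[name]
    }


# Hypothetical averages collected during a BACKUP DB run.
findings = slow_paths({"db_read": 7.2, "disk_write": 3.1})
```

Any entry returned is a candidate for review with the storage administrator.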
Related Links:

Improper mount options on filesystems:
Symptoms:
BACKUP DB is consistently slow to any type of output media.
Problem:
Slow database reads from disk caused by improper mount options.
Recommendation:
On AIX, all components (DB, archive log, active log, and storage pools) should be using the following mount options: rbrw, inline logs, and noatime. On all platforms, the database should NOT be using directio (DIO). Review with your system administrator to ensure the above considerations are in place.
Related Links:
Tuning AIX systems for Tivoli Storage Manager server performance

Outputting to physical tape on the AIX platform can experience degradation due to a lack of 64KB pages:
Symptoms:
BACKUP DB to real physical tape on AIX always runs slow.
Problem:
Server does not use 64KB pages on AIX, but should.
Recommendation:
If you are using Tivoli Storage Manager/IBM Spectrum Protect on AIX and writing database backups to physical tape, the server process should be configured to use 64KB pages. This change was first introduced in IBM Spectrum Protect 7.1.3.000, so upgrading to that level or higher is recommended.
Related Links:

Database bufferpool flush algorithm improvements:
Symptoms:
$$_TSMDBMGR_$$ sessions take a long time to start after initiating BACKUP DB
Problem:
Bufferpools can take a long time to flush, which delays the start of the backup.
Recommendation:
Before a BACKUP DB can begin on a particular table, DB2 must flush the contents of that table's bufferpool. Prior to IBM Spectrum Protect 7.1.3.000, the server completed this activity on all tables at the start of the backup. Beginning in 7.1.3.000, the bufferpool is flushed for only the table being actively processed. This allows the BACKUP DB operation to start streaming data to output media sooner and also provides some overall throughput improvement.
Related Links:
IT11163: DELAY BETWEEN COMMAND ISSUE AND DATA TRANSFER FOR BACKUP DB

Database bufferpools contain a large number of changed (dirty) pages:
Symptoms:
$$_TSMDBMGR_$$ sessions take a long time to start after initiating BACKUP DB
Problem:
Bufferpools can take a long time to flush, which delays the start of the backup.
Recommendation:
If IBM Spectrum Protect 7.1.3.000+ has already been applied (to take advantage of the algorithm changes referenced above), but the database backup continues to take a long time to begin, there may be a large number of changed (dirty) pages in the bufferpools. At this level, a new server option called CHNGPGSTHRESH was introduced to allow a user to control the percentage of dirty pages that remain in the bufferpools. Consider lowering the value of this option. The default is 40, but it can be set as low as 5. Changing this option requires a server restart.
Related Links:

Database bufferpool flush tuning:
Symptoms:
BACKUP DB runs okay for a while, but then performance degrades (Q PROC stops reporting forward progress in bytes processed), and/or BACKUP DB can take a long time to start after initiating.
Problem:
Bufferpools can take a long time to flush, which delays backup operations. The bufferpools may be too large.
Recommendation:
Despite the changes implemented in 7.1.3.000 (see above) to address bufferpool flushes, users reported that the slowness/hangs shifted from the beginning of the backup to some point after the backup had started and written data to output media. You may also notice a long delay before data begins streaming to media after BACKUP DB has started. To address this, APAR IT14336 was opened to leverage a new DB2 feature that better controls how large the bufferpools grow. Upgrade to fixing maintenance once available, or contact IBM Support for other options in the interim.
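To confirm the symptom described above (Q PROC stops reporting forward progress in bytes processed), it can help to record the bytes-processed value at regular intervals and look for flat periods. The Python sketch below assumes you have already collected such samples by hand or with your own tooling; the function name, the sample format, and the 300-second threshold are illustrative assumptions.

```python
def find_stalls(samples, min_stall_seconds=300):
    """Find periods with no forward progress.

    samples: time-ordered list of (seconds_since_start, bytes_processed).
    Returns a list of (stall_start, stall_end) pairs, in seconds, where the
    byte count did not increase for at least min_stall_seconds.
    """
    stalls = []
    if not samples:
        return stalls
    # Anchor is the most recent sample at which the byte count advanced.
    anchor_time, anchor_bytes = samples[0]
    last_time = anchor_time
    for t, b in samples[1:]:
        if b > anchor_bytes:
            # Progress resumed: record the flat period that just ended.
            if last_time - anchor_time >= min_stall_seconds:
                stalls.append((anchor_time, last_time))
            anchor_time, anchor_bytes = t, b
        last_time = t
    # A stall may still be in progress at the final sample.
    if last_time - anchor_time >= min_stall_seconds:
        stalls.append((anchor_time, last_time))
    return stalls


# Hypothetical samples: progress is flat between 60 s and 480 s.
samples = [(0, 0), (60, 100), (120, 100), (180, 100), (480, 100), (540, 200)]
stalls = find_stalls(samples)
```

The timing of any stall found this way (at the start versus partway through the backup) helps narrow down which of the bufferpool-related causes applies.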
Related Links:
IT14336: SERVER DATABASE BACKUP PERFORMANCE AFFECTED BY DB2 FLUSH ACTIVITY AS MANAGED WITH OPTIONS SOFTMAX AND PAGE_AGE_TRGT_MCR

Empty pages within a tablespace:
Symptoms:
BACKUP DB runs okay for a while, but then performance degrades (Q PROC stops reporting forward progress in bytes processed).
Problem:
Too many empty pages in the tablespace.
Recommendation:
If table/index reorganizations are completing successfully, it may be normal for there to be a number of empty pages within a particular tablespace. These empty pages, however, must be scanned during a database backup. The more empty pages that exist, the more unnecessary scanning must take place during the database backup operation. The operational throughput can be improved by not scanning these empty pages. There are several considerations/options available to address this, so IBM Support should be engaged to verify which option is the best fit for your environment.
Related Links:

Background file-based device class deduplication dereference work is triggering lock escalations:
Symptoms:
BACKUP DB runs okay for a while, but then performance degrades (Q PROC is reporting slow forward progress in bytes processed). Users may also witness inconsistent BACKUP DB durations day to day.
Problem:
Lock escalations in DB2 are slowing transaction speeds and/or causing slow volume allocation (particularly for FILE based device classes).
Recommendation:
Ensure that your locklist is properly defined. Review APAR IT04946 for more details on identification and resolution. Depending on the amount of backlog, increasing the locklist may not be sufficient, in which case applying fixing maintenance is the only option. Otherwise, apply fixing maintenance when available.
Related Links:
IT04946: IN CERTAIN CIRCUMSTANCES THE BFDEREFQUEUETHREAD THREAD CAN HOLD AN EXCESSIVE NUMBER OF LOCKS.

Active log exhaustion leads to multi-stream failures:
Symptoms:
Multi-stream BACKUP DB runs okay for a while, experiences a failure, and restarts using one stream. Users may also notice Q PROC lists more bytes processed for the backup than the size of the database.
Problem:
The filesystem containing the active log runs out of free space, causing multi-stream database backups to fail with ANR2993E and sqlcode -2428, and then restart using a single stream (which reduces parallelism and performance). Note that archive logs are retrieved to the active log filesystem during a database backup operation, so the active log filesystem must have enough free space to satisfy those retrievals.
Recommendation:
Ensure that your active log filesystem has adequate free space. In general, this filesystem should contain about 20% free space for temporary log movement.
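The 20% guideline above can be checked with a few lines of Python using the standard library. This is a minimal sketch: the function names are hypothetical, and the path passed in must be replaced with the location of your own active log filesystem.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path, minimum=0.20):
    """True if the filesystem has at least the given fraction of free space."""
    return free_fraction(path) >= minimum


# "." is a placeholder; substitute the active log directory for your server.
ok = has_headroom(".")
```

Running this check shortly before the BACKUP DB window gives early warning that a multi-stream backup may fall back to a single stream.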
Related Links:
Multi-stream BACKUP DB can fail with ANR2993E and restart with one stream.
BACKUP DB fails with ANR2993E and sqlcode: -2428

Product Synonym

ITSM TSM ADSM IBM SPECTRUM PROTECT

Document Information

Modified date:
17 June 2018

UID

swg21976378