Tuning server-side data deduplication

Adjust settings and configuration for different operations to ensure that server-side data deduplication performs efficiently.

Procedure

  1. Control processor resources by setting the number of duplicate identification processes that you want to use. Do not exceed the number of processor cores available on your Tivoli® Storage Manager server when you set the NUMPROCESS value. Define a duration limit for the IDENTIFY DUPLICATES command; otherwise, the processes that the command starts run indefinitely.
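    For example, duplicate identification might be started as follows. The storage pool name DEDUPPOOL and the values shown are placeholders; this sketch assumes a server with eight processor cores:

    ```shell
    # Start duplicate identification with 6 processes (fewer than the
    # 8 available cores) and stop the processes after 120 minutes.
    # DEDUPPOOL is a placeholder storage pool name.
    identify duplicates DEDUPPOOL numprocess=6 duration=120
    ```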
  2. Determine the threshold for reclamation of a deduplicated storage pool. A deduplicated storage pool is typically reclaimed to a threshold lower than the default of 60 so that more of the identified duplicate extents can be removed. Experiment with this value to find a threshold at which reclamation can be completed within the available time.
  3. Determine how many reclamation processes to run.
    Tip: A setting of 25 - 40 reclamation processes is typically sufficient.
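    Steps 2 and 3 can be sketched as a single UPDATE STGPOOL command. The pool name DEDUPPOOL and the values shown are illustrative, not recommendations for your environment:

    ```shell
    # Lower the reclamation threshold from the default of 60 to 40 and
    # use 30 parallel reclamation processes (within the 25 - 40 range).
    # DEDUPPOOL is a placeholder deduplicated storage pool name.
    update stgpool DEDUPPOOL reclaim=40 reclaimprocess=30
    ```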
  4. Schedule data deduplication processing based on how you create a second copy of your data. If you are backing up your storage pool, do not overlap client backup and duplicate identification. Complete the storage pool backup before the identify process. If the storage pool backup is not complete, the copy process takes longer because the deduplicated data must be reassembled before it is backed up.
    You can overlap duplicate identification and client backup operations in the following scenarios:
    • You are not backing up your storage pool.
    • You are using node replication to create a secondary copy of your data.
    Running these operations together can reduce the time that is required to finish processing, but might increase the time for client backup.
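    One way to sequence the operations in step 4 is with administrative schedules. The schedule names, pool names (DEDUPPOOL, COPYPOOL), and start times below are placeholders; adjust the gap between the two schedules so the storage pool backup finishes first:

    ```shell
    # Back up the primary deduplicated pool to the copy pool each evening.
    define schedule stgbkup type=administrative active=yes starttime=20:00 period=1 perunits=days cmd="backup stgpool DEDUPPOOL COPYPOOL"

    # Start duplicate identification later, after the backup is complete,
    # with a duration limit of 240 minutes.
    define schedule identdup type=administrative active=yes starttime=23:00 period=1 perunits=days cmd="identify duplicates DEDUPPOOL duration=240"
    ```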
  5. To prevent deadlocks in the Tivoli Storage Manager server, you might need to modify the DB2® LOCKLIST parameter before you deduplicate a large amount of data. When the amount of concurrent data movement activity is high, deadlocks can occur in the server. If the amount of concurrent data that is moved exceeds 500 GB at a time, adjust the DB2 LOCKLIST parameter as follows:
    Table 1. Tuning DB2 LOCKLIST parameter values

    Amount of data    LOCKLIST parameter value
    500 GB            122000
    1 TB              244000
    5 TB              1220000
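    For example, for roughly 1 TB of concurrent data movement, the LOCKLIST value from Table 1 might be applied with the DB2 command line. This sketch assumes the default server database name TSMDB1 and is run as the DB2 instance owner:

    ```shell
    # Set LOCKLIST to the Table 1 value for 1 TB of concurrent data
    # movement. TSMDB1 is the default Tivoli Storage Manager database name.
    db2 update db cfg for TSMDB1 using LOCKLIST 244000
    ```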