Scheduling data deduplication and node replication processes

Data deduplication and node replication are optional functions that can be used with Tivoli® Storage Manager. They provide added benefits but also require additional resources and consideration for the daily schedule.

About this task

Depending on your environment, using data deduplication and node replication can change the tasks that are required for the daily schedule. If you are using node replication to create the backup copy of your data, then storage pool backups are not needed. Likewise, you do not need to migrate your data to tape storage pools for the creation of offsite backup media.

The following image illustrates how to schedule data deduplication and node replication processes to achieve the best performance. Tasks that overlap in the image can be run at the same time.

Restriction: The number of duplicate identification processes that can run concurrently depends on the processor capability of the Tivoli Storage Manager server and the I/O capability of the storage pool disk.

Figure 1. Daily schedule when data deduplication and node replication are used

Tasks for protecting client data are explained in the steps. The image shows the timeline for the schedule when node replication and deduplication processes are included: Client backups run from approximately 10 PM to 6 AM. Duplicate identification and database backup overlap, and run from approximately 6 AM to 10 AM. Node replication runs from approximately 10 AM to 4 PM. Expiration runs from 4 PM to 6 PM. Reclamation runs from 6 PM to 10 PM.

The following steps include commands to implement the schedule that is shown in the image. For this example, tape is not used in the environment.

Procedure

1. Perform an incremental backup of all clients on the network to a deduplicated file storage pool by using the incremental client command or another supported method for client backup.
2. Run the following tasks in parallel:
   a. Perform server-side duplicate identification by running the IDENTIFY DUPLICATES command. If you are not using client-side data deduplication, this step processes data that was not already deduplicated on your clients.
   b. Create a disaster recovery (DR) copy of the Tivoli Storage Manager database by running the BACKUP DB command. In addition, run the BACKUP VOLHISTORY and BACKUP DEVCONFIG commands to create DR copies of the volume history and device configuration files.
3. Perform node replication to create a secondary copy of the client data on another Tivoli Storage Manager server by using the REPLICATE NODE command. By performing node replication after duplicate identification processing, you can take advantage of data reduction during replication.
4. Remove objects that exceed their allowed retention by using the EXPIRE INVENTORY command.
5. Reclaim unused space from storage pool volumes that are released through data deduplication and inventory expiration by using the RECLAIM STGPOOL command.
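
Example

The server-side tasks in this procedure might be automated with administrative schedules along the following lines. This is a minimal sketch, not a tested configuration. The start times and durations are taken from the approximate timeline in the figure. The storage pool name DEDUPPOOL, the device class name DBBACK_FILE, the /tsm/dr/ file paths, the script name, the schedule names, and the NUMPROCESS and THRESHOLD values are all hypothetical placeholders. The sketch also assumes that client backups are scheduled separately and that node replication is already configured between the source and target servers.

```
/* Hypothetical names: DEDUPPOOL, DBBACK_FILE, /tsm/dr/ paths, DR_BACKUP, and all schedule names. */

/* Duplicate identification: starts at 6 AM and runs for up to 240 minutes. */
/* NUMPROCESS=2 is illustrative; size it to your processor and disk I/O capability. */
DEFINE SCHEDULE IDENTIFY_DEDUP TYPE=ADMINISTRATIVE CMD="IDENTIFY DUPLICATES DEDUPPOOL NUMPROCESS=2 DURATION=240" ACTIVE=YES STARTTIME=06:00 PERIOD=1 PERUNITS=DAYS

/* DR copies: a server script runs the three backup commands in order. */
/* WAIT=YES makes the script wait for the database backup before it continues. */
DEFINE SCRIPT DR_BACKUP "BACKUP DB DEVCLASS=DBBACK_FILE TYPE=FULL WAIT=YES" LINE=10
UPDATE SCRIPT DR_BACKUP "BACKUP VOLHISTORY FILENAMES=/tsm/dr/volhist.out" LINE=20
UPDATE SCRIPT DR_BACKUP "BACKUP DEVCONFIG FILENAMES=/tsm/dr/devconfig.out" LINE=30

/* The DR script starts at 6 AM so that it overlaps with duplicate identification. */
DEFINE SCHEDULE DR_BACKUP_SCHED TYPE=ADMINISTRATIVE CMD="RUN DR_BACKUP" ACTIVE=YES STARTTIME=06:00 PERIOD=1 PERUNITS=DAYS

/* Node replication: starts at 10 AM, after duplicate identification. */
/* The wildcard selects all nodes; nodes not enabled for replication are assumed to be skipped. */
DEFINE SCHEDULE REPL_NODES TYPE=ADMINISTRATIVE CMD="REPLICATE NODE *" ACTIVE=YES STARTTIME=10:00 PERIOD=1 PERUNITS=DAYS

/* Inventory expiration: starts at 4 PM and runs for up to 120 minutes. */
DEFINE SCHEDULE EXPIRE_INV TYPE=ADMINISTRATIVE CMD="EXPIRE INVENTORY DURATION=120" ACTIVE=YES STARTTIME=16:00 PERIOD=1 PERUNITS=DAYS

/* Reclamation: starts at 6 PM and runs for up to 240 minutes. THRESHOLD=60 is illustrative. */
DEFINE SCHEDULE RECLAIM_DEDUP TYPE=ADMINISTRATIVE CMD="RECLAIM STGPOOL DEDUPPOOL THRESHOLD=60 DURATION=240" ACTIVE=YES STARTTIME=18:00 PERIOD=1 PERUNITS=DAYS
```

The BACKUP VOLHISTORY and BACKUP DEVCONFIG commands are placed after the database backup in one script so that the saved volume history can include the new database backup volume. After the schedules are defined, a command such as QUERY EVENT * TYPE=ADMINISTRATIVE can be used to check whether each scheduled task started and completed as expected.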