AIX operating systems HP-UX operating systems Linux operating systems Mac OS X operating systems Oracle Solaris operating systems Windows operating systems
IBM Tivoli Storage Manager, Version 7.1

Client-side data deduplication

Data deduplication is a method of reducing storage needs by eliminating redundant data.

Overview

Two types of data deduplication are available on Tivoli® Storage Manager: client-side data deduplication and server-side data deduplication.

Client-side data deduplication is a data deduplication technique that is used on the backup-archive client to remove redundant data during backup and archive processing before the data is transferred to the Tivoli Storage Manager server. Using client-side data deduplication can reduce the amount of data that is sent over a local area network.

Server-side data deduplication is a data deduplication technique that is done by the server. The Tivoli Storage Manager administrator can specify the data deduplication location (client or server) to use with the DEDUP parameter on the REGISTER NODE or UPDATE NODE server command.

Enhancements

With client-side data deduplication, you can:
  • Exclude specific files on a client from data deduplication.
  • Enable a data deduplication cache that reduces network traffic between the client and the server. The cache contains extents that were sent to the server in previous incremental backup operations. Instead of querying the server for the existence of an extent, the client queries its cache.

    Specify a size and location for a client cache. If an inconsistency between the server and the local cache is detected, the local cache is removed and repopulated.

    Note: For applications that use the Tivoli Storage Manager API, the data deduplication cache must not be used because of the potential for backup failures caused by the cache being out of sync with the Tivoli Storage Manager server. If multiple, concurrent Tivoli Storage Manager client sessions are configured, there must be a separate cache configured for each session.
  • Enable both client-side data deduplication and compression to reduce the amount of data that is stored by the server. Each extent is compressed before it is sent to the server. The trade-off is between storage savings and the processing power that is required to compress client data. In general, if you compress and deduplicate data on the client system, you are using approximately twice as much processing power as data deduplication alone.

    The server can work with deduplicated, compressed data. In addition, backup-archive clients earlier than V6.2 can restore deduplicated, compressed data.

Client-side data deduplication uses the following process:

Benefits

Client-side data deduplication provides several advantages:

Client-side data deduplication has a possible disadvantage. The server does not have whole copies of client files until you back up the primary storage pools that contain client extents to a non-deduplicated copy storage pool. (Extents are parts of a file that are created during the data-deduplication process.) During storage pool backup to a non-deduplicated storage pool, client extents are reassembled into contiguous files.

By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to non-deduplicated copy storage pools before they can be reclaimed and before duplicate data can be removed. The default ensures that the server has copies of whole files at all times, in either a primary storage pool or a copy storage pool.

Important: For further data reduction, you can enable client-side data deduplication and compression together. Each extent is compressed before it is sent to the server. Compression saves space, but it increases the processing time on the client workstation.

In a data deduplication-enabled storage pool (file pool) only one instance of a data extent is retained. Other instances of the same data extent are replaced with a pointer to the retained instance.

When client-side data deduplication is enabled, and the server has run out of storage in the destination pool, but there is a next pool defined, the server will stop the transaction. The Tivoli Storage Manager client retries the transaction without client-side data deduplication. To recover, the Tivoli Storage Manager administrator must add more scratch volumes to the original file pool, or retry the operation with deduplication disabled.

For client-side data deduplication, the Tivoli Storage Manager server must be Version 6.2 or higher.

Prerequisites

When configuring client-side data deduplication, the following requirements must be met:

The server can limit the maximum transaction size for data deduplication by setting the CLIENTDEDUPTXNLIMIT option on the server. See the Administrator's Guide for details.

The following operations take precedence over client-side data deduplication:

Important: Do not schedule or enable any of those operations during client-side data deduplication. If any of those operations occur during client-side data deduplication, client-side data deduplication is turned off, and a message is written to the error log.

The setting on the server ultimately determines whether client-side data deduplication is enabled. See Table 1.

Table 1. Data deduplication settings: Client and server
Value of the client DEDUPLICATION option Setting on the server Data deduplication location
Yes On either the server or the client Client
Yes On the server only Server
No On either the server or the client Server
No On the server only Server

Encrypted files

The Tivoli Storage Manager server and the backup-archive client cannot deduplicate encrypted files. If an encrypted file is encountered during data deduplication processing, the file is not deduplicated, and a message is logged.
Tip: You do not have to process encrypted files separately from files that are eligible for client-side data deduplication. Both types of files can be processed in the same operation. However, they are sent to the server in different transactions.
As a security precaution, you can take one or more of the following steps:
  • Enable storage-device encryption together with client-side data deduplication.
  • Use client-side data deduplication only for nodes that are secure.
  • If you are uncertain about network security, enable Secure Sockets Layer (SSL).
  • If you do not want certain objects (for example, image objects) to be processed by client-side data deduplication, you can exclude them on the client. If an object is excluded from client-side data deduplication and it is sent to a storage pool that is set up for data deduplication, the object is deduplicated on server.
  • Use the SET DEDUPVERIFICATIONLEVEL command to detect possible security attacks on the server during client-side data deduplication. Using this command, you can specify a percentage of client extents for the server to verify. If the server detects a possible security attack, a message is displayed.


Feedback