If you deduplicate data, you must consider its effects on space requirements for active and archive logs.
The following factors affect requirements for active and archive log space:
250,000 extents identified during each process x 1,500 bytes for each extent = 358 MB
60,000,000 extents x 1,500 bytes for each extent = 84 GB
8,192 extents in each aggregate x 1,500 bytes for each extent = 12 MB
12 MB for each process x 10 processes = 120 MB
1,200,000 extents x 1,500 bytes for each extent = 1.7 GB
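
The arithmetic in the calculations above can be checked with a short sketch. The figure of 1,500 bytes for each extent comes from the calculations themselves. The function names are hypothetical, and the use of binary units (1 MB = 1,048,576 bytes) is an assumption that is inferred from the way the published results round.

```python
# Assumption: each extent consumes about 1,500 bytes of log space,
# as in the example calculations above.
BYTES_PER_EXTENT = 1500

def log_space_bytes(extents: int) -> int:
    """Log space, in bytes, consumed by the given number of extents."""
    return extents * BYTES_PER_EXTENT

def to_mb(num_bytes: int) -> float:
    # Assumption: binary units, 1 MB = 1024**2 bytes.
    return num_bytes / 1024 ** 2

def to_gb(num_bytes: int) -> float:
    # Assumption: binary units, 1 GB = 1024**3 bytes.
    return num_bytes / 1024 ** 3

print(round(to_mb(log_space_bytes(250_000))))       # 358 (MB)
print(round(to_gb(log_space_bytes(60_000_000))))    # 84 (GB)
print(round(to_mb(log_space_bytes(8_192))))         # 12 (MB)
print(round(to_gb(log_space_bytes(1_200_000)), 1))  # 1.7 (GB)
```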
If other, smaller duplicate-identification processes occur at the same time as the duplicate-identification process for a single large object, the active log might not have enough space. For example, suppose that a storage pool is enabled for deduplication. The storage pool has a mixture of data, including many relatively small files that range from 10 KB to several hundred KB. The storage pool also has a few large objects that have a high percentage of duplicate extents.
To take into account not only space requirements but also the timing and duration of concurrent transactions, increase the estimated size of the active log by a factor of two. For example, suppose that your calculations for space requirements are 25 GB (23.3 GB + 1.7 GB for deduplication of a large object). If deduplication processes are running concurrently, the suggested size of the active log is 50 GB. The suggested size of the archive log is 150 GB.
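
As a minimal sketch of this rule, the following function doubles the combined estimate to obtain the active log size and then triples that result for the archive log. The function and parameter names are hypothetical. The factor of three for the archive log is taken from the example tables in this section.

```python
def suggested_log_sizes(base_gb: float, dedup_gb: float,
                        concurrency_factor: int = 2,
                        archive_factor: int = 3) -> tuple[float, float]:
    """Return (active_gb, archive_gb) for concurrent deduplication.

    base_gb  -- previously estimated active log space (23.3 GB in the example)
    dedup_gb -- space for deduplicating the largest single object (1.7 GB)
    """
    active_gb = (base_gb + dedup_gb) * concurrency_factor
    archive_gb = active_gb * archive_factor
    # Round to one decimal place to hide floating-point noise.
    return round(active_gb, 1), round(archive_gb, 1)

active, archive = suggested_log_sizes(23.3, 1.7)
print(active, archive)  # 50.0 150.0
```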
The examples in the following tables show calculations for active and archive logs. The example in the first table uses an average size of 700 KB for extents. The example in the second table uses an average size of 256 KB. As the examples show, the average deduplicate-extent size of 256 KB indicates a larger estimated size for the active log. To minimize or prevent operational problems for the server, use 256 KB to estimate the size of the active log in your production environment.
Item | Example value (800 GB object) | Example value (4 TB object) | Description |
---|---|---|---|
Size of largest single object to deduplicate | 800 GB | 4 TB | The granularity of processing for deduplication is at the file level. Therefore, the largest single file to deduplicate represents the largest transaction and a correspondingly large load on the active and archive logs. |
Average size of extents | 700 KB | 700 KB | The deduplication algorithms use a variable block method. Not all deduplicated extents for a given file are the same size, so this calculation assumes an average size for extents. |
Extents for a given file | 1,198,372 extents | 6,135,667 extents | Using the average extent size (700 KB), these calculations represent the total number of extents for a given object. The following calculation was used for an 800 GB object: (800 GB ÷ 700 KB) = 1,198,372 extents. The following calculation was used for a 4 TB object: (4 TB ÷ 700 KB) = 6,135,667 extents. |
Active log: Suggested size that is required for the deduplication of a single large object during a single duplicate-identification process | 1.7 GB | 8.6 GB | The estimated active log space that is needed for this transaction. |
Active log: Suggested total size | 66 GB 1 | 79.8 GB 1 | After considering other aspects of the workload on the server in addition to deduplication, multiply the existing estimate by a factor of 2. In these examples, the active log space required to deduplicate a single large object is considered along with previous estimates for the required active log size. The following calculation was used for multiple transactions and an 800 GB object: (23.3 GB + 1.7 GB) x 2 = 50 GB. Increase that amount by the suggested starting size of 16 GB: 50 + 16 = 66 GB. The following calculation was used for multiple transactions and a 4 TB object: (23.3 GB + 8.6 GB) x 2 = 63.8 GB. Increase that amount by the suggested starting size of 16 GB: 63.8 + 16 = 79.8 GB. |
Archive log: Suggested size | 198 GB 1 | 239.4 GB 1 | Multiply the estimated size of the active log by a factor of 3. The following calculation was used for multiple transactions and an 800 GB object: 50 GB x 3 = 150 GB. Increase that amount by the suggested starting size of 48 GB: 150 + 48 = 198 GB. The following calculation was used for multiple transactions and a 4 TB object: 63.8 GB x 3 = 191.4 GB. Increase that amount by the suggested starting size of 48 GB: 191.4 + 48 = 239.4 GB. |
1 The example values in this table are used only to illustrate how the sizes for active logs and archive logs are calculated. In a production environment that uses deduplication, 32 GB is the suggested minimum size for an active log. The suggested minimum size for an archive log in a production environment that uses deduplication is 96 GB. If you substitute values from your environment and the results are larger than 32 GB and 96 GB, use your results to size the active log and archive log. Monitor your logs and adjust their size if necessary. |
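
The rows of the preceding table can be reproduced end to end with a short sketch. The constants (1,500 bytes for each extent, the 23.3 GB prior estimate, and the 16 GB and 48 GB starting sizes) are taken from this section. The function name, the dictionary keys, and the use of binary units are assumptions made for illustration.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

BYTES_PER_EXTENT = 1500   # log bytes for each extent, from this section
BASE_ACTIVE_GB = 23.3     # previous estimate of required active log space
ACTIVE_START_GB = 16      # suggested starting size, active log
ARCHIVE_START_GB = 48     # suggested starting size, archive log

def size_logs(object_bytes: int, avg_extent_bytes: int) -> dict:
    """Walk through the table rows for one object size (hypothetical helper)."""
    extents = object_bytes // avg_extent_bytes
    dedup_gb = round(extents * BYTES_PER_EXTENT / GIB, 1)
    doubled = (BASE_ACTIVE_GB + dedup_gb) * 2       # concurrency factor of 2
    return {
        "extents": extents,
        "dedup_gb": dedup_gb,
        "active_gb": round(doubled + ACTIVE_START_GB, 1),
        "archive_gb": round(doubled * 3 + ARCHIVE_START_GB, 1),
    }

for label, size in (("800 GB", 800 * GIB), ("4 TB", 4 * TIB)):
    print(label, size_logs(size, 700 * KIB))
```

For the 800 GB object, this sketch yields 1,198,372 extents, 1.7 GB for the single large object, a 66 GB active log, and a 198 GB archive log, which matches the table.
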
Item | Example value (800 GB object) | Example value (4 TB object) | Description |
---|---|---|---|
Size of largest single object to deduplicate | 800 GB | 4 TB | The granularity of processing for deduplication is at the file level. Therefore, the largest single file to deduplicate represents the largest transaction and a correspondingly large load on the active and archive logs. |
Average size of extents | 256 KB | 256 KB | The deduplication algorithms use a variable block method. Not all deduplicated extents for a given file are the same size, so this calculation assumes an average extent size. |
Extents for a given file | 3,276,800 extents | 16,777,216 extents | Using the average extent size (256 KB), these calculations represent the total number of extents for a given object. The following calculation was used for an 800 GB object: (800 GB ÷ 256 KB) = 3,276,800 extents. The following calculation was used for a 4 TB object: (4 TB ÷ 256 KB) = 16,777,216 extents. |
Active log: Suggested size that is required for the deduplication of a single large object during a single duplicate-identification process | 4.5 GB | 23.4 GB | The estimated size of the active log space that is required for this transaction. |
Active log: Suggested total size | 71.6 GB 1 | 109.4 GB 1 | After considering other aspects of the workload on the server in addition to deduplication, multiply the existing estimate by a factor of 2. In these examples, the active log space required to deduplicate a single large object is considered along with previous estimates for the required active log size. The following calculation was used for multiple transactions and an 800 GB object: (23.3 GB + 4.5 GB) x 2 = 55.6 GB. Increase that amount by the suggested starting size of 16 GB: 55.6 + 16 = 71.6 GB. The following calculation was used for multiple transactions and a 4 TB object: (23.3 GB + 23.4 GB) x 2 = 93.4 GB. Increase that amount by the suggested starting size of 16 GB: 93.4 + 16 = 109.4 GB. |
Archive log: Suggested size | 214.8 GB 1 | 328.2 GB 1 | The estimated size of the active log multiplied by a factor of 3. The following calculation was used for an 800 GB object: 55.6 GB x 3 = 166.8 GB. Increase that amount by the suggested starting size of 48 GB: 166.8 + 48 = 214.8 GB. The following calculation was used for a 4 TB object: 93.4 GB x 3 = 280.2 GB. Increase that amount by the suggested starting size of 48 GB: 280.2 + 48 = 328.2 GB. |
1 The example values in this table are used only to illustrate how the sizes for active logs and archive logs are calculated. In a production environment that uses deduplication, 32 GB is the suggested minimum size for an active log. The suggested minimum size for an archive log in a production environment that uses deduplication is 96 GB. If you substitute values from your environment and the results are larger than 32 GB and 96 GB, use your results to size the active log and archive log. Monitor your logs and adjust their size if necessary. |
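
The production minimums in the table note can be applied as a final step. The following sketch is illustrative only: the function name is hypothetical, and the second set of input values (20 GB and 60 GB) is invented to show the case in which the minimums take effect.

```python
MIN_ACTIVE_GB = 32    # suggested production minimum, active log
MIN_ARCHIVE_GB = 96   # suggested production minimum, archive log

def production_sizes(active_gb: float, archive_gb: float) -> tuple[float, float]:
    """Use the calculated sizes unless they fall below the suggested minimums."""
    return max(active_gb, MIN_ACTIVE_GB), max(archive_gb, MIN_ARCHIVE_GB)

# Values from the 256 KB example for an 800 GB object: results are kept as is.
print(production_sizes(71.6, 214.8))  # (71.6, 214.8)

# Hypothetical small environment: results are raised to the minimums.
print(production_sizes(20, 60))       # (32, 96)
```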