DB2 10.5 for Linux, UNIX, and Windows

Compression dictionaries and shadow tables

To maximize compression, have an optimal compression dictionary by loading the most representative data during load operations.

For shadow tables, you can do the following techniques to optimize your compression dictionaries for shadow tables:

Ensure that the source tables are well populated

If your source tables are large before you start mirroring, the initial InfoSphere® CDC load operation to refresh the shadow table should be large enough to create a good column compression dictionary.

Maximize the size of the bulk load refresh

By default, InfoSphere CDC uses the fast load apply mode to perform a refresh operation on a target table. In this mode, InfoSphere CDC bulk loads data into the target table with the db2Load API. During this refresh of your shadow tables, load the full row-organized table in a single bulk load operation.

The fastload_refresh_commit_after_max_operations system parameter controls the size of each bulk load operation on the target table and identifies the number of rows that comprise each transaction during the refresh. If the number of rows to be loaded is larger than a value of the InfoSphere CDC fastload_refresh_commit_after_max_operations parameter, InfoSphere CDC splits the refresh into multiple load operations and periodically commits the changes to the target database. When data is being loaded into a column-organized table, the process starts with the ANALYZE phase, which is unique to column-organized tables. The column compression dictionary is built during the ANALYZE phase. To ensure the best compression ratio, load as much data as possible before committing the changes.

To employ this strategy, set the value of the fastload_refresh_commit_after_max_operations parameter to the maximum setting of 2147483647 as shown in the following example:

$ cd <cdc-installation-path>/bin 
$ ./dmset -I <cdc-instance-name>
fastload_refresh_commit_after_max_operations=2147483647

Alternatively, you can set the fastload_refresh_commit_after_max_operations parameter to the row count of the table with most rows out of those tables that are shadowed. For example, if you shadow three row-organized tables and table one has 1 million rows, table two has 2 million rows, and table three has 10 million rows, then set this parameter to 10 million, as follows:

$ cd <cdc-installation-path>/bin 
$ ./dmset -I <cdc-instance-name>
fastload_refresh_commit_after_max_operations=10000000

You might need to adjust this value as size of the row-organized tables grow.