To maximize compression, have an optimal compression dictionary by loading the most representative data during load operations.
By default, InfoSphere CDC uses the fast load apply mode to perform a refresh operation on a target table. In this mode, InfoSphere CDC bulk loads data into the target table with the db2Load API. During this refresh of your shadow tables, load the full row-organized table in a single bulk load operation.
The fastload_refresh_commit_after_max_operations system parameter controls the size of each bulk load operation on the target table and identifies the number of rows that comprise each transaction during the refresh. If the number of rows to be loaded is larger than a value of the InfoSphere CDC fastload_refresh_commit_after_max_operations parameter, InfoSphere CDC splits the refresh into multiple load operations and periodically commits the changes to the target database. When data is being loaded into a column-organized table, the process starts with the ANALYZE phase, which is unique to column-organized tables. The column compression dictionary is built during the ANALYZE phase. To ensure the best compression ratio, load as much data as possible before committing the changes.
$ cd <cdc-installation-path>/bin
$ ./dmset -I <cdc-instance-name>
fastload_refresh_commit_after_max_operations=2147483647
$ cd <cdc-installation-path>/bin
$ ./dmset -I <cdc-instance-name>
fastload_refresh_commit_after_max_operations=10000000
You
might need to adjust this value as size of the row-organized tables
grow.