Flow of change data in a CDC Transaction stage job

To understand how the CDC Transaction stage works, first follow how change data flows from the source database to the target database.

The following image shows how data flows when IBM® InfoSphere® Change Data Capture (InfoSphere CDC) captures changes at a source database and uses IBM InfoSphere DataStage® to deliver the change data to a target database.

The image is described by the surrounding text.

  1. On the computer where the source database is installed, the InfoSphere CDC service for the database monitors the database and captures changes.
  2. InfoSphere CDC transfers the change data according to the replication definition.
  3. The InfoSphere CDC for InfoSphere DataStage server sends data to the CDC Transaction stage through a TCP/IP session that is created when replication begins. Periodically, the InfoSphere CDC for InfoSphere DataStage server also sends a COMMIT message (along with bookmark information) to mark the transaction boundary in the captured log.
  4. In the InfoSphere DataStage job, the data flows over links from the CDC Transaction stage to the target database connector stage. The bookmark information is sent over a bookmark link. For each COMMIT message sent by the InfoSphere CDC for InfoSphere DataStage server, the CDC Transaction stage creates end-of-wave (EOW) markers that are sent on all output links to the target database connector stage.
  5. The target database connector stage connects to the target database and sends data over the session. When the target database connector stage receives an end-of-wave marker on all input links, it writes bookmark information to a bookmark table and then commits the transaction to the target database.
  6. Periodically, the InfoSphere CDC for InfoSphere DataStage server requests bookmark information from a bookmark table on the target database. In response to the request, the CDC Transaction stage fetches the bookmark information through ODBC and returns it to the InfoSphere CDC for InfoSphere DataStage server.
  7. The InfoSphere CDC for InfoSphere DataStage server receives the bookmark information, which is used for the following purposes:
    • To determine the starting point in the transaction log where changes are read when replication begins. (The starting point in the transaction log is the ending point from the previous replication, if the replication ended successfully.)
    • To determine whether the existing transaction log can be cleaned up.
    The bookmark is written to the bookmark table and committed in the same transaction as the data, so the bookmark information and the written data remain consistent even if the job fails. After a failure, replication resumes from the point that the bookmark indicates, and no data is lost.
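
The commit behavior in steps 5 through 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not InfoSphere CDC or InfoSphere DataStage code: an in-memory SQLite database stands in for the target database, and the table names (`orders`, `bookmark`), the `CHANGE_LOG` list, and the functions `apply_wave` and `replicate` are all hypothetical. Each entry in `CHANGE_LOG` represents the rows that arrive before one end-of-wave marker, together with the log position carried by the bookmark.

```python
import sqlite3

# Hypothetical change log: (log position, rows that arrive before the end-of-wave marker).
CHANGE_LOG = [
    (100, [(1, 9.99), (2, 24.50)]),
    (200, [(3, 5.00)]),
    (300, [(4, 12.00), (5, 7.25)]),
]


def open_target():
    """Create a stand-in target database with a data table and a bookmark table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE bookmark (id INTEGER PRIMARY KEY, log_position INTEGER)")
    conn.execute("INSERT INTO bookmark VALUES (1, 0)")  # position 0 = nothing replicated yet
    conn.commit()
    return conn


def read_bookmark(conn):
    """Return the last log position that was committed together with its data."""
    return conn.execute("SELECT log_position FROM bookmark WHERE id = 1").fetchone()[0]


def apply_wave(conn, rows, log_position, fail_before_commit=False):
    """Write one wave of rows and its bookmark, then commit both in one transaction."""
    try:
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        conn.execute("UPDATE bookmark SET log_position = ? WHERE id = 1", (log_position,))
        if fail_before_commit:
            raise RuntimeError("simulated job failure")
        conn.commit()  # data and bookmark become visible together
    except Exception:
        conn.rollback()  # neither the data nor the bookmark is kept
        raise


def replicate(conn, fail_at=None):
    """Replay the change log, starting after the position stored in the bookmark."""
    start = read_bookmark(conn)
    for position, rows in CHANGE_LOG:
        if position <= start:
            continue  # already committed on the target
        apply_wave(conn, rows, position, fail_before_commit=(position == fail_at))


conn = open_target()
try:
    replicate(conn, fail_at=200)  # the job fails while applying the second wave
except RuntimeError:
    pass

bookmark_after_failure = read_bookmark(conn)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

replicate(conn)  # restart: resumes after the bookmarked position
final_bookmark = read_bookmark(conn)
final_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because the failed wave is rolled back as a unit, the bookmark still points at the last fully committed wave, so the restart applies the failed wave again without skipping or duplicating rows. Writing the bookmark in a separate transaction would lose this guarantee, because a failure between the two commits would leave the data and the bookmark out of step.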