Details: Recovery for remote journaling scenario

This topic describes the details of the recovery process for remote journaling.

These details provide a step-by-step description of the process that occurs in Scenario: Recovery for remote journaling.

At the time of the system failure, the state of JKLINT and JKLINT2 is as follows:

  • Journal entries 12-19 are already deposited into PJ1 and confirmed in BJ1.
  • The corresponding data changes are also already reflected in the data replica, DB', on system JKLINT2.
  • Journal entries 20-25 are built and validated in main storage on JKLINT and sent to BJ1, and then system JKLINT fails.
  • Main storage is not preserved when JKLINT fails, so at the time of the failure, the last known confirmed sequence number in BJ1 is 19. Sequence numbers 20 through 25 are all unconfirmed.
  • The last known sequence number in PJ1 will be 19 when system JKLINT restarts.
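
The state above can be sketched as a small model. This is only an illustration in Python, not an IBM i interface: the `RemoteJournalState` class and its field names are hypothetical, and only the sequence numbers come from the scenario.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RemoteJournalState:
    """Hypothetical model of what a remote journal such as BJ1 holds."""
    received: List[int] = field(default_factory=list)  # sequence numbers that arrived
    last_confirmed: int = 0                             # highest confirmed sequence number

    def confirmed(self) -> List[int]:
        # Entries whose I/O was confirmed from the local journal.
        return [s for s in self.received if s <= self.last_confirmed]

    def unconfirmed(self) -> List[int]:
        # Entries that arrived but were never confirmed.
        return [s for s in self.received if s > self.last_confirmed]


# Entries 12-25 reached BJ1, but only 12-19 were confirmed before JKLINT failed.
bj1 = RemoteJournalState(received=list(range(12, 26)), last_confirmed=19)

# Main storage was lost, so PJ1 ends at the last entry that reached disk.
pj1_last_after_restart = bj1.last_confirmed

print("confirmed:", bj1.confirmed())
print("unconfirmed:", bj1.unconfirmed())
print("PJ1 last sequence after restart:", pj1_last_after_restart)
```

The model makes the key asymmetry visible: BJ1 knows about six entries (20-25) that PJ1 will not contain when JKLINT restarts.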

The hot-backup recovery strategy in these details does not require that both before-images and after-images are journaled to the local journal. However, the strategy would require before-images if, during the resynchronization process of the switch-back to the primary system, the strategy requires that the hot-backup application remove journaled changes.
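
Why removing changes needs before-images can be shown with a minimal sketch. The `ImageEntry` type, the `replay` and `back_out` helpers, and the sample record values are all assumptions made for illustration; they are not part of any hot-backup product or IBM i API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ImageEntry:
    """Hypothetical journal entry carrying record images."""
    seq: int
    key: str
    after: str
    before: Optional[str] = None  # None when before-images are not journaled


def replay(db: Dict[str, str], entry: ImageEntry) -> None:
    # Forward replay needs only the after-image.
    db[entry.key] = entry.after


def back_out(db: Dict[str, str], entry: ImageEntry) -> None:
    # Removing a change must restore the old value, so it needs the before-image.
    if entry.before is None:
        raise ValueError(f"entry {entry.seq} has no before-image; cannot back out")
    db[entry.key] = entry.before


db = {"acct": "100"}

both_images = ImageEntry(seq=20, key="acct", before="100", after="150")
replay(db, both_images)    # acct is now "150"
back_out(db, both_images)  # acct is restored to "100"

after_only = ImageEntry(seq=21, key="acct", after="175")
replay(db, after_only)     # acct is now "175"
try:
    back_out(db, after_only)
    backed_out = True
except ValueError as err:
    backed_out = False
    print(err)
```

With after-images alone the replica can always be rolled forward, but a change cannot be undone because the prior value was never recorded.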

To recover system JKLINT, the following steps are required:
  1. Update DB' by using the hot-backup application to replay the unconfirmed journal entries.
    1. On system JKLINT2, allow the hot-backup application apply processing to complete the replay of confirmed operations as identified in journal BJ1. This is the first step of the switch-over processing. The apply processing includes replaying all journal entries up through and including sequence number 19.
    2. The hot-backup application does not replay sequence numbers 20-25 because the I/O for those journal entries is not yet confirmed from the local journal PJ1. The Receive Journal Entry (RCVJRNE) command or Retrieve Journal Entries (QjoRetrieveJournalEntries) API that is used to retrieve the entries from the remote journal does not return sequence numbers 20-25 to the exit program unless specifically requested to do so. To have sequence numbers 20-25 returned to the exit program, specify INCENT(*ALL) on the command, or specify *ALL for the include entries key on the API.
    3. After the hot-backup application replays all confirmed journal entries, perform a change journal operation to attach a new journal receiver to local journal PJ2 on system JKLINT2 and change the state of journal PJ2 to *ACTIVE state. The change journal operation establishes a clean recovery point. It also makes clear what information needs to be sent back to system JKLINT later to replay back to the original data. Performing the change journal operation also prevents the remote journal function from having to re-replicate all of the journal entries that were previously generated into the currently attached journal receiver of PJ2. (The journal entries were generated into the receiver as part of replaying the database changes to the data replica on system JKLINT2.)

    The following figure shows that more unconfirmed journal entries are present in BJ1 than are known in PJ1.

  2. Perform switch-over processing and prepare JKLINT2 to run applications
    1. The hot-backup application reads unconfirmed journal entries from BJ1 and replays them to the data replica. They are retrieved from BJ1 by using the Receive Journal Entry (RCVJRNE) command or QjoRetrieveJournalEntries API, specifically requesting that unconfirmed journal entries be returned. Journal entries 140-145 are generated into journal PJ2 when replaying these changes to the data replica.
    2. The QjoChangeJournalState API or CHGJRN command inactivates the remote journal BJ1. During this operation, the system physically removes the unconfirmed journal entries from BJ1. The last known sequence number in BJ1 is now 19.
    3. The replay processing on JKLINT2 sends a user entry that indicates the point in time when the database was switched over. The user entry in the following figure is sequence number 146, journal code 'U', entry type 'SW'.
    4. After these steps are performed on system JKLINT2, applications can now be started on JKLINT2 and use DB' as the database to be updated. Applications continue to work and deposit journal entries 147-200.
    5. System JKLINT restarts and normal IPL recovery finds the end of the journal for PJ1 to be sequence number 19. IPL recovery ensures that all changes up to sequence number 19 are reflected in the original data. The IPL for JKLINT completes with journal PJ1 being left in the *ACTIVE state, as this was the state of the journal when the system failed.

    The following figure shows the state of BJ1, PJ2, and DB' when system JKLINT2 is ready to assume the role of the primary system.

    This figure illustrates switch-over processing. System JKLINT2 is now ready to allow applications to run.
  3. Activate remote journal BJ2 and transport journal entries to JKLINT
    1. After JKLINT restarts, activate the remote journal BJ2. Specify that the process will start with the attached journal receiver on JKLINT2. This starts the transport of journal entries representing the changes made on JKLINT2 as part of replaying the unconfirmed journal entries plus all changes made to DB' while JKLINT was unavailable. While this transfer is progressing (during catch-up processing, which then transitions into synchronous or asynchronous remote journal function mode), changes are still being made by applications to DB'.
    2. Either before or during the transport of journal entries to BJ2, send and make known the last known sequence number in BJ1 (19) to the hot-backup application apply. This can be included as information in the SW user journal entry.
    3. The hot-backup application backs out changes that are known to PJ1 (after the last known sequence number in BJ1) from the original data DB on system JKLINT. For this particular scenario, no changes need to be backed out of the original data.
      Note: For scenarios that require this back-out processing, both before-image and after-image journal entries are required.

    The following figure shows the state of both systems after system JKLINT has completed its IPL. This is after system JKLINT2 has been running as the primary system, but before database DB is resynchronized with DB'. (The database changes represented in PJ2 by journal sequence numbers 147-200 are not shown in DB' for simplicity.)

    This figure illustrates that JKLINT2 assumes the role of the primary, and DB' is now being updated. IPL processing has completed on JKLINT.
  4. Replay changes to DB on JKLINT
    1. The hot-backup application replays the changes back to the original data on system JKLINT. The changes that are replayed include those changes that were made to DB' as part of the switch-over processing, which replayed the data changes for the unconfirmed journal entries (sequence numbers 140-145). Additional changes include those data changes that were deposited while system JKLINT2 had assumed the role of the primary system (sequence numbers 147-300). Note that changes are still being made to DB' on system JKLINT2 and journal entries are still being generated into local journal PJ2 on system JKLINT2.
    2. When you decide that JKLINT must again assume the role of the primary system, end the applications on JKLINT2. The following figure shows the state of both systems just before system JKLINT is going to assume the role of the primary system.
    3. Allow the remaining changes to be replicated to BJ2. After all changes have been sent to BJ2, you can inactivate BJ2.
    4. After all of the journal entries have been replayed to the original data on JKLINT, attach a new journal receiver to PJ1 to clearly denote a new recovery point.

      The change journal operation is not absolutely essential. However, attaching a new journal receiver to PJ1 at this time makes clear where to start replaying changes back to the data replica on system JKLINT2. Performing the change journal operation also prevents the remote journal function from having to send back all of the journal entries that were previously generated into the currently attached journal receiver of PJ1. (The journal entries were generated in the receiver as part of replaying the data changes back to the original data on system JKLINT.)

    The following figure shows the state of the journals and data just before starting to replay the changes back to the original data DB.

  5. Allow JKLINT to again assume the role of the primary system
    1. Application programs can now make changes to the original data DB on system JKLINT.
    2. When you determine that it is time to start replicating the changes made on the primary system to the backup system, you can activate the remote journal BJ1.

      When activating the remote journal, you can indicate to send journal entries starting with the attached journal receiver on the source system. If this occurs, then only those journal entries that are required to be replayed to the data replica will be sent to system JKLINT2.

      Note: You can start with the attached receiver only if you performed the change journal operation to attach a new receiver, as described in step 4.
    3. If you want the complete chain of journal receivers from system JKLINT on JKLINT2, when you activate the remote journal, indicate to start with the attached journal receiver as known to the remote journal, BJ1. This will complete the sending of the journal receiver that contains the IPL entry (sequence number 20). The process will then move on to the next journal receiver that contains the journal entries where the hot-backup application apply will start replaying changes to the data replica. An alternative to that approach is to save and restore the detached journal receiver to system JKLINT2.
    4. Change the state of local journal PJ2 on system JKLINT2 to *STANDBY state.
    5. After local journal PJ2 has been put in *STANDBY state, perform a change journal operation to attach a new journal receiver to PJ2.

      The change journal operation is not absolutely essential. However, attaching a new journal receiver to PJ2 at this time makes clear where the replaying of changes back to the data replica started on system JKLINT2. Performing the change journal operation also prevents the remote journal function from having to later send all of the journal entries generated by the hot-backup application apply back to system JKLINT.

      The newly attached journal receiver contains journal entries that will not have to be sent back to system JKLINT.

    6. After the operation is performed, the hot-backup application apply can be started on system JKLINT2 to start replaying changes to the data replica. The hot-backup application apply starts with the source system sending the newly attached journal receiver.

The following figure shows that JKLINT is preparing to again assume the role of the primary system.

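
The sequence-number bookkeeping in steps 1 through 4 can be traced with a short simulation. This is a sketch under stated assumptions: the dictionaries, the `kind` and `bj1_last_seq` fields, and the variable names are hypothetical, and the run stops at sequence number 200, the count given in step 2. It models only which entries are replayed, removed, or backed out, not any real journal interface.

```python
# Sequence numbers come from the scenario; the data structures are illustrative.
bj1 = list(range(12, 26))   # entries held by remote journal BJ1
last_confirmed = 19         # highest sequence number confirmed in BJ1
pj1_last_after_ipl = 19     # end of PJ1 found by IPL recovery on JKLINT

# Step 1: replay the confirmed entries to the data replica DB'.
replica_applied = [s for s in bj1 if s <= last_confirmed]

# Step 2: replay the unconfirmed entries; each replay deposits an entry in PJ2.
unconfirmed = [s for s in bj1 if s > last_confirmed]
pj2 = []
next_seq = 140
for source_seq in unconfirmed:
    replica_applied.append(source_seq)
    pj2.append({"seq": next_seq, "kind": "replay", "source_seq": source_seq})
    next_seq += 1

# Inactivating BJ1 removes its unconfirmed entries.
bj1 = [s for s in bj1 if s <= last_confirmed]

# Deposit the switch-over user entry; it carries BJ1's last known sequence number.
pj2.append({"seq": next_seq, "kind": "user", "code": "U", "type": "SW",
            "bj1_last_seq": last_confirmed})
next_seq += 1

# Applications now run on JKLINT2 and deposit entries 147-200.
for seq in range(next_seq, 201):
    pj2.append({"seq": seq, "kind": "application"})

# Step 3: back out changes known to PJ1 beyond BJ1's last known sequence number.
sw = next(e for e in pj2 if e.get("type") == "SW")
to_back_out = list(range(sw["bj1_last_seq"] + 1, pj1_last_after_ipl + 1))

# Step 4: replay every data change in PJ2 (all but the SW marker) back to DB.
to_replay = [e["seq"] for e in pj2 if e.get("type") != "SW"]

print("SW entry sequence:", sw["seq"])
print("entries to back out on JKLINT:", to_back_out)
print("entries to replay to DB:", to_replay[0], "through", to_replay[-1])
```

In this scenario `to_back_out` is empty because PJ1 and BJ1 both end at sequence number 19. If PJ1 had retained entries that BJ1 never confirmed, the same range calculation would list them for back-out.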