DB2 10.5 for Linux, UNIX, and Windows

Recovering data by using HADR delayed replay

Using the HADR time-delayed replay feature, you can recover data that was lost because of an errant transaction on the primary database by stopping HADR on a standby before that transaction is replayed.

Before you begin

Delayed replay must have already been enabled for your standby database.

If log replay on the standby, indicated by STANDBY_REPLAY_LOG_TIME, has passed the commit time for the errant transaction on the standby, you cannot recover the data using the following procedure. You can determine the STANDBY_REPLAY_LOG_TIME by using the db2pd command with the -hadr parameter or the MON_GET_HADR table function.

Restriction: A standby database for which you set the hadr_replay_delay configuration parameter cannot take over as a primary; you must first disable delayed replay on that standby.

Procedure

To recover from an errant transaction, perform the following steps on the standby on which you enabled delayed replay:

Verify the timing:
1. Ensure that standby has not yet replayed the transaction. The STANDBY_REPLAY_LOG_TIME value must not have reached the errant transaction commit time.
2. Ensure that the standby has received the relevant logs. The STANDBY_LOG_TIME value, which indicates logs received, must have reached a PIT before the errant transaction commit time, but close to the errant transaction commit time. This will be the rollforward PIT used in step 3. If the standby has not yet received enough log files, you can wait until more logs are shipped over, but you run the risk of the replay time reaching the errant transaction time. For example, if the delay is 1 hour, you should stop HADR no later than 50 minutes after the errant transaction time (allowing a 10-minute safety margin), even if log shipping has yet not reached the PIT that you want.
  
  Alternatively, if a shared log archive is available and the logs are already archived, then there is no need to wait. If the logs are not archived yet, the logs can be archived using the ARCHIVE LOG command. Otherwise, the user can manually copy complete log files from the primary to the time-delayed standby (the overflow log path is preferred, otherwise, use the log path). For these alternate methods, deactivate the standby first to avoid interference with standby log shipping and replay.
You can determine these times by issuing db2pd -db dbname -hadr or by enabling the reads on standby feature on the standby and then issuing the following query, which uses the MON_GET_HADR table function:
```
DB2 "select HADR_ROLE, STANDBY_ID, STANDBY_LOG_TIME, STANDBY_REPLAY_LOG_TIME, 
varchar(PRIMARY_MEMBER_HOST,20) as PRIMARY_MEMBER_HOST, 
varchar(STANDBY_MEMBER_HOST,20) as STANDBY_MEMBER_HOST 
from table (mon_get_hadr(NULL))"
```
Stop HADR on the standby database:
```
 DB2 STOP HADR ON DATABASE dbname
```
Roll forward the standby to the PIT that you want and then stop:
```
DB2 ROLLFORWARD DB dbname to time-stamp and STOP
```
Use one of the following approaches:
- Restore the lost data on the primary:
  1. Copy the affected data from the standby and send it back to the primary.
    
    If the errant transaction dropped a table, you could export it on the standby and import it to the primary. If the errant transaction deleted rows from a table, you could export the table on the standby and use an import replace operation on the primary.
  2. Reinitialize the delayed-replay standby because its log stream has diverged from the primary's. No action is needed on any other standbys because they continue to follow the primary and any data repair on the primary is also replicated to them.
  3. Restore the database using a backup image taken on the primary. The image can be one taken at any time.
  4. Remove all log files in standby log path. This step is important. The ROLLFORWARD... STOP command in step 3 made the database log stream diverge from the primary. If the files are left alone, the newly restored database would follow that log stream and also diverge from the primary. Alternatively, you can drop the database before the restore for a clean start, but then you will also lose the current configuration including HADR configuration.
  5. Issue the START HADR command with the AS STANDBY option on the database. The database should then activate and connect to the primary.
- Have the standby with the intact data become the primary:
  1. Shut down the old primary to avoid split brain
  2. On the delayed-replay database, set the hadr_replay_delay configuration parameter to 0. Reconfigure the other parameters like hadr_target_list if needed. Then run START HADR command with the AS PRIMARY BY FORCE options on the database to convert it to the new primary. Use the BY FORCE option because there is no guarantee that the configured principal standby (which could be the old primary) will be able to connect.
  3. Redirect clients to the new primary.
  4. The other standbys will be automatically redirected to the new primary. However, if a standby received logs from the old primary beyond the point where old and new primary diverge (the PIT used in step 3), it will be rejected by the new primary. If this happens, reinitialize this standby using the same procedure as reinitializing the old primary.
  5. Reinitialize the old primary because its log stream has diverged from the new primary's.
  6. Restore database using a backup image taken on the new primary, or taken on the old primary before the PIT used in step 3.
  7. Remove all log files in the log path. If you do not do this, the newly restored database will follow the old primary's log stream and diverge from the new primary. Alternatively, you can drop the database before the restore for a clean start, but then you also lose the current configuration including HADR configuration.
  8. Issue the START HADR command with the AS STANDBY option on the database. The database should then activate and connect to the primary.