Using the HADR time-delayed replay feature, you can recover
data that was lost because of an errant transaction on the primary
database by stopping HADR on a standby before that transaction is
replayed.
Before you begin
Delayed replay must have already been enabled for your
standby database.
If log replay on the standby, indicated by
STANDBY_REPLAY_LOG_TIME, has passed the commit time for the errant
transaction on the standby, you cannot recover the data using the
following procedure. You can determine the STANDBY_REPLAY_LOG_TIME
by using the
db2pd command with the
-hadr parameter
or the MON_GET_HADR table function.
Restriction: A standby
database for which you set the hadr_replay_delay configuration
parameter cannot take over as a primary; you must first disable delayed
replay on that standby.
Procedure
To recover from an errant transaction, perform the following
steps on the standby on which you enabled delayed replay:
- Verify the timing:
- Ensure that standby has not yet replayed the transaction. The
STANDBY_REPLAY_LOG_TIME value must not have reached the errant transaction
commit time.
Ensure that the standby has received the relevant logs. The
STANDBY_LOG_TIME value, which indicates logs received, must have reached
a PIT before the errant transaction commit time, but close to the
errant transaction commit time. This will be the rollforward PIT used
in step 3. If the standby has not yet received enough log files, you
can wait until more logs are shipped over, but you run the risk of
the replay time reaching the errant transaction time. For example,
if the delay is 1 hour, you should stop HADR no later than 50 minutes
after the errant transaction time (allowing a 10-minute safety margin),
even if log shipping has yet not reached the PIT that you want.
Alternatively,
if a shared log archive is available and the logs are already archived,
then there is no need to wait. If the logs are not archived yet, the
logs can be archived using the ARCHIVE LOG command.
Otherwise, the user can manually copy complete log files from the
primary to the time-delayed standby (the overflow log path is preferred,
otherwise, use the log path). For these alternate methods, deactivate
the standby first to avoid interference with standby log shipping
and replay.
You can determine these times by issuing db2pd -db dbname -hadr or
by enabling the reads on standby feature on the standby and then issuing
the following query, which uses the MON_GET_HADR table function:DB2 "select HADR_ROLE, STANDBY_ID, STANDBY_LOG_TIME, STANDBY_REPLAY_LOG_TIME,
varchar(PRIMARY_MEMBER_HOST,20) as PRIMARY_MEMBER_HOST,
varchar(STANDBY_MEMBER_HOST,20) as STANDBY_MEMBER_HOST
from table (mon_get_hadr(NULL))"
- Stop HADR on the standby database:
DB2 STOP HADR ON DATABASE dbname
- Roll forward the standby to the PIT that you want and then
stop:
DB2 ROLLFORWARD DB dbname to time-stamp and STOP
- Use one of the following approaches:
- Restore the lost data on the primary:
Copy the affected data from the standby and send it back to
the primary.
If the errant transaction dropped a table, you
could export it on the standby and import it to the primary. If the
errant transaction deleted rows from a table, you could export the
table on the standby and use an import replace operation on the primary.
- Reinitialize the delayed-replay standby because its log stream
has diverged from the primary's. No action is needed on any other
standbys because they continue to follow the primary and any data
repair on the primary is also replicated to them.
- Restore the database using a backup image taken on the primary.
The image can be one taken at any time.
- Remove all log files in standby log path. This step is important.
The ROLLFORWARD... STOP command
in step 3 made the database log stream diverge from the primary. If
the files are left alone, the newly restored database would follow
that log stream and also diverge from the primary. Alternatively,
you can drop the database before the restore for a clean start, but
then you will also lose the current configuration including HADR configuration.
- Issue the START HADR command with the AS
STANDBY option on the database. The database should then
activate and connect to the primary.
- Have the standby with the intact data become the primary:
- Shut down the old primary to avoid split brain
- On the delayed-replay database, set the hadr_replay_delay configuration
parameter to 0. Reconfigure the other parameters
like hadr_target_list if needed. Then run START
HADR command with the AS PRIMARY BY FORCE options
on the database to convert it to the new primary. Use the BY
FORCE option because there is no guarantee that the configured
principal standby (which could be the old primary) will be able to
connect.
- Redirect clients to the new primary.
- The other standbys will be automatically redirected to the new
primary. However, if a standby received logs from the old primary
beyond the point where old and new primary diverge (the PIT used in
step 3), it will be rejected by the new primary. If this happens,
reinitialize this standby using the same procedure as reinitializing
the old primary.
- Reinitialize the old primary because its log stream has diverged
from the new primary's.
- Restore database using a backup image taken on the new primary,
or taken on the old primary before the PIT used in step 3.
- Remove all log files in the log path. If you do not do this, the
newly restored database will follow the old primary's log stream and
diverge from the new primary. Alternatively, you can drop the database
before the restore for a clean start, but then you also lose the current
configuration including HADR configuration.
- Issue the START HADR command with the AS
STANDBY option on the database. The database should then
activate and connect to the primary.