This scenario describes how to plan, configure, and deploy a high availability disaster recovery (HADR) setup for an online travel service called ExampleFlightsExpress (EFE), which currently uses the DB2® pureScale® Feature. All of these steps can be performed without any downtime.
The DBA creates the standby database by backing up the database on the primary cluster and restoring it onto the standby cluster:
db2 BACKUP DB hadr_db TO backup_dir
db2 RESTORE DB hadr_db FROM backup_dir
Next, the cluster-level HADR parameters are set on the primary; the target list names the standby hosts s0 through s3:
db2 "UPDATE DB CFG FOR hadr_db USING
HADR_TARGET_LIST {s0:4000|s1:4000|s2:4000|s3:4000}
HADR_REMOTE_HOST {s0:4000|s1:4000|s2:4000|s3:4000}
HADR_REMOTE_INST db2inst
HADR_SYNCMODE async"
Because there is only one standby, the hadr_remote_host parameter specifies the same group of addresses as the hadr_target_list parameter. Each primary member's local address is then set:
db2 "UPDATE DB CFG FOR hadr_db MEMBER 0 USING
HADR_LOCAL_HOST p0
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 1 USING
HADR_LOCAL_HOST p1
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 2 USING
HADR_LOCAL_HOST p2
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 3 USING
HADR_LOCAL_HOST p3
HADR_LOCAL_SVC 4000"
On the standby cluster, the corresponding cluster-level parameters point back at the primary hosts p0 through p3:
db2 "UPDATE DB CFG FOR hadr_db USING
HADR_TARGET_LIST {p0:4000|p1:4000|p2:4000|p3:4000}
HADR_REMOTE_HOST {p0:4000|p1:4000|p2:4000|p3:4000}
HADR_REMOTE_INST db2inst
HADR_SYNCMODE async"
Each standby member's local address is then set:
db2 "UPDATE DB CFG FOR hadr_db MEMBER 0 USING
HADR_LOCAL_HOST s0
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 1 USING
HADR_LOCAL_HOST s1
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 2 USING
HADR_LOCAL_HOST s2
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 3 USING
HADR_LOCAL_HOST s3
HADR_LOCAL_SVC 4000"
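Before starting HADR, it can be worth sanity-checking that the parameters took effect on each cluster; a minimal sketch (the grep filter is just illustrative):

```shell
# List only the HADR-related database configuration parameters
db2 get db cfg for hadr_db | grep -i hadr
```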
The DBA starts HADR, first on the standby and then on the primary:
db2 START HADR ON DB hadr_db AS STANDBY
db2 START HADR ON DB hadr_db AS PRIMARY
To verify the setup, the DBA queries the MON_GET_HADR table function on the primary:
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE
from table (mon_get_hadr(-2))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE
------------- -------------- -------------- -----------------------
0 0 0 PEER
1 1 0 PEER
2 2 0 PEER
3 3 0 PEER
The DBA confirms that standby member 0, the preferred replay member, is indeed the current replay member by looking at the STANDBY_MEMBER field. Every log stream reports the same member on the standby because all the members on the primary are connected to that standby member.

Next, the DBA performs a role switch: the current standby takes over the primary role, and the current primary takes over the standby role. This allows maintenance that requires a shutdown of the cluster to be performed at site A. The procedure takes place during a period of low usage to minimize the impact on applications connected to the database.
Before the role switch, the DBA checks that all members are in a normal state with no alerts:
SELECT ID,
varchar(STATE,21) AS STATE,
varchar(HOME_HOST,10) AS HOME_HOST,
varchar(CURRENT_HOST,10) AS CUR_HOST,
ALERT
FROM SYSIBMADM.DB2_MEMBER
ID STATE HOME_HOST CUR_HOST ALERT
------ --------------------- ---------- ---------- --------
0 STARTED p0 p0 NO
1 STARTED p1 p1 NO
2 STARTED p2 p2 NO
3 STARTED p3 p3 NO
4 record(s) selected.
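The same health check can be done at the instance level; in a DB2 pureScale environment, db2instance -list shows the state of every member and cluster caching facility:

```shell
# Instance-level view of member and CF states (run as the instance owner)
db2instance -list
```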
The DBA also confirms that all log streams are in PEER state:
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE
from table (mon_get_hadr(-2))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE
------------- -------------- -------------- -----------------------
0 0 0 PEER
1 1 0 PEER
2 2 0 PEER
3 3 0 PEER
The DBA then issues the role switch on the standby:
db2 TAKEOVER HADR ON DB hadr_db
After the command completes, member 0 on the new standby (the preferred replay member) is chosen as the replay member, and the database is shut down on the other members of the standby cluster. On the new primary, the database is activated only on member 0; the other members are activated by a client connection or when the DBA explicitly issues the ACTIVATE DATABASE command on each of them.
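If the DBA wants every member on the new primary active right away rather than waiting for client connections, the explicit activation can be scripted; a sketch, assuming ssh access and that s1 through s3 are the remaining member hosts:

```shell
# Hypothetical loop: activate the database on each remaining member host
for host in s1 s2 s3; do
  ssh db2inst@"$host" "db2 ACTIVATE DATABASE hadr_db"
done
```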
Automatic client reroute sends any new clients to site B.

The DBA then shuts down the database at site A for the maintenance:
db2 DEACTIVATE DATABASE hadr_db
db2stop

After the maintenance at site A is complete, the DBA restarts the instance and reactivates the database:
db2start
db2 ACTIVATE DATABASE hadr_db
The database is activated as an HADR standby with one replay member. To return the primary role to site A, the DBA issues another role switch:
db2 TAKEOVER HADR ON DB hadr_db
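The automatic client reroute behavior mentioned earlier relies on each server knowing the alternate it should hand out to clients. A minimal sketch of that setup; the chosen hostnames and port are assumptions for this scenario:

```shell
# On site A, advertise a site B member as the alternate (hostname/port assumed)
db2 UPDATE ALTERNATE SERVER FOR DATABASE hadr_db USING HOSTNAME s0 PORT 50000
# On site B, advertise a site A member as the alternate
db2 UPDATE ALTERNATE SERVER FOR DATABASE hadr_db USING HOSTNAME p0 PORT 50000
```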
The DBA next has to perform a failover; that is, an unexpected outage at site A requires that the standby at site B take over the primary role. An important difference for HADR in a DB2 pureScale environment is that there is no support for using IBM® Tivoli® System Automation for Multiplatforms (SA MP) to automate the failover, because SA MP is already being used to provide high availability within each DB2 pureScale cluster. In any case, in this scenario the DBA wants manual control over this kind of response to an outage.
db2 TAKEOVER HADR ON DB hadr_db BY FORCE
The standby sends a disabling message to shut down the primary. After stopping log shipping and retrieval, the standby completes the replay of any logs in its log path. Finally, the standby becomes the new primary. The DBA monitors the state of the new primary:
db2pd -hadr -db hadr_db
When site A comes back online, the DBA tries to reintegrate the old primary as a standby:
db2 START HADR ON DB hadr_db AS STANDBY
db2pd -hadr -db hadr_db
Unfortunately, the log streams of the databases at the two sites have diverged, so the database shows as disconnected. The DBA looks at the db2diag.log file of one of the members on the old primary and sees a message indicating that the database at site A cannot be made consistent with the new primary database. The old database must therefore be dropped and reinitialized from a backup of the new primary:
db2 DROP DATABASE hadr_db
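Those diagnostic records can also be pulled straight from the command line with the db2diag tool; a sketch, with filter values that are assumptions:

```shell
# Show error-level records in the diagnostic log that mention HADR
db2diag -level Error -g "message:=HADR"
```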
On the new primary at site B, the DBA takes an online backup and restores it at site A:
db2 BACKUP DATABASE hadr_db ONLINE TO backup_dir
db2 RESTORE DB hadr_db FROM backup_dir
The DBA then sets the HADR configuration parameters for site A again; this time the target list names the site B hosts, which now hold the primary role:
db2 "UPDATE DB CFG FOR hadr_db USING
HADR_TARGET_LIST {s0:4000|s1:4000|s2:4000|s3:4000}
HADR_REMOTE_HOST {s0:4000|s1:4000|s2:4000|s3:4000}
HADR_REMOTE_INST db2inst
HADR_SYNCMODE async"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 0 USING
HADR_LOCAL_HOST p0
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 1 USING
HADR_LOCAL_HOST p1
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 2 USING
HADR_LOCAL_HOST p2
HADR_LOCAL_SVC 4000"
db2 "UPDATE DB CFG FOR hadr_db MEMBER 3 USING
HADR_LOCAL_HOST p3
HADR_LOCAL_SVC 4000"
db2 START HADR ON DB hadr_db AS STANDBY
Finally, the DBA verifies that the reinitialized standby is connected:
db2pd -hadr -db hadr_db