DB2 10.5 for Linux, UNIX, and Windows

High availability disaster recovery (HADR) monitoring in a DB2 pureScale environment

You must use the db2pd command or the MON_GET_HADR table function to monitor your HADR databases in a DB2® pureScale® environment. Other interfaces, such as the GET SNAPSHOT FOR DATABASE command, the SNAPHADR administrative view, and the SNAP_GET_HADR table function, do not return any HADR information, so it will appear as if HADR is not configured.

The db2pd command and the MON_GET_HADR table function return essentially the same information, but because reads on standby is not supported in a DB2 pureScale environment, you can only use the db2pd command to monitor from a standby. As well, the db2pd command is preferred during takeover because there could be a time window during which neither the primary nor the standby allows client connections.

db2pd command

In a DB2 pureScale environment, the db2pd command returns a section in its output for each log stream being processed by the member on which the command is issued. On the primary, any assisted remote catchup streams are reported by their owner (that is, the assisted member) and the assisting member. On the standby, all of the streams are returned if the command is issued on the replay member; if you issue db2pd on a non-replay member, no data is returned.

If you want to see all of the log streams from the primary, use the -allmembers and -hadr options with the db2pd command. If you use the -allmembers option on the standby, for each non replay member, the output indicates that database is not active on the member; for the replay member, all streams are returned. As a result, this command option is only useful on the standby if you want to find out which member is the current replay member (alternatively, you can check the STANDBY_MEMBER field in the db2pd output from the primary).

The following example is for an HADR setup using three-member clusters: members 0, 1, and 2. Member 1 is active but it is in assisted remote catchup state and is being assisted by member 0; the standby replay member is member 0. The DBA issues the db2pd command for member 0 on the primary. Notice that two sets of data are returned: one for each log stream member 0 is processing:
Database Member 0 -- Database HADRDB -- Active -- Up 0 days 00:23:17 -- Date 06/08/2011 13:57:23

                              HADR_ROLE = PRIMARY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = ASYNC
                             STANDBY_ID = 1
                          LOG_STREAM_ID = 0
                             HADR_STATE = PEER
                             HADR_FLAGS =
                    PRIMARY_MEMBER_HOST = hostP.ibm.com
                       PRIMARY_INSTANCE = db2inst
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = hostS.ibm.com
                       STANDBY_INSTANCE = db2inst
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:38:10.199479 (1307565490)
            HEARTBEAT_INTERVAL(seconds) = 25
                  HADR_TIMEOUT(seconds) = 100
          TIME_SINCE_LAST_RECV(seconds) = 3
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.006298
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.516
                    LOG_HADR_WAIT_COUNT = 82
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 50772
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87616
              PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
              STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 1, 49262315
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:19.000000 (1307566159)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 13000 
                  STANDBY_SPOOL_PERCENT = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = N

                              HADR_ROLE = PRIMARY
                            REPLAY_TYPE = PHYSICAL
                          HADR_SYNCMODE = ASYNC
                             STANDBY_ID = 1
                          LOG_STREAM_ID = 1
                             HADR_STATE = REMOTE_CATCHUP
                             HADR_FLAGS = ASSISTED_REMOTE_CATCHUP ASSISTED_MEMBER_ACTIVE
                    PRIMARY_MEMBER_HOST = hostP.ibm.com
                       PRIMARY_INSTANCE = db2inst
                         PRIMARY_MEMBER = 0
                    STANDBY_MEMBER_HOST = hostS.ibm.com
                       STANDBY_INSTANCE = db2inst
                         STANDBY_MEMBER = 0
                    HADR_CONNECT_STATUS = CONNECTED
               HADR_CONNECT_STATUS_TIME = 06/08/2011 13:35:51.724447 (1307565351)
            HEARTBEAT_INTERVAL(seconds) = 25
                  HADR_TIMEOUT(seconds) = 100
          TIME_SINCE_LAST_RECV(seconds) = 16
               PEER_WAIT_LIMIT(seconds) = 0
             LOG_HADR_WAIT_CUR(seconds) = 0.000
      LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.005631
     LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.837
                    LOG_HADR_WAIT_COUNT = 124
  SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
  SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
              PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 1, 56541576
              STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 1, 56541576
                    HADR_LOG_GAP(bytes) = 0
       STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 1, 56541576
         STANDBY_RECV_REPLAY_GAP(bytes) = 0
                       PRIMARY_LOG_TIME = 06/08/2011 13:49:25.000000 (1307566165)
                       STANDBY_LOG_TIME = 06/08/2011 13:49:25.000000 (1307566165)
                STANDBY_REPLAY_LOG_TIME = 06/08/2011 13:49:25.000000 (1307566165)
           STANDBY_RECV_BUF_SIZE(pages) = 16
               STANDBY_RECV_BUF_PERCENT = 0
             STANDBY_SPOOL_LIMIT(pages) = 13000
                  STANDBY_SPOOL_PERCENT = 0
                   PEER_WINDOW(seconds) = 0
               READS_ON_STANDBY_ENABLED = N

MON_GET_HADR table function

In a DB2 pureScale environment, the MON_GET_HADR table function returns a row for each log stream. The table function cannot be issued on the standby because reads on standby is not supported in a DB2 pureScale environment. Use the LOG_STREAM_ID field in the table function output to identify the log stream and the PRIMARY_MEMBER and STANDBY_MEMBER fields to identify the members processing the stream on the primary and standby sides.

The table function takes a member argument and returns the stream that the specified member owns and all remote catchup streams that it is assisting. If the argument is an assisting member, the assisted remote catchup streams have their HADR_STATE field reported as being in REMOTE_CATCHUP with the ASSISTED_REMOTE_CATCHUP flag set in the HADR_FLAGS field. If the argument is an assisted member, the assisted remote catchup stream has its HADR_STATE field reported as DISCONNECTED.

If you specify -1 or NULL as the argument, the results for the current database member (that is, the member processing the query) are returned. If you specify -2 as the argument, the results for all members on the primary are returned. Any assisted remote catchup streams are reported on the assisting member only. If a member is inactive and assisted remote catchup has not yet been established for that member's log stream, that log stream does not appear in the output. The table function request is passed to all active members and the results are merged, so inactive members are not represented.

In the following examples, the DBA calls the MON_GET_HADR table function for monitoring an HADR setup using three-member clusters: members 0, 1, and 2. Member 1 is active but it is in assisted remote catchup state and is being assisted by member 0; the standby replay member is member 0. The DBA calls the table function with argument 0, 1, 2 and -2 (for all members). Notice that two rows are returned when the argument is 0: one for each log stream that member 0 is processing:

Example for member 0
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE, HADR_FLAGS
 from table (mon_get_hadr(0))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE     HADR_FLAGS
------------- -------------- -------------- -------------- -----------------------
0             0              0              PEER                    
1             0              0              REMOTE_CATCHUP ASSISTED_REMOTE_CATCHUP     
Example for member 1
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE, HADR_FLAGS
 from table (mon_get_hadr(1))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE     HADR_FLAGS
------------- -------------- -------------- -------------- ----------------------
1             1              0              DISCONNECTED     
Example for member 2
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE, HADR_FLAGS
 from table (mon_get_hadr(2))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE     HADR_FLAGS
------------- -------------- -------------- -------------- -----------------------
2             2              0              PEER                    
Example for all members
select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE, HADR_FLAGS
 from table (mon_get_hadr(-2))
LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE     HADR_FLAGS
------------- -------------- -------------- -------------- -----------------------
0             0              0              PEER                    
1             0              0              REMOTE_CATCHUP ASSISTED_REMOTE_CATCHUP     
2             2              0              PEER