IBM Support

Monitoring and identifying tablespaces in invalid or error state, or tables in Inoperative state on the HADR Standby database

Question & Answer


Question

How can I monitor and identify any tablespaces in invalid or error state, or tables in Inoperative state on my HADR Standby database?

Answer



Identifying Tablespaces in invalid or error state on Standby database

In an HADR environment, when a Standby database has a tablespace in an invalid or error state, the replay of transactions on this tablespace will stop, while the replay of transactions on other valid tablespaces will continue. The Primary database will not be affected, and the condition of this tablespace on the Standby may go unnoticed.
If the DBA is not aware of this tablespace condition on the Standby database, then sometime later when a TAKEOVER operation is performed on the Standby database, applications may be impacted by the unavailability of this tablespace.
To avoid this unexpected situation, it is highly recommended to monitor, identify and rectify it as soon as possible.

Starting in v11.1.1.1 and v10.5 Fix Pack 9, when one or more tablespaces is in an invalid or error state on the Standby database, the HADR_FLAGS field will display the value 'STANDBY_TABLESPACE_ERROR'.



The HADR_FLAGS field can be monitored by using the "db2pd -hadr" command on the Primary or Standby database, or by using the MON_GET_HADR() table function on the Primary database or the Standby database when the Reads-on-Standby feature is enabled.

For example:

    $db2pd -db HADRDB1 -hadr

                  HADR_ROLE = PRIMARY
                REPLAY_TYPE = PHYSICAL
              HADR_SYNCMODE = SYNC
                 STANDBY_ID = 1
              LOG_STREAM_ID = 0
                 HADR_STATE = PEER
                 HADR_FLAGS = STANDBY_TABLESPACE_ERROR TCP_PROTOCOL
        PRIMARY_MEMBER_HOST = hotellnx119
           PRIMARY_INSTANCE = db2inst1
             PRIMARY_MEMBER = 0
        STANDBY_MEMBER_HOST = hotellnx119
           STANDBY_INSTANCE = db2inst2
             STANDBY_MEMBER = 0
        HADR_CONNECT_STATUS = CONNECTED
                        ...etc...


    $ db2 "SELECT STANDBY_ID, HADR_FLAGS from table(MON_GET_HADR(NULL))"

     STANDBY_ID HADR_FLAGS
     ---------- -------------------------------------
              1 STANDBY_TABLESPACE_ERROR TCP_PROTOCOL


When this condition occurs, the affected tablespace(s) can be identified on the Standby database by using traditional methods, such as by examining the 'State' value of "db2pd -tablespaces" output, or the 'tablespace_state' value of the MON_GET_TABLESPACES() table function when the Reads-on-Standby feature is enabled.

Common erroneous states for tablespaces on an HADR Standby are:
OFFLINE (0x4000), or
RESTORE_PENDING (x100), or
ROLLFORWARD_PENDING (x80).
(For a description of tablespace states, see 'DB2 Basics: Demystifying table and table space states' link below).

For example:
(in this example tablespace ID #7 has a State value of x80 ROLLFORWARED_PENDING )

    [Standby]$ db2pd –tablespaces –alldbs

    Address            Id TotalPgs ...etc...      State ...etc...
    0x07000000704080C0 0     64702 ...etc... 0x00000000 ...etc...
    0x07000000704108E0 3   2289117 ...etc... 0x00000000 ...etc...
    0x0700000070419100 4         9 ...etc... 0x00000000 ...etc...
    0x0700000070421920 5   1099975 ...etc... 0x00000000 ...etc...
    0x070000007042C180 6   1673723 ...etc... 0x00000000 ...etc...
    0x0700000070437260 7  14005309 ...etc...
 0x00000080 ...etc...
    0x0700000070442520 8   2559882 ...etc... 0x00000000 ...etc...



    [Standby]$  db2 "select TBSP_ID, TBSP_STATE from TABLE(MON_GET_TABLESPACE('',-2)) where TBSP_STATE NOT IN ('NORMAL')"

     TBSP_ID  TBSP_STATE
     -------  -------------------
           7  ROLLFORWARD_PENDING



We can find the tables residing within this tablespace by using the MON_GET_TABLE() function either on the Primary database, or on the standby database when the Reads-on-Standby feature is enabled.

     $ db2 "SELECT tb.TABNAME, tb.TABSCHEMA from table(mon_get_table('','',-2)) as tb where tb.TBSP_ID=7"

     TABNAME      TABSCHEMA
    ------------ -------------

     MYTABLE2     MYSCHEMA2  


Additionally, the db2diag.log may contain error messages during the time frame that this situation occurred, similar to these:

    EDUID   : 88                   EDUNAME: db2redom (HADR111)
    FUNCTION: DB2 UDB, buffer pool services, sqlbSetPoolState, probe:198
    MESSAGE : ADM12512W  Log replay on the HADR standby has stopped on table space
              "MYTS2" (ID "7") because it has been put into "RESTORE PENDING"
              state.

    EDUID   : 88                   EDUNAME: db2redom (HADR111)
    FUNCTION: DB2 UDB, data protection services, sqlpMasterDbcb::sqlpAddTbspToAbnormalList, probe:20
    MESSAGE : Added tablespace to abnormal tablespace list.
    DATA #1 : Pool ID, PD_TYPE_SQLB_POOL_ID, 2 bytes
    7



How to recover from tablespace errors on an HADR Standby database

For more information on how to recover from this condition, please see Technote #1993389 "How to recover from tablespace errors on an HADR Standby database".
http://www-01.ibm.com/support/docview.wss?uid=swg21993389



Identifying tables in inoperative state on HADR Standby database:

It is possible for a table to become inoperative on the Standby database, while the tablespace it resides in is otherwise fully available.
Please refer to technote #1647774 "Checking for unavailable tables on the HADR standby" for more details: http://www-01.ibm.com/support/docview.wss?uid=swg21647774
Also see Develperworks article titled "DB2HADR-Checking for unavailable tables on the HADR standby" for scripts to assist with identifying inoperative tables on the Standby:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Checking%20for%20unavailable%20tables%20on%20the%20HADR%20standby

[{"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"High Availability - HADR","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"},{"code":"PF033","label":"Windows"}],"Version":"10.5;11.1","Edition":"Advanced Enterprise Server;Advanced Workgroup Server;Enterprise Server;Express;Express-C;Personal;Workgroup Server","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018

UID

swg21993013