DB2 Version 10.1 for Linux, UNIX, and Windows

Database configuration for high availability disaster recovery (HADR)

You can use database configuration parameters to help achieve optimal performance with DB2® HADR.

In most cases, you should use the same database configuration parameter settings and database manager configuration parameter settings on the systems where the primary and standby databases are located. If the settings for the configuration parameters on the standby database are different from the settings on the primary, the following problems might occur:

Error messages might be returned for the standby database while the log files that were shipped from the primary database are being replayed.
After a takeover operation, the new primary database might be unable to handle the workload, resulting in performance problems or in applications receiving error messages that they did not receive when they were connected to the original primary database.

Changes to the configuration parameters on the primary database are not automatically propagated to the standby database. You must manually make changes on the standby database. For dynamic configuration parameters, changes take effect without the need to shut down and restart the database management system (DBMS) or the database. For non-dynamic configuration parameters, changes take effect after the standby database is restarted.

Following are sections on specific configuration topics for HADR:

Size of log files configuration parameter on the standby database

One exception to the configuration parameter behavior that is described in the previous paragraph is the behavior of the logfilsiz database configuration parameter. Although the value of this parameter is not replicated to the standby database, to guarantee identical log files on both databases, the setting for the logfilsiz configuration parameter on the standby is ignored. Instead, the database creates local log files whose sizes match the size of the log files on the primary database.

After a takeover, the original standby (new primary) uses the logfilsiz parameter value that you set on the original primary until you restart the database. At that point, the new primary reverts to using the value that you set locally. In addition, the current log file is truncated and any pre-created log files are resized on the new primary.

If the databases keep switching roles as a result of a non-forced takeover and neither database is deactivated, the log file size that is used is always the one from the original primary database. However, if there is a deactivation and then a restart on the original standby (new primary), the new primary uses the log file size that you configured locally. This log file size continues to be used if the original primary takes over again. Only after a deactivation and restart on the original primary would the log file size revert to the settings on the original primary.

Database configuration parameter autorestart

The recommended configuration for the autorestart parameter on HADR systems is ON. If the autorestart parameter is set to OFF, and the server fails, your response depends on whether or not you want to restart or fail over to the standby:

If you want to restart, run the RESTART DATABASE command manually. If the restart fails, perform failover.
If you want to fail over, perform the following steps:
1. Shut down the old primary to prevent a "split brain". Do this by either stopping the DB2 instance or powering off the host machine. If the server is not accessible for administration, fence it off from clients by disabling the client/server network.
  Note: Deactivating the database is not sufficient because client connections can bring it back online. If it failed in a consistent state, then even if the autorestart parameter is set to OFF, this does not prevent client connections from bringing it back online.
2. After shutting down old primary, issue the TAKEOVER HADR command with the BY FORCE option on the standby.

Log receive buffer size on a standby database

By default, the log receive buffer size on a standby database is two times the value that you specify for the logbufsz configuration parameter on the primary database. This size might not be sufficient. For example, consider what might happen when the HADR synchronization mode is set to ASYNC and the primary and standby databases are in peer state. If the primary database is also experiencing a high transaction load, the log receive buffer on the standby database might fill to capacity, and the log shipping operation from the primary database might stall. To manage these temporary peaks, you can make either of the following configuration changes:

Increase the size of the log receive buffer on the standby database by modifying the value of the DB2_HADR_BUF_SIZE registry variable.
Enable log spooling on a standby database by setting the hadr_spool_limit configuration parameter.

Load operations and HADR

If you issue the LOAD command on the primary database with the COPY YES parameter, the command executes on the primary database, and the data is replicated to the standby database if the load copy can be accessed through the path or device that is specified by the command. If load copy data cannot be accessed from the standby database, the table space in which the table is stored is marked invalid on the standby database. Any future log records that pertain to this table space are skipped. To ensure that the load operation can access the load copy on the standby database, use a shared location for the output file from the COPY YES parameter. Alternatively, you can deactivate the standby database while performing the load on the primary, place a copy of the output file in the standby path, and then activate the standby database.

If you issue the LOAD command with the NONRECOVERABLE parameter on the primary database, the command executes on the primary database, and the table on the standby database is marked invalid. Any future log records that pertain to this table are skipped. You can issue the LOAD command with the COPY YES and REPLACE parameters to bring the table back, or you can drop the table to recover the space.

Note: You cannot bring a table back using the LOAD command with the COPY YES and REPLACE options if the table has one of the following characteristics:

The table was created with the NOT LOGGED INITIALLY attribute.
The table is a multidimensional clustered (MDC) table.
The table has compression dictionaries.
The table has XML columns.

Because a load operation with the COPY NO parameter is not supported with HADR, the operation is automatically converted to a load operation with the NONRECOVERABLE parameter. To enable a load operation with the COPY NO parameter to be converted to a load operation with the COPY YES parameter, set the DB2_LOAD_COPY_NO_OVERRIDE registry variable on the primary database. This registry variable is ignored on the standby database. Ensure that the device or directory that you specify for the primary database can be accessed by the standby database by using the same path, device, or load library.

If you are using the Tivoli® Storage Manager (TSM) software to perform a load operation with the COPY YES parameter, you might have to set the vendoropt configuration parameter on the primary and standby databases. Depending on how you configured TSM, the values on the primary and standby databases might not be the same. Also, when using TSM to perform a load operation with the COPY YES parameter, you must issue the db2adutl command with the GRANT parameter to give the standby database read access to the files that are loaded.

If table data is replicated by a load operation with the COPY YES parameter, the indexes are replicated as follows:

If you specify the REBUILD indexing mode option with the LOAD command and the LOG INDEX BUILD table attribute is set to ON (using the ALTER TABLE statement), or if it is set to NULL and the logindexbuild database configuration parameter is set to ON, the primary database includes the rebuilt index object (that is, all of the indexes defined on the table) in the copy file to enable the standby database to replicate the index object. If the index object on the standby database is marked invalid before the load operation, it becomes usable again after the load operation as a result of the index rebuild.
If you specify the INCREMENTAL indexing mode option with the LOAD command and the LOG INDEX BUILD table attribute is set to ON (using the ALTER TABLE statement), or if it is set to NULL and the logindexbuild database configuration parameter on the primary database is set to ON, the index object on the standby database is updated only if it is not marked invalid before the load operation. Otherwise, the index is marked invalid on the standby database.

DB2_HADR_PEER_WAIT_LIMIT registry variable

Restriction: In multiple standby mode, none of this section applies to the auxiliary standbys because they are in SUPERASYNC synchronization mode, so they do not ever enter peer state.

If you set the DB2_HADR_PEER_WAIT_LIMIT registry variable, the HADR primary database breaks out of peer state if logging on the primary database has been blocked for the specified number of seconds because of log replication to the standby. When this limit is reached, the primary database breaks the connection to the standby database. If you disable the peer window by setting the hadr_peer_window configuration parameter to 0, the primary enters the disconnected state, and logging resumes. If you enable the peer window, the primary database enters disconnected peer state, in which logging continues to be blocked. The primary leaves disconnected peer state upon reconnection or peer window expiration. Logging resumes after the primary leaves disconnected peer state.

Note: If you set DB2_HADR_PEER_WAIT_LIMIT, use a minimum value of 10 to avoid triggering false alarms.

Honoring peer window transition when a database breaks out of peer state ensures peer window semantics for safe takeover in all cases. If the primary fails during the transition, normal peer window protection still applies: safe takeover from the standby if it is still in disconnected peer state.

On the standby side, after disconnection, the database continues replaying already received logs. After the received logs have been replayed, the standby reconnects to the primary. After replaying the received logs, the standby reconnects to the primary. Upon reconnection, normal state transition follows: first remote catchup state, then peer state.

Relationship to hadr_timeout database configuration parameter

The hadr_timeout database configuration parameter does not break the primary out of peer state if the primary keeps receiving heartbeat messages from the standby while blocked. The hadr_timeout database configuration parameter specifies a timeout value for the HADR network layer. An HADR database breaks the connection to its partner database if it has not received any message from its partner for the period that is specified by the hadr_timeout configuration parameter. The timeout does not control timeout for higher-layer operations such as log shipping and ack (acknowledgement) signals. If log replay on the standby database is stuck on a large operation such as load or reorganization, the HADR component still sends heartbeat messages to the primary database on the normal schedule. In such a scenario, the primary is blocked as long as the standby replay is blocked unless you set the DB2_HADR_PEER_WAIT_LIMIT registry variable.

The DB2_HADR_PEER_WAIT_LIMIT registry variable unblocks primary logging regardless of connection status. Even if you do not set the DB2_HADR_PEER_WAIT_LIMIT registry variable, the primary always breaks out of peer state when a network error is detected or the connection is closed, possibly as result of the hadr_timeout configuration parameter.

HADR configuration parameters

Some HADR configuration parameters are static, such as hadr_local_host and hadr_remote_host . Static parameters are loaded on database startup, and changes are ignored during run time. HADR parameters are also loaded when the START HADR command completes. On the primary database, HADR can be started and stopped dynamically, with the database remaining online. Thus, one way to refresh the effective value of an HADR configuration parameter without shutting down the database is to stop and restart HADR. In contrast, the STOP HADR brings down the database on the standby, so the standby's parameters cannot be refreshed with database online.

Host name parameters and service and port name parameters (single standby mode)

An HADR pair has five interrelated configuration parameters that you should set:

hadr_local_host
hadr_remote_host
hadr_local_svc
hadr_remote_svc
hadr_remote_inst

TCP connections are used for communication between the primary and standby databases. The "local" parameters specify the local address and the "remote" parameters specify the remote address. A primary database listens on its local address for new connections. A standby database that is not connected to a primary database retries connection to its remote address.

The standby database also listens on its local address. In some scenarios, another HADR database can contact the standby database on this address and send it messages.

Unless the HADR_NO_IP_CHECK registry variable is set, HADR does the following cross-checks of local and remote addresses on connection:

 my local address = your remote address

and

my remote address = your local address

The check is done using the IP address and port number, rather than the literal string in the configuration parameters. You need to set the HADR_NO_IP_CHECK registry variable in NAT (Network Address Translation) environment to bypass the check.

You can configure an HADR database to use either IPv4 or IPv6 to locate its partner database. If the host server does not support IPv6, you must use IPv4. If the server supports IPv6, whether the database uses IPv4 or IPv6 depends upon the format of the address that you specify for the hadr_local_host and hadr_remote_host configuration parameters. The database attempts to resolve the two parameters to the same IP format and use IPv6 when possible. The following table shows how the IP mode is determined for IPv6-enabled servers:

IP mode used for hadr_local_host parameter	IP mode used for hadr_remote_host parameter	IP mode used for HADR communications
IPv4 address	IPv4 address	IPv4
IPv4 address	IPv6 address	Error
IPv4 address	host name, maps to IPv4 only	IPv4
IPv4 address	host name, maps to IPv6 only	Error
IPv4 address	host name, maps to IPv4 and v6	IPv4
IPv6 address	IPv4 address	Error
IPv6 address	IPv6 address	IPv6
IPv6 address	host name, maps to IPv4 only	Error
IPv6 address	host name, maps to IPv6 only	IPv6
IPv6 address	host name, maps to IPv4 and IPv6	IPv6
hostname, maps to IPv4 only	IPv4 address	IPv4
hostname, maps to IPv4 only	IPv6 address	Error
hostname, maps to IPv4 only	hostname, maps to IPv4 only	IPv4
hostname, maps to IPv4 only	hostname, maps to IPv6 only	Error
hostname, maps to IPv4 only	hostname, maps to IPv4 and IPv6	IPv4
hostname, maps to IPv6 only	IPv4 address	Error
hostname, maps to IPv6 only	IPv6 address	IPv6
hostname, maps to IPv6 only	hostname, maps to IPv4 only	Error
hostname, maps to IPv6 only	hostname, maps to IPv6 only	IPv6
hostname, maps to IPv6 only	hostname, maps to IPv4 and IPv6	IPv6
hostname, maps to IPv4 and IPv6	IPv4 address	IPv4
hostname, maps to IPv4 and IPv6	IPv6 address	IPv6
hostname, maps to IPv4 and IPv6	hostname, maps to IPv4 only	IPv4
hostname, maps to IPv4 and IPv6	hostname, maps to IPv6 only	IPv6
hostname, maps to IPv4 and IPv6	hostname, maps to IPv4 and IPv6	IPv6

The primary and standby databases can make HADR connections only if they use the same IPv4 or IPv6 format. If one server is IPv6 enabled (but also supports IPv4) and the other server supports IPv4 only, at least one of the hhadr_local_host and hadr_remote_host parameters on the IPv6 server must specify an IPv4 address to force database on this server to use IPv4.

You can set the HADR local service and remote service parameters (hadr_local_svc and hadr_remote_svc) to either a port number or a service name. The values that you specify must map to ports that are not being used by any other service, including other DB2 components or other HADR databases. In particular, you cannot set either parameter value to the TCP/IP port that is used by the server to await communications from remote clients (the value of the svcename database manager configuration parameter) or the next port (the value of the svcename parameter + 1).

If the primary and standby databases are on different servers, they can use the same port number or service name; otherwise, they must have different values.

Host name, service or port name, and target list parameters (multiple standby mode)

In multiple standby mode, you should still set the hadr_local_host, hadr_local_svc, hadr_remote_host, hadr_remote_host, and hadr_remote_inst configuration parameters. If you set those parameters incorrectly, they are automatically updated after the primary connects to the standbys by using the settings of the hadr_target_list configuration parameter. This parameter specifies the host and port names of all the standbys. The first standby that you specify in the target list is considered to be the principal HADR standby database.

In multiple standby mode, you should still set the hadr_local_host, hadr_local_svc, hadr_remote_host, hadr_remote_host, and hadr_remote_inst configuration parameters. The hadr_local_host and hadr_local_svc parameters have the same meaning as in single standby mode. On the primary, sethadr_remote_host, hadr_remote_host, and hadr_remote_inst to indicate its principal standby. A new parameter, hadr_target_list is used to list all standbys, with the first entry being the principal standby. On standby, set the "remote" parameters to indicate the primary. In certain conditions, the "remote" parameters (on both the primary and the standby) can be automatically updated. For more information, see the "Automatic reconfiguration of HADR parameters" section in Database configuration for multiple HADR standby databases.

Synchronization mode

In single standby mode, the synchronization mode, which you specify with the hadr_syncmode configuration parameter must be identical on the primary and standby databases. The consistency of the value of this configuration parameter is checked when an HADR pair establishes a connection.

In multiple standby mode, the synchronization mode does not have to be the same. All standbys have an effective synchronization mode that is determined by the type of standby that they are. The principal standby uses the synchronization mode of the primary, and the auxiliary standbys use SUPERASYNC. All standbys have a configured synchronization mode, which is the explicit setting for hadr_syncmode and is used if a standby becomes the new primary.

For more detailed information, see "DB2 high availability disaster recovery (HADR) synchronization mode".

HADR timeout and peer window

The timeout period, which you specify with the hadr_timeout configuration parameter, must be identical on the primary and standby databases. The consistency of the values of these configuration parameters is checked when an HADR pair establishes a connection.

With one exception, when the primary database starts, it waits for the longer of the two following periods for a standby to connect:

For a minimum of 30 seconds
For the number of seconds that is specified by the hadr_timeout database configuration parameter.

If the standby does not connect in the specified time, the startup fails. The one exception to this behavior is when you issue the START HADR command with the BY FORCE parameter. In this case, the primary database starts without waiting for the standby database to connect to it.

In multiple standby mode, the primary only waits for the principal standby to connect; a connection to an auxiliary standby is optional.

After an HADR pair establishes a connection, they exchange heartbeat messages. The heartbeat interval is computed from factors like the hadr_timeout and hadr_peer_window configuration parameters. It is reported by the HEARTBEAT_INTERVAL field in MON_GET_HADR table function. If one database does not receive any message from the other database within the number of seconds that is specified by the hadr_timeout configuration parameter, it initiates a disconnect. This behavior means that at most, it takes the number of seconds that is specified by the hadr_timeout configuration parameter for an HADR database to detect the failure of either its partner database or the intervening network. If you set the hadr_timeout configuration parameter too low, you will receive false alarms and frequent disconnections.

If you have the hadr_peer_window configuration parameter set to a nonzero value and the primary loses connection to the standby in peer state, the primary database does not commit transactions until the connection with the standby database is restored or the time value of the hadr_peer_window configuration parameter elapses, whichever happens first.

For maximal availability, the default value for the hadr_peer_window database configuration parameter is 0. When this parameter is set to 0, as soon as the connection between the primary and the standby is closed, the primary drops out of peer state to avoid blocking transactions. The connection can close because the standby closed the connection, a network error is detected, or timeout is reached. For increased data consistency, but reduced availability, you can set the hadr_peer_window database configuration parameter to a nonzero value.

For more information, see "Setting the hadr_timeout and hadr_peer_window database configuration parameters".

The following sample configuration is for the primary and standby databases:

Primary database:

   HADR_LOCAL_HOST           host1.ibm.com
   HADR_LOCAL_SVC            hadr_service
   HADR_REMOTE_HOST          host2.ibm.com
   HADR_REMOTE_SVC           hadr_service
   HADR_REMOTE_INST          dbinst2
   HADR_TIMEOUT              120
   HADR_SYNCMODE             NEARSYNC
   HADR_PEER_WINDOW          120

Standby database:

   HADR_LOCAL_HOST           host2.ibm.com
   HADR_LOCAL_SVC            hadr_service
   HADR_REMOTE_HOST          host1.ibm.com
   HADR_REMOTE_SVC           hadr_service
   HADR_REMOTE_INST          dbinst1
   HADR_TIMEOUT              120
   HADR_SYNCMODE             NEARSYNC
   HADR_PEER_WINDOW          120